Wednesday, February 21, 2024

Installing Git and Other Tools on Linux Systems without Administrative Privilege

Sometimes I want to install software tools, such as Git, Screen, and the others on a Linux System, however, I find outselves without administraive priviledge. The first method comes to mind is to download the source code and to compile and to set it up. This method can be sometimes challenging due to numerous dependencies may also be missing on the system.

Recently it comes to me that we can do this via conda. For instance, the following steps let me install both Git and Screen on a Linux system without administrative priviledge

  1. Download miniconda.
    
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
        
  2. Set up miniconda
    
        bash Miniconda3-latest-Linux-x86_64.sh
        
  3. Initialize conda. Exit shell and get back in, and then
    
        conda init
        
  4. Install Git via conda
    
        conda install anaconda::git
        
  5. Install Screen via conda
    
        conda install conda-forge::screen
        
  6. Find and install others ...

Some may think this method is overkill. However, it saves me tons of time to download and compile tons of dependencies. Is our own time more valuable?

Sunday, January 14, 2024

Wednesday, September 20, 2023

Setting up Conda Virtual Environment for Tensorflow

These steps are for create a Python virtual environment for running Tensorflow on GPU. The steps work on Fedora Linux 38 and Ubuntu 22.04 LTS:

To install miniconda, we can do as a regular user:


curl -s "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" | bash

Following that, we create a conda virtual environment for Python.


# create conda virtual environment
conda create -n tf213 python=3.11 pip

# activate the environment in order to install packages and libraries
conda activate tf213

#
# the following are from Tensorflow pip installation guide
#
# install CUDA Toolkit 
conda install -c conda-forge cudatoolkit=11.8.0

# install python packages
pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.13.*

#
# setting up library and tool search paths
# scripts in activate.d shall be run when the environment
# is being activated
#
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
# get CUDNN_PATH
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# set LD_LIBRARY_PATH
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# set XLA_FLAGS (for some systems, without this, it will lead to a 'libdevice not found at ./libdevice.10.bc' error
echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

To test it, we can run


source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Enjoy!

Monday, September 18, 2023

Mounting File Systems in a Disk Image on Linux

On Linux systems, we can create disk image using the dd command. This post lists the steps to mount file systems, in particular, LVM volumes in an image of a whole disk, which is often created as follows,


dd if=/dev/sdb of=/mnt/disk1/sdb.img bs=1M status=progress

Assuming the disk has multiple partitions, how do we mount the file systems on these partitions? The following are the steps,


# 1. mount the disk where the disk image is
#    we assume the disk is /dev/sdb1, and we mount
#    it on directory win
sudo mount /dev/sdb1 win

# 2. map the partitions to loopback devices
#    here we assume the disk image is win/disks/disk1.img
sudo losetup -f -P win/disks/disk1.img

# 3. list the LVM volumes
sudo lvdisplay

# 4. suppose from the input of the above command, 
#    the volumne is shown as /dev/mylvm/lvol0,
#    and we want it mounted on directory lvol0
sudo mount /dev/mylvm/lvol0 lvol0

# 5. do something we want ...


# 6. unmount the volume
sudo umount lvol0

# 7. deactivate LVM volume
#    we can query, confirm the volume group by
#    vgdisplay
sudo vgchange -a n mylvm

# 8. detatch the loopback device
#    assuming the device is /dev/loop0
sudo losetup -d /dev/loop0

# 9. umount the disk
sudo umount win

Sunday, September 17, 2023

Mounting ZFS Dataset as /home

The following steps work:


# list ZFS pools and datasets
zfs list

# Query current mount point for a ZFS dataset, e.g., mypool/mydataset
zfs get mountpoint mypool/mydataset

# Set new mountpoint to /home
zfs set mountpoint=/home mypool/mydataset

# Always verify
zfs list
zfs get mountpoint mypool/mydataset

Persistent Mount Bind

The following works:


/from_dir_path   /to_dir_path  none    bind,nofail

Wednesday, August 16, 2023

Bus Error (Core Dumped)!

I was training a machine learning model written in PyTorch on a Linux system. During the training, I encountered "Bus error (core dumped)." This error produces no stack trace. Eventually, I figured it out that this was resulted in the exhaustion of shared memory whose symptom is that  "/dev/shm" is full. 

To resolve this issue, I simply double the size of "/dev/shm", following the instruction given in this Stack Overflow post,

How to resize /dev/shm?

Basically, it is to edit the /etc/fstab file. If the file already has an entry for /dev/shm, we simply increase its size. If not, we add a line to the file, such as

none /dev/shm tmpfs defaults,size=32G 0 0

To bring it to effect, we remount the file system, as in,

sudo mount /dev/shm