Notes of a Programmer

Sunday, January 14, 2024

Windows Update Failed with Error Code 0x80070643

A recent Windows 10 update failed with error code 0x80070643. The cause was that the Windows Recovery partition was not big enough, and recreating a larger Windows Recovery partition solved the issue. The references that helped me are as follows:

Wednesday, September 20, 2023

Setting up Conda Virtual Environment for Tensorflow

These steps create a Python virtual environment for running TensorFlow on a GPU. They work on Fedora Linux 38 and Ubuntu 22.04 LTS:

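To install Miniconda, we download the installer and run it as a regular user. The installer has to be saved to a file first, because it extracts its embedded payload from its own file and does not work when piped into bash:


curl -O "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
bash Miniconda3-latest-Linux-x86_64.sh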

Following that, we create a conda virtual environment for Python.


# create conda virtual environment
conda create -n tf213 python=3.11 pip

# activate the environment in order to install packages and libraries
conda activate tf213

#
# the following steps are from the TensorFlow pip installation guide
#
# install the CUDA Toolkit
conda install -c conda-forge cudatoolkit=11.8.0

# install python packages
pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.13.*

#
# set up library and tool search paths;
# scripts in activate.d are run every time
# the environment is activated
#
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
# get CUDNN_PATH
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# set LD_LIBRARY_PATH
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# set XLA_FLAGS (on some systems, omitting it leads to a 'libdevice not found at ./libdevice.10.bc' error)
echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
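The three echo commands above append to env_vars.sh, so running them a second time duplicates every line (and LD_LIBRARY_PATH keeps growing on each activation). A variant that writes the same three lines in one step with a quoted here-document is sketched below. It assumes the tf213 environment is active, so CONDA_PREFIX points at it; as an illustration-only convenience, it falls back to a temporary directory when CONDA_PREFIX is unset, so the sketch can be tried safely outside conda.

```shell
# Fallback for trying this outside an active environment (assumption, not
# part of the TensorFlow guide): use a scratch directory as CONDA_PREFIX.
: "${CONDA_PREFIX:=$(mktemp -d)}"

hook_dir="$CONDA_PREFIX/etc/conda/activate.d"
mkdir -p "$hook_dir"

# The quoted 'EOF' keeps $CUDNN_PATH, $CONDA_PREFIX, etc. unexpanded here;
# they are evaluated each time the environment is activated.
# '>' overwrites the file, so re-running does not duplicate lines.
cat > "$hook_dir/env_vars.sh" <<'EOF'
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX
EOF
```

The trade-off is that this replaces any other lines already in env_vars.sh, so it suits a fresh environment better than one with hand-added hooks.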

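To test it, we load the variables into the current shell (or deactivate and reactivate the environment) and list the GPUs that TensorFlow can see: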


source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Enjoy!

Monday, September 18, 2023

Mounting File Systems in a Disk Image on Linux

On Linux systems, we can create a disk image with the dd command. This post lists the steps to mount the file systems, in particular LVM volumes, in an image of a whole disk, which is often created as follows:


dd if=/dev/sdb of=/mnt/disk1/sdb.img bs=1M status=progress

Assuming the disk has multiple partitions, how do we mount the file systems on these partitions? Here are the steps:


# 1. mount the file system that holds the disk image;
#    we assume it is on the partition /dev/sdb1,
#    and we mount it on the directory win
sudo mount /dev/sdb1 win

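# 2. map the partitions to loopback devices
#    here we assume the disk image is win/disks/disk1.img;
#    -f picks the first unused loop device, and
#    -P scans the partition table so that each partition
#    appears as /dev/loopNpM
sudo losetup -f -P win/disks/disk1.img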

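# 3. list the LVM volumes
#    if a volume shows "LV Status NOT available",
#    activate its volume group first, e.g.,
#    sudo vgchange -a y mylvm
sudo lvdisplay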

# 4. suppose that, in the output of the above command,
#    the volume is shown as /dev/mylvm/lvol0,
#    and we want it mounted on directory lvol0
sudo mount /dev/mylvm/lvol0 lvol0

# 5. do something we want ...


# 6. unmount the volume
sudo umount lvol0

# 7. deactivate LVM volume
#    we can confirm the volume group name
#    with vgdisplay
sudo vgchange -a n mylvm

# 8. detach the loopback device
#    assuming the device is /dev/loop0
#    (losetup -a lists the attached devices)
sudo losetup -d /dev/loop0

# 9. unmount the partition holding the disk image
sudo umount win
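Step 8 assumes the image landed on /dev/loop0, which is not guaranteed when other loop devices are in use. A minimal sketch that records the device instead, relying on losetup's --show option to print the device that -f picked; attach_image and detach_image are hypothetical helper names, and mylvm is the same example volume group as above.

```shell
# attach_image IMAGE: attach the image to the first free loop device,
# scan its partition table (-P), and print the device name (--show).
attach_image() {
    sudo losetup --show -f -P "$1"
}

# detach_image DEVICE VG: deactivate the volume group (step 7),
# then free the loop device (step 8).
detach_image() {
    sudo vgchange -a n "$2" && sudo losetup -d "$1"
}

# Example:
#   LOOP=$(attach_image win/disks/disk1.img)
#   ... steps 3 to 6 ...
#   detach_image "$LOOP" mylvm
```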

Sunday, September 17, 2023

Mounting ZFS Dataset as /home

The following steps mount a ZFS dataset at /home:


# list ZFS pools and datasets
zfs list

# Query current mount point for a ZFS dataset, e.g., mypool/mydataset
zfs get mountpoint mypool/mydataset

# Set new mountpoint to /home (requires root)
sudo zfs set mountpoint=/home mypool/mydataset

# Always verify
zfs list
zfs get mountpoint mypool/mydataset
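The verification step can also be scripted instead of read by eye. A minimal sketch; verify_mountpoint is a hypothetical helper, and it relies on zfs get's -H option (no header, tab-separated) and -o value (print only the value column), checking both the mountpoint and mounted properties.

```shell
# verify_mountpoint DATASET EXPECTED
# Succeeds only if DATASET's mountpoint property equals EXPECTED
# and the dataset is currently mounted.
verify_mountpoint() {
    local mp mounted
    mp=$(zfs get -H -o value mountpoint "$1") || return 1
    mounted=$(zfs get -H -o value mounted "$1") || return 1
    if [ "$mp" = "$2" ] && [ "$mounted" = "yes" ]; then
        echo "$1: mounted at $mp"
    else
        echo "$1: mountpoint=$mp mounted=$mounted (expected $2)" >&2
        return 1
    fi
}

# Example:
#   verify_mountpoint mypool/mydataset /home
```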

Persistent Mount Bind

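To make a bind mount persist across reboots, add a line like the following to /etc/fstab (nofail keeps a missing source directory from failing the boot):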


/from_dir_path   /to_dir_path  none    bind,nofail
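Adding the entry by hand works, but when it is done from a script it is easy to append the same line twice. A minimal sketch of an idempotent append; append_once is a hypothetical helper, and the paths are the same placeholders as above.

```shell
# append_once FILE LINE
# Appends LINE to FILE unless an identical line is already present.
append_once() {
    local file=$1 line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example, as root (e.g., inside `sudo -i`):
#   append_once /etc/fstab '/from_dir_path   /to_dir_path  none    bind,nofail'
#   findmnt --verify        # sanity-check /etc/fstab
#   mount /to_dir_path      # mount it now, without rebooting
```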

Wednesday, August 16, 2023

Bus Error (Core Dumped)!

I was training a machine learning model written in PyTorch on a Linux system. During the training, I encountered "Bus error (core dumped)." This error produces no stack trace. Eventually, I figured out that it was caused by the exhaustion of shared memory, whose symptom is that "/dev/shm" is full.

To resolve this issue, I simply doubled the size of "/dev/shm", following the instructions given in this Stack Overflow post:

How to resize /dev/shm?

Basically, we edit the /etc/fstab file. If the file already has an entry for /dev/shm, we simply increase its size. If not, we add a line to the file, such as

none /dev/shm tmpfs defaults,size=32G 0 0

To put it into effect, we remount the file system:

sudo mount -o remount /dev/shm
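Since the bus error itself gives no hint, it can help to check shared memory before a long training run. A minimal sketch using GNU df's --output=pcent; usage_pct and shm_check are hypothetical helper names, and the 90% default threshold is an arbitrary choice.

```shell
# usage_pct [DIR]: print how full the file system holding DIR is,
# as a bare integer (defaults to /dev/shm).
usage_pct() {
    # --output=pcent prints a "Use%" header line, then a value like " 37%"
    df --output=pcent "${1:-/dev/shm}" | tail -n 1 | tr -dc '0-9'
}

# shm_check [LIMIT]: warn and fail if /dev/shm is at least LIMIT percent full.
shm_check() {
    local limit=${1:-90} pct
    pct=$(usage_pct /dev/shm)
    if [ -z "$pct" ] || [ "$pct" -ge "$limit" ]; then
        echo "warning: /dev/shm usage is ${pct:-unknown}%" >&2
        return 1
    fi
}
```

For a quick resize that lasts only until the next reboot, `sudo mount -o remount,size=32G /dev/shm` also works without editing /etc/fstab.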

Wednesday, July 19, 2023

Terminal Multiplexer

This is just a note on the terminal multiplexers that are out there:

screen
tmux
byobu
tmuxinator

Some resources that I find useful