tag:blogger.com,1999:blog-3488593558263059782024-03-29T03:29:24.155-04:00Notes of a ProgrammerGray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.comBlogger283125tag:blogger.com,1999:blog-348859355826305978.post-33893514682827638172024-02-21T14:44:00.000-05:002024-02-21T14:44:00.198-05:00Installing Git and Other Tools on Linux Systems without Administrative Privilege<p>Sometimes I want to install software tools, such as Git, Screen, and others, on a Linux system where I find myself without administrative privilege. The first method that comes to mind is to download the source code, compile it, and set it up. This can be challenging, because numerous dependencies may also be missing on the system.</p>
<p>Recently it occurred to me that we can do this via <code>conda</code>. For instance, the following steps let me install both Git and Screen on a Linux system without administrative privilege.</p>
<ol>
<li>Download miniconda.
<pre><code>
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
</code></pre>
</li>
<li>Set up miniconda
<pre><code>
bash Miniconda3-latest-Linux-x86_64.sh
</code></pre>
</li>
<li>Initialize conda, then exit the shell and log back in so that the changes take effect
<pre><code>
# assuming the default install prefix ~/miniconda3
~/miniconda3/bin/conda init
</code></pre>
</li>
</li>
<li>Install Git via conda
<pre><code>
conda install anaconda::git
</code></pre>
</li>
<li>Install Screen via conda
<pre><code>
conda install conda-forge::screen
</code></pre>
</li>
<li>Find and install others ...</li>
</ol>
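<p>As a quick sanity check after the steps above, we can confirm that the shell resolves the newly installed tools; with the default setup they should resolve to paths under the Miniconda prefix (e.g., <code>~/miniconda3/bin</code>). A small sketch:</p>

```shell
# print where each tool resolves from, or a note if it is missing;
# with the conda environment active, expect paths under the miniconda prefix
for tool in git screen; do
    command -v "${tool}" || echo "${tool}: not on PATH"
done
```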
<p>Some may think this method is overkill. However, it saves me the considerable time of downloading and compiling tons of dependencies. Isn't our own time more valuable?
</p>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com1tag:blogger.com,1999:blog-348859355826305978.post-27472206665147322012024-01-14T20:24:00.001-05:002024-01-14T20:24:21.637-05:00Windows Update Failed with Error Code 0x80070643<p>A recent Windows 10 update resulted in error code 0x80070643. The cause was that the Windows Recovery partition was not big enough. Recreating a larger Windows Recovery partition solved the issue. The references that helped me are as follows:</p>
<ol>
<li><a href="https://answers.microsoft.com/en-us/windows/forum/all/install-error-0x80070643/ca8dc95f-bc48-427b-aa6a-3ef468f61ca0" target="_blank">Install error - 0x80070643</a></li>
<li><a href="https://support.microsoft.com/en-gb/topic/kb5034441-windows-recovery-environment-update-for-windows-10-version-21h2-and-22h2-january-9-2024-62c04204-aaa5-4fee-a02a-2fdea17075a8" target="_blank">KB5034441: Windows Recovery Environment update for Windows 10, version 21H2 and 22H2: January 9, 2024</a></li>
<li><a href="https://support.microsoft.com/en-us/topic/kb5028997-instructions-to-manually-resize-your-partition-to-install-the-winre-update-400faa27-9343-461c-ada9-24c8229763bf" target="_blank">KB5028997: Instructions to manually resize your partition to install the WinRE update</a></li>
</ol>
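<p>The manual fix in reference 3 (KB5028997) boils down to: disable WinRE, shrink the OS partition by about 250 MB, recreate a larger recovery partition, and re-enable WinRE. The outline below is only a sketch of that KB; the disk and partition numbers are examples and must be checked against your own layout (run from an elevated prompt; GPT disk assumed):</p>

```
rem note which disk/partition currently hosts WinRE
reagentc /info
reagentc /disable
diskpart
rem inside diskpart -- partition numbers below are examples
list disk
sel disk 0
list part
rem select the OS partition and shrink it by ~250 MB
sel part 3
shrink desired=250 minimum=250
rem delete and recreate the recovery partition (GPT layout)
sel part 4
delete partition override
create partition primary id=de94bba4-06d1-4d40-a16a-bfd50179d6ac
gpt attributes=0x8000000000000001
format quick fs=ntfs label="Windows RE tools"
exit
reagentc /enable
```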
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-43313519754811391752023-09-20T10:07:00.003-04:002023-09-20T15:28:45.599-04:00Setting up Conda Virtual Environment for Tensorflow <p>These steps create a Python virtual environment for running Tensorflow on GPU. They work on Fedora Linux 38 and Ubuntu 22.04 LTS:</p>
<p>To install miniconda, we can do the following as a regular user:</p>
<div style="overflow-x: auto;">
<pre><code>
curl -sO "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
bash Miniconda3-latest-Linux-x86_64.sh
</code></pre>
</div>
<p>Following that, we create a conda virtual environment for Python. </p>
<div style="overflow-x: auto;">
<pre><code>
# create conda virtual environment
conda create -n tf213 python=3.11 pip
# activate the environment in order to install packages and libraries
conda activate tf213
#
# the following are from Tensorflow pip installation guide
#
# install CUDA Toolkit
conda install -c conda-forge cudatoolkit=11.8.0
# install python packages
pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.13.*
#
# setting up library and tool search paths
# scripts in activate.d shall be run when the environment
# is being activated
#
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
# get CUDNN_PATH
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# set LD_LIBRARY_PATH
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# set XLA_FLAGS (on some systems, without this, we get a
# 'libdevice not found at ./libdevice.10.bc' error)
echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
</code></pre>
</div>
<p>To test it, we can run</p>
<div style="overflow-x: auto;">
<pre><code>
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
</code></pre>
</div>
<p>Enjoy!</p>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-52662770987176934382023-09-18T15:46:00.002-04:002023-09-18T16:33:23.520-04:00Mounting File Systems in a Disk Image on Linux<p>
On Linux systems, we can create a disk image using the <code>dd</code> command.
This post lists the steps to mount file systems, in particular LVM volumes, in an image of a whole disk,
which is often created as follows:</p>
<pre><code>
dd if=/dev/sdb of=/mnt/disk1/sdb.img bs=1M status=progress
</code></pre>
<p>Assuming the disk has multiple partitions, how do we mount the file systems on these partitions? The following are the steps,</p>
<pre><code>
# 1. mount the disk where the disk image is
# we assume the disk is /dev/sdb1, and we mount
# it on directory win
sudo mount /dev/sdb1 win
# 2. map the partitions to loopback devices
# here we assume the disk image is win/disks/disk1.img
sudo losetup -f -P win/disks/disk1.img
# 3. list the LVM volumes
sudo lvdisplay
# 4. suppose from the output of the above command,
#    the volume is shown as /dev/mylvm/lvol0,
# and we want it mounted on directory lvol0
sudo mount /dev/mylvm/lvol0 lvol0
# 5. do something we want ...
# 6. unmount the volume
sudo umount lvol0
# 7. deactivate LVM volume
# we can query, confirm the volume group by
# vgdisplay
sudo vgchange -a n mylvm
# 8. detach the loopback device
# assuming the device is /dev/loop0
sudo losetup -d /dev/loop0
# 9. unmount the disk
sudo umount win
</code></pre>
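<p>If the image also contains plain (non-LVM) partitions, steps 3, 4, and 7 are unnecessary: once <code>losetup -P</code> has scanned the partition table, the partitions show up as devices of their own and can be mounted directly. A sketch, assuming the loopback device comes up as <code>/dev/loop0</code>:</p>

```
# --show prints the loop device chosen, e.g., /dev/loop0
sudo losetup -f -P --show win/disks/disk1.img
# partitions appear as /dev/loop0p1, /dev/loop0p2, ...
lsblk /dev/loop0
# mount a plain partition directly, then clean up as before
sudo mount /dev/loop0p1 mnt
sudo umount mnt
sudo losetup -d /dev/loop0
```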
<p></p>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-45929064821411009152023-09-17T19:04:00.003-04:002023-09-17T19:04:34.930-04:00Mounting ZFS Dataset as /home<p>The following steps work:</p>
<pre><code>
# list ZFS pools and datasets
zfs list
# Query current mount point for a ZFS dataset, e.g., mypool/mydataset
zfs get mountpoint mypool/mydataset
# Set new mountpoint to /home
zfs set mountpoint=/home mypool/mydataset
# Always verify
zfs list
zfs get mountpoint mypool/mydataset
</code></pre>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-20263998179670018742023-09-17T18:57:00.006-05:002023-09-17T18:57:37.848-04:00Persistent Mount Bind<p>Adding the following line to <code>/etc/fstab</code> makes a bind mount persistent across reboots:</p>
<pre><code>
/from_dir_path /to_dir_path none bind,nofail 0 0
</code></pre>
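<p>To activate the new entry without rebooting, we can mount it by its fstab target and verify the result (the directory names are the placeholders from the fstab line):</p>

```
# mount(8) looks the target up in /etc/fstab
sudo mount /to_dir_path
# the SOURCE column should show the bound directory
findmnt /to_dir_path
```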
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-67980124980804182142023-08-16T22:28:00.003-04:002023-08-16T22:28:26.149-04:00Bus Error (Core Dumped)!<p>I was training a machine learning model written in PyTorch on a Linux system. During the training, I encountered "Bus error (core dumped)." This error produces no stack trace. Eventually, I figured out that it resulted from the exhaustion of shared memory; the symptom is that "/dev/shm" is full.</p><p>To resolve this issue, I simply doubled the size of "/dev/shm", following the instructions given in this Stack Overflow post:</p><p><a href="https://stackoverflow.com/questions/58804022/how-to-resize-dev-shm" target="_blank">How to resize /dev/shm?</a> <br /></p><p>Basically, the fix is to edit the /etc/fstab file. If the file already has an entry for /dev/shm, we simply increase its size. If not, we add a line to the file, such as</p><p><code>none /dev/shm tmpfs defaults,size=32G 0 0</code></p><p>To put it into effect, we remount the file system, as in,</p><p><code>sudo mount /dev/shm</code></p>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-57849416203810423522023-07-19T13:15:00.006-04:002023-07-19T13:15:57.345-04:00Terminal Multiplexer<p>This is just a note on the terminal multiplexers that are out there:</p><ul style="text-align: left;"><li>screen</li><li>tmux</li><li>byobu</li><li>tmuxinator</li></ul><p>Some resources that I find useful:</p><ol style="text-align: left;"><li><a href="https://www.baeldung.com/linux/screen-command">https://www.baeldung.com/linux/screen-command</a></li><li><a href="https://www.redhat.com/sysadmin/introduction-tmux-linux">https://www.redhat.com/sysadmin/introduction-tmux-linux</a></li><li><a href="https://www.digitalocean.com/community/tutorials/how-to-install-and-use-byobu-for-terminal-management-on-ubuntu-16-04">https://www.digitalocean.com/community/tutorials/how-to-install-and-use-byobu-for-terminal-management-on-ubuntu-16-04</a> </li><li><a href="https://askubuntu.com/questions/136776/when-using-byobu-in-a-putty-session-cannot-create-new-windows">https://askubuntu.com/questions/136776/when-using-byobu-in-a-putty-session-cannot-create-new-windows</a></li><li><a href="https://stackoverflow.com/questions/18980222/should-do-i-use-screen-or-tmux-commands">https://stackoverflow.com/questions/18980222/should-do-i-use-screen-or-tmux-commands</a></li><li><a href="https://superuser.com/questions/236158/tmux-vs-screen">https://superuser.com/questions/236158/tmux-vs-screen</a></li><li><a href="https://superuser.com/questions/423310/byobu-vs-gnu-screen-vs-tmux-usefulness-and-transferability-of-skills">https://superuser.com/questions/423310/byobu-vs-gnu-screen-vs-tmux-usefulness-and-transferability-of-skills</a></li></ol>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com2tag:blogger.com,1999:blog-348859355826305978.post-1074302291542504952023-03-30T14:07:00.007-04:002023-03-30T14:08:19.946-04:00Binding Process to TCP/UDP Port Failure on Windows<p>Windows has the concept of reserved TCP/UDP ports. Applications cannot bind to these ports, even though the ports appear unused. This can be annoying because the reserved ports
are not shown as used when we query ports with <code>netstat</code>. For instance, if we want to bind TCP port 23806 to an application, we determine its
availability using the <code>netstat</code> command, such as </p>
<pre><code>
C:> netstat -anp tcp | find ":23806"
C:>
</code></pre>
<p>The output is blank, which means that the port is unused. However, when we attempt to bind the port to a process of our choice, we encounter an error, such as</p>
<pre><code>
bind [127.0.0.1]:23806: Permission denied
</code></pre>
<p>This is annoying. The reason is that the port somehow becomes a reserved port. To see this, we can query reserved ports, e.g., </p>
<div style="overflow-x: auto;">
<pre><code>
C:> netsh int ipv4 show excludedportrange protocol=tcp
Protocol tcp Port Exclusion Ranges
Start Port End Port
---------- --------
1155 1254
... ...
23733 23832
23833 23932
50000 50059 *
* - Administered port exclusions.
C:>
</code></pre>
</div>
<p>which shows that 23806 is now a reserved port. What is really annoying is that the ranges can be updated by Windows dynamically. There are several methods to deal with
this.</p>
<ol>
<li>Method 1. Stop and start the Windows NAT Driver service.
<pre><code>
net stop winnat
net start winnat
</code></pre>
After this, query the reserved ports again. Often, the reserved ranges are much smaller than before, e.g.,
<pre><code>
C:>netsh int ipv4 show excludedportrange protocol=tcp
Protocol tcp Port Exclusion Ranges
Start Port End Port
---------- --------
2869 2869
5357 5357
50000 50059 *
* - Administered port exclusions.
C:>
</code></pre>
</li>
<li>Method 2. If we do not wish to use this feature of Windows, we can disable the reserved port ranges.
<div style="overflow-x: auto;">
<pre><code>
reg add HKLM\SYSTEM\CurrentControlSet\Services\hns\State /v EnableExcludedPortRange /d 0 /f
</code></pre>
</div>
</li>
</ol>
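<p>If an application must keep a specific port across reboots, a third option is to claim it explicitly as an administered exclusion, so that Windows' dynamic ranges can no longer cover it. A sketch using the documented <code>netsh</code> syntax, run from an elevated prompt:</p>

```
netsh int ipv4 add excludedportrange protocol=tcp startport=23806 numberofports=1
```

<p>Note that the command fails if the port already falls inside an existing exclusion range, so it may need to run right after the winnat restart from Method 1.</p>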
<p></p>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-9587475454310447402023-03-28T14:31:00.001-04:002023-03-28T14:31:15.436-04:00Installing and Using CUDA Toolkit and cuDNN in Conda Virtual Environment of Python<p>This is straightforward.</p>
<ol>
<li>Create a conda virtual environment, e.g.,
<pre><code>
conda create -n cudacudnn python=3.9 pip
</code></pre>
</li>
<li>Activate the virtual environment, i.e.,
<pre><code>
conda activate cudacudnn
</code></pre>
</li>
<li>Assuming we are using PyTorch 2.0, install it, e.g., via
<pre><code>
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
</code></pre>
</li>
<li>Install CUDA toolkit and cuDNN, e.g.,
<pre><code>
conda install -c conda-forge cudnn=8 cudatoolkit=11.8
</code></pre>
</li>
<li>Add the library path of the conda environment to LD_LIBRARY_PATH. There are several
approaches. Two approaches are as follows, assuming the environment
is at <code>$HOME/.conda/envs/cudacudnn</code> and we want to run <code>foo.py</code>,
<pre><code>
virtenv_path=$HOME/.conda/envs/cudacudnn
export LD_LIBRARY_PATH=${virtenv_path}/lib:$LD_LIBRARY_PATH
python foo.py
</code></pre>
or
<pre><code>
virtenv_path=$HOME/.conda/envs/cudacudnn
LD_LIBRARY_PATH=${virtenv_path}/lib:$LD_LIBRARY_PATH python foo.py
</code></pre>
</li>
</ol>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-61233028472914503412023-03-26T17:17:00.005-04:002023-03-26T17:17:43.966-04:00Installing GPU Driver for PyTorch and Tensorflow<p>To use GPU for PyTorch and Tensorflow, a method I grow fond of is to install GPU driver from RPM fusion, in particular,
on Debian or Fedora systems where only free packages are included in their repositories. Via this method, we only install
the driver from RPM fusion, and use Python virtual environment to bring in CUDA libraries.</p>
<ol>
<li><a href="https://rpmfusion.org/Configuration" target="_blank">Configure RPM Fusion repo</a> by following the instruction, e.g., as follows:
<div style="overflow-x: auto;">
<pre><code>
sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
</code></pre>
</div>
</li>
<li><a href="https://rpmfusion.org/Howto/NVIDIA" target="_blank">Install driver</a>, e.g.,
<pre><code>
sudo dnf install akmod-nvidia
</code></pre>
</li>
<li>Add CUDA support, i.e.,
<pre><code>
sudo dnf install xorg-x11-drv-nvidia-cuda
</code></pre>
</li>
<li>Check driver by running <code>nvidia-smi</code>. If it complains about not being able to connect to the driver, reboot the system.
</li>
</ol>
<p>If we use only PyTorch or Tensorflow, there is no need to install CUDA from Nvidia.</p>
<h3>Reference</h3>
<ol>
<li>https://rpmfusion.org/Configuration</li>
</ol>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-48695902966584709882023-02-07T09:56:00.003-05:002023-02-07T09:58:49.966-05:00Tensorflow Complains "successful NUMA node read from SysFS had negative value (-1)"<p>To test GPU support for Tensorflow, we run the following, as recommended in the Tensorflow manual:</p>
<div style="overflow-x: auto">
<pre><code>
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
</code></pre>
</div>
<p>However, in my case, I saw an annoying message:</p>
<div style="overflow-x: auto">
<pre><code>
2023-02-07 14:40:01.345350: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
</code></pre>
</div>
<p>A Stack Overflow discussion has an <a href="https://stackoverflow.com/questions/44232898/memoryerror-in-tensorflow-and-successful-numa-node-read-from-sysfs-had-negativ" target="_blank">excellent explanation</a> of this. I have a single CPU and a single GPU installed on the system, which runs Ubuntu 20.04 LTS. Following the advice given there,
the following command gets rid of the message:</p>
<div style="overflow-x: auto">
<pre><code>
su -c "echo 0 | tee /sys/module/nvidia/drivers/pci:nvidia/*/numa_node"
</code></pre>
</div>
<p>That is sweet!</p>
<h4>Reference</h4>
<ol>
<li><a href="https://www.tensorflow.org/install/pip#linux_setup" target="_blank">https://www.tensorflow.org/install/pip#linux_setup</a></li>
<li><a href="https://stackoverflow.com/questions/44232898/memoryerror-in-tensorflow-and-successful-numa-node-read-from-sysfs-had-negativ" target="_blank">https://stackoverflow.com/questions/44232898/memoryerror-in-tensorflow-and-successful-numa-node-read-from-sysfs-had-negativ</a></li>
</ol>
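<p>One caveat: writes under /sys do not survive a reboot, so the command has to be reapplied after every boot. A sketch of a systemd oneshot unit that does this (the unit name is my own invention, not from either reference):</p>

```
# /etc/systemd/system/nvidia-numa-node.service (hypothetical name)
[Unit]
Description=Force NUMA node 0 for the NVIDIA GPU

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo 0 | tee /sys/module/nvidia/drivers/pci:nvidia/*/numa_node'

[Install]
WantedBy=multi-user.target
```

<p>Enable it with <code>sudo systemctl daemon-reload && sudo systemctl enable --now nvidia-numa-node.service</code>.</p>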
<p></p>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-31660416136280612562023-02-04T16:06:00.002-05:002023-02-04T16:06:06.051-05:00Checking RAM Type on Linux<p>We can use the following command to check RAM types and slots</p>
<pre><code>
sudo dmidecode --type 17
</code></pre>
<p></p>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-87336050689653668542023-02-04T15:47:00.003-05:002023-02-04T15:47:52.514-05:00Reloading WireGuard Configuration File without Completely Restarting WireGuard Session<p>On Linux systems, under bash, we can run the following command to reload and apply a revised WireGuard configuration file without restarting and distrupting
the clients</p>
<pre><code>
wg syncconf wg0 <(wg-quick strip wg0)
</code></pre>
<p>Note that this command may not work in shells other than bash, because it relies on process substitution. However, we can always complete the task in three steps:</p>
<pre><code>
wg-quick strip wg0 > temp_wg0.conf
wg syncconf wg0 temp_wg0.conf
rm temp_wg0.conf
</code></pre>
<p></p>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-33011270550642210272023-02-04T14:14:00.001-05:002023-02-04T14:14:24.878-05:00Determining File System of Current Directory on Linux<p>On Linux, a simple command reveals the file system on which the current directory is actually located:</p>
<pre><code>
df -hT .
</code></pre>
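<p>When scripting, the full table is awkward to parse. Two script-friendly variants that print only the file system type (GNU <code>df --output</code>, and util-linux's <code>findmnt</code> where available):</p>

```shell
# print only the file system type of the current directory
df --output=fstype . | tail -n 1
# findmnt resolves the mount point that contains a given path
command -v findmnt >/dev/null && findmnt -n -o FSTYPE -T . || true
```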
<p></p>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-7656035286729773792023-01-29T12:07:00.005-05:002023-01-29T12:07:47.190-05:00Resetting Network Stack on Windows<p>Sometimes, I want to reset the network stack on Windows. I found that Intel has <a href="https://www.intel.com/content/www/us/en/support/articles/000058982/wireless/intel-killer-wi-fi-products.html" target="_blank">good documentation</a> for it. I copy the steps below:</p>
<h4>
Resetting the network stack
</h4>
<pre><code>
ipconfig /release
ipconfig /flushdns
ipconfig /renew
netsh int ip reset
netsh winsock reset
</code></pre>
<p></p>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-80999426906175115922023-01-29T11:52:00.004-05:002023-01-29T12:03:00.122-05:00Quick Note on WireGuard Configuration Files<p>Assume that we set up a VPN server, and a number of clients are the peers of the server. Below are example configuration files</p>
<ol>
<li>Server Configuration
<div style="overflow-x: auto;">
<pre><code>
[Interface]
Address = 10.188.0.1/32
PrivateKey = (Private key of the server, generated via: wg genkey | tee server.private)
ListenPort = 51820
[Peer]
PublicKey = (Public key of the client, generated via: wg genkey | tee client.2.private | wg pubkey)
AllowedIPs = 10.188.0.2/32
[Peer]
PublicKey = (Public key of the client, generated via: wg genkey | tee client.3.private | wg pubkey)
AllowedIPs = 10.188.0.3/32
[Peer]
PublicKey = (Public key of the client, generated via: wg genkey | tee client.4.private | wg pubkey)
AllowedIPs = 10.188.0.4/32
[Peer]
PublicKey = (Public key of the client, generated via: wg genkey | tee client.5.private | wg pubkey)
AllowedIPs = 10.188.0.5/32
</code></pre>
</div>
<ul>
<li>
The <code>AllowedIPs</code> of the Peer section is what assigns the IP address to the client.
</li>
</ul>
</li>
<li>Client Configuration
<div style="overflow-x: auto;">
<pre><code>
[Interface]
Address = 10.188.0.5/32
PrivateKey = (Private key of the client, e.g., the content of client.5.private)
DNS = 192.168.1.1,1.1.1.1,8.8.8.8
[Peer]
PublicKey = (Public key of the server, generated via: cat server.private | wg pubkey)
AllowedIPs = 10.188.0.1/32,10.188.0.5/32
Endpoint = Server_Public_IP_OR_Hostname:51820
</code></pre>
</div>
<ul>
<li>
The <code>AllowedIPs</code> controls which part of the network the client can access. My experience is that
you must give the client access to the server, i.e., it must include the server's IP address 10.188.0.1; otherwise, there
would be a reachability problem.
</li>
<li>
Since it is a client, we should also include the <code>Endpoint</code>.
</li>
<li>
Numerous examples on the Web use <code>AllowedIPs = 0.0.0.0/0,::/0</code> as part of the client configuration.
Although further investigation is needed to confirm it, my experience is that this can be a problematic setup
for Windows clients, in particular when both the server and the client reside in private networks with the same
network prefix, e.g., 192.168.1.0/24. Windows does not appear to set up proper routes, and appears to be confused
about which private network it should reach when given an IP address like 192.168.1.1. In my experience,
when this happens, <code>ping</code> on Windows reports "<code>General Failure</code>."
</li>
</ul>
</li>
</ol>
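<p>For completeness, the key pairs referenced in the parentheses above can be generated up front. A sketch using the same file names (<code>umask 077</code> keeps the private key files readable only by their owner):</p>

```
umask 077
wg genkey | tee server.private | wg pubkey > server.public
for i in 2 3 4 5; do
    wg genkey | tee client.${i}.private | wg pubkey > client.${i}.public
done
```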
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-67766711845049166112023-01-29T10:34:00.001-05:002023-01-29T10:34:15.059-05:00Running WireGuard Windows GUI Client as Non-administrator User<p>As indicated in <a href="https://git.zx2c4.com/wireguard-windows/about/docs/adminregistry.md" target="_blank">this document</a>, and also referenced in several places, we can run the WireGuard Windows GUI client as a non-administrator user, with the functionality
limited to toggling existing VPN tunnel configurations on and off.</p>
<p>This generally involves two steps as an administrator on the Windows host:</p>
<ol>
<li>
Create a registry key, as specified in the command below
<pre><code>
reg add HKLM\Software\WireGuard /v LimitedOperatorUI /t REG_DWORD /d 1 /f
</code></pre>
</li>
<li>
Add the non-administrator user we wish to be able to toggle the tunnel on/off to the built-in <code>Network Configuration Operators</code> group. We can
do this by invoking the <code>lusrmgr.msc</code> command.
</li>
</ol>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-10405689780826144322023-01-27T15:21:00.004-05:002023-01-27T15:21:36.952-05:00Mysterious bash while read var behavior understood!<p>This is a note about a mysterious behavior of <code>while read var</code> in the <code>Bash</code> shell. To understand it, let's consider
the following problem:</p>
<p>Given a text file called <code>example.txt</code> as follows, write a <code>Bash</code> shell script called <code>join_lines.sh</code> to join the lines</p>
<pre><code>
BEGIN Line 1 Line 1
Line 1 Line 1
BEGIN Line 2 Line 2
Line 2 Line 2
Line 2 Line 2
Line 2
BEGIN Line 3 Line 3 Line 3
Line 3
Line 3
</code></pre>
<p>The output should be 3 lines, as illustrated in the example below:</p>
<pre><code>
$ ./join_lines.sh
Joined Line: BEGIN Line 1 Line 1 Line 1 Line 1
Joined Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joined Line: BEGIN Line 3 Line 3 Line 3 Line 3 Line 3
</code></pre>
<p>Our first implementation of <code>join_lines.sh</code> is as follows:</p>
<pre><code>
#!/bin/bash
joined=""
cat example.txt | \
while read line; do
echo ${line} | grep -E -q "^BEGIN"
if [ $? -eq 0 ]; then
if [ "${joined}" != "" ]; then
echo "Joind Line: ${joined}"
joined=""
fi
fi
joined="${joined} ${line}"
done
echo "Joind Line: ${joined}"
</code></pre>
<p>Unfortunately, the output is actually the following: </p>
<pre><code>
$ ./join_lines.sh
Joind Line: BEGIN Line 1 Line 1 Line 1 Line 1
Joind Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joind Line:
$
</code></pre>
<p>Why does the variable <code>joined</code> lose its value? That is a mystery, isn't it? To understand this, let's revise the script to print out the process IDs of
the shell. The revised version is as follows: </p>
<pre><code>
#!/bin/bash
joined=""
cat example.txt | \
while read line; do
echo ${line} | grep -E -q "^BEGIN"
if [ $? -eq 0 ]; then
if [ "${joined}" != "" ]; then
echo "In $$ $BASHPID: Joind Line: ${joined}"
joined=""
fi
fi
joined="${joined} ${line}"
done
echo "In $$ $BASHPID: Joind Line: ${joined}"
</code></pre>
<p>If we run this revised script, we shall get something like the following:</p>
<pre><code>
$ ./join_lines.sh
In 7065 7067: Joind Line: BEGIN Line 1 Line 1 Line 1 Line 1
In 7065 7067: Joind Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7065 7065: Joind Line:
$
</code></pre>
<p>By carefully examining the output, we can see that <code>$$</code> and <code>$BASHPID</code> have different values on the first two lines. So, what is the
difference between <code>$$</code> and <code>$BASHPID</code>, and why are they different?</p>
<p>The <code>Bash</code> manual page states this:</p>
<pre><code>
$ man bash
...
BASHPID
Expands to the process ID of the current bash process. This
differs from $$ under certain circumstances, such as subshells
that do not require bash to be re-initialized. Assignments to
BASHPID have no effect. If BASHPID is unset, it loses its spe‐
cial properties, even if it is subsequently reset.
...
$
</code></pre>
<p>The above experiment reveals that the <code>while read</code>-loop actually runs in a subshell: the pipe causes <code>bash</code> to execute the loop in a child process. In fact, there are two
variables, both called <code>joined</code>; one lives in the parent and the other in the child <code>bash</code> process.
A simple fix to the script is to put the
<code>while read</code>-loop and the last <code>echo</code> command in an explicit subshell, e.g., as follows:</p>
<pre><code>
#!/bin/bash
joined=""
cat example.txt | \
( \
while read line; do
echo ${line} | grep -E -q "^BEGIN"
if [ $? -eq 0 ]; then
if [ "${joined}" != "" ]; then
echo "In $$ $BASHPID: Joind Line: ${joined}"
joined=""
fi
fi
joined="${joined} ${line}"
done
echo "In $$ $BASHPID: Joind Line: ${joined}" \
)
</code></pre>
<p>Let's run this revised script. We shall get:</p>
<pre><code>
$ ./join_lines.sh
In 7119 7121: Joind Line: BEGIN Line 1 Line 1 Line 1 Line 1
In 7119 7121: Joind Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7119 7121: Joind Line: BEGIN Line 3 Line 3 Line 3 Line 3 Line 3
</code></pre>
<p>The mystery is solved!</p><p></p>
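<p>Besides an explicit subshell, the loop can also be kept in the current shell by feeding the file to it with input redirection instead of a pipe. A minimal, self-contained sketch of this variant (it recreates part of example.txt so it can be run directly):</p>

```shell
#!/bin/bash
# Recreate part of example.txt so the sketch is self-contained.
cat > example.txt <<'EOF'
BEGIN Line 1 Line 1
Line 1 Line 1
BEGIN Line 2 Line 2
Line 2 Line 2
EOF

joined=""
# Input redirection instead of a pipe: the loop now runs in the
# current shell process, so assignments to "joined" survive it.
while read -r line; do
    case ${line} in
    BEGIN*)
        if [ -n "${joined}" ]; then
            echo "Joined Line:${joined}"
            joined=""
        fi
        ;;
    esac
    joined="${joined} ${line}"
done < example.txt
echo "Joined Line:${joined}"
```

<p>Bash also offers process substitution, <code>done &lt; &lt;(cmd)</code>, when the input is a command rather than a file; either way, the assignments happen in the parent shell.</p>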
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-25962049046410065982023-01-25T20:00:00.006-05:002023-01-25T20:08:07.180-05:00Disabling Linux Boot Splash Window<p>Most Linux systems use <code>plymouthd</code> to display the splash screen during boot. If you are running the computer as a server and do not
log in from the console, <code>plymouthd</code> can sometimes bring more trouble than it is worth. For one, to display the splash window,
<code>plymouthd</code> needs to interact with the driver of the graphics adapter in the system, and if there is an issue there, the system will
not boot successfully. Since the server's console may not be conveniently accessible, this can be a real inconvenience.</p>
<p>To remove it on Linux systems like Fedora and Redhat, we can do the following,</p>
<div style="overflow-x: auto">
<pre><code>
sudo grubby --update-kernel=ALL --remove-args="quiet"
sudo grubby --update-kernel=ALL --remove-args="rhgb"
# directly edit /etc/default/grub and add "rd.plymouth=0 plymouth.enable=0" to GRUB_CMDLINE_LINUX
sudo vi /etc/default/grub
sudo grub2-mkconfig -o /etc/grub2.cfg
sudo dnf remove plymouth
</code></pre>
</div>
<p></p>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-39456501037951839552023-01-21T13:38:00.001-05:002023-01-21T13:38:09.114-05:00Verifying CUDA Installation<p>For a full CUDA installation, we can verify it via the following steps:</p>
<div style="overflow-x: auto;">
<pre><code>
# check driver is installed
cat /proc/driver/nvidia/version
# check the version of CUDA Kit
CUDA_PATH=/usr/local/cuda
${CUDA_PATH}/bin/nvcc --version
# run deviceQuery demo program
${CUDA_PATH}/extras/demo_suite/deviceQuery
# run bandwidthTest demo program
${CUDA_PATH}/extras/demo_suite/bandwidthTest
# run busGrind demo program
${CUDA_PATH}/extras/demo_suite/busGrind
# run vectorAdd demo program
${CUDA_PATH}/extras/demo_suite/vectorAdd
# finally, run sample programs from Nvidia
git clone https://github.com/NVIDIA/cuda-samples
cd cuda-samples
make
</code></pre>
</div>
<p></p>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-26575143395173971202023-01-19T14:08:00.001-05:002023-01-19T14:08:08.445-05:00Removing Pandas SettingWithCopyWarning in Python Programs<p>Pandas can issue <code>SettingWithCopyWarning</code> messages. Although the messages can be false positives,
it is more often than not an indicator of a bug, or a potential bug, in our Python program. However, it is sometimes
not straightforward to remove the warnings until we have worked through a few thorny cases. This is a note to
document a scenario in which such a warning message manifests. First, let's take a look at the following Python
program:</p>
<div style="overflow-x:auto;">
<pre><code>
"""
test_copywarn.py
"""
import numpy as np
import pandas as pd
def get_subdf(df, rows):
return df.iloc[rows]
def process_row(c1, c2):
return c1+c2, c1-c2
if __name__ == '__main__':
columns = ['c{}'.format(i) for i in range(3)]
indices = ['i{}'.format(i) for i in range(8)]
df = pd.DataFrame(np.random.random((8, 3)),
columns=columns,
index=indices)
print(df)
rows = [i+2 for i in range(4)]
df2 = get_subdf(df, rows)
print(df2)
df2[['d', 'e']] = \
df2.apply(lambda row: process_row(row['c1'], row['c2']),
axis=1,
result_type='expand')
print(df2)
</code></pre>
</div>
<p>In the program, we use the <code>pandas.DataFrame.apply()</code> function to compute new columns from existing columns.</p>
<p>For reproducibility, we document the versions of Python and the two packages imported:</p>
<pre><code>
$ python --version
Python 3.9.15
$ python -c "import pandas as pd; print(pd.__version__)"
1.5.2
$ python -c "import numpy as np; print(np.__version__)"
1.23.5
$
</code></pre>
<p>Now let's run the Python program:</p>
<div style="overflow-x:auto;">
<pre><code>
$ python test_copywarn.py
c0 c1 c2
i0 0.989495 0.071666 0.767847
i1 0.728875 0.881395 0.878282
i2 0.620991 0.391125 0.758265
i3 0.344082 0.971074 0.666805
i4 0.794103 0.554744 0.687492
i5 0.037881 0.790503 0.175453
i6 0.545525 0.493586 0.859064
i7 0.797247 0.271426 0.995042
c0 c1 c2
i2 0.620991 0.391125 0.758265
i3 0.344082 0.971074 0.666805
i4 0.794103 0.554744 0.687492
i5 0.037881 0.790503 0.175453
test_copywarn.py:25: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df2[['d', 'e']] = \
test_copywarn.py:25: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df2[['d', 'e']] = \
c0 c1 c2 d e
i2 0.620991 0.391125 0.758265 1.149390 -0.367141
i3 0.344082 0.971074 0.666805 1.637879 0.304269
i4 0.794103 0.554744 0.687492 1.242236 -0.132747
i5 0.037881 0.790503 0.175453 0.965956 0.615050
$
</code></pre>
</div>
<p>Pandas complains about the line where we compute new columns from existing columns via the <code>apply</code> function, and suggests that we use <code>.loc[row_indexer,col_indexer]</code> instead. The result appears to be correct despite the warning messages. However, we shall see that blindly following the suggestion can have disastrous results.
In the following, we replace:</p>
<div style="overflow-x:auto;">
<pre><code>
df2[['d', 'e']] = \
    df2.apply(lambda row: process_row(row['c1'], row['c2']),
              axis=1,
              result_type='expand')
</code></pre>
</div>
<p>with</p>
<div style="overflow-x:auto;">
<pre><code>
df2.loc[:, ['d', 'e']] = \
    df2.apply(lambda row: process_row(row['c1'], row['c2']),
              axis=1,
              result_type='expand')
</code></pre>
</div>
<p>and run the program again:</p>
<div style="overflow-x:auto;">
<pre><code>
$ python test_copywarn.py
...
test_copywarn.py:25: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df2.loc[:, ['d', 'e']] = \
c0 c1 c2 d e
i2 0.182985 0.635170 0.476586 NaN NaN
i3 0.157991 0.587269 0.498907 NaN NaN
i4 0.576238 0.669497 0.622658 NaN NaN
i5 0.304192 0.539268 0.618814 NaN NaN
$
</code></pre>
</div>
<p>We observe that columns <code>d</code> and <code>e</code> now have incorrect values. Two lessons here are:</p>
<ol>
<li>If we want to add new columns to a <code>DataFrame</code>, blindly switching to <code>.loc</code> can be wrong:
when the right-hand side of the assignment is a <code>DataFrame</code>, <code>.loc</code> aligns it on its column
labels, and since the newly created columns do not exist in the right-hand side, the result can be silently
incorrect.</li>
<li>The error may not be at the line where the <code>SettingWithCopyWarning</code> is issued.</li>
</ol>
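<p>To see where the <code>NaN</code> values come from, note that the frame returned by <code>apply(..., result_type='expand')</code> has the integer columns <code>0</code> and <code>1</code>, which match neither <code>d</code> nor <code>e</code>, so label alignment fills both target columns with <code>NaN</code>. The following self-contained sketch (not part of the original program) demonstrates the effect, and shows that passing a plain NumPy array sidesteps the label alignment:</p>

```python
import pandas as pd

df = pd.DataFrame({'c1': [1.0, 2.0], 'c2': [3.0, 4.0]})
# apply(..., result_type='expand') returns a DataFrame whose columns are 0 and 1
rhs = df.apply(lambda row: (row['c1'] + row['c2'], row['c1'] - row['c2']),
               axis=1, result_type='expand')

# .loc aligns the right-hand side on column labels; 0 and 1 match neither
# 'd' nor 'e', so both new columns are silently filled with NaN
broken = df.copy()
broken.loc[:, ['d', 'e']] = rhs

# a plain ndarray carries no labels, so the values are assigned positionally
fixed = df.copy()
fixed.loc[:, ['d', 'e']] = rhs.to_numpy()
print(fixed['d'].tolist())  # [4.0, 6.0]
```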
<p>For this particular example, after a closer examination, we realize the warning results from the chained
assignment below:</p>
<div style="overflow-x:auto;">
<pre><code>
df.iloc[rows][['d', 'e']] = df.iloc[rows].apply(...)
</code></pre>
</div>
<p>because <code>df2</code> is returned from <code>get_subdf</code> as a slice of <code>df</code>. The Pandas designers want to ask us: do we intend to
change the original <code>DataFrame</code> <code>df</code>? Having understood this, we have two ways to remove the warning.</p>
<p>First, we can make a copy of the slice so that it becomes an independent <code>DataFrame</code>, as shown below:</p>
<div style="overflow-x:auto;">
<pre><code>
...
df2 = get_subdf(df, rows).copy()
...
df2[['d', 'e']] = \
    df2.apply(lambda row: process_row(row['c1'], row['c2']),
              axis=1,
              result_type='expand')
...
</code></pre>
</div>
<p>Alternatively, if we never use the original <code>DataFrame</code> again, we can simply rebind the name <code>df</code> to the
slice, i.e., <code>df = get_subdf(df, rows)</code>. This also gets rid of the warning: once the last reference to the
original <code>DataFrame</code> is gone, the question of whether we meant to modify it becomes moot, and Pandas no longer
issues a <code>SettingWithCopyWarning</code>. To emphasize this point, the complete program with this revision
is below:</p>
<div style="overflow-x:auto;">
<pre><code>
$ cat test_copywarn.py
import numpy as np
import pandas as pd

def get_subdf(df, rows):
    return df.iloc[rows]

def process_row(c1, c2):
    return c1+c2, c1-c2

if __name__ == '__main__':
    columns = ['c{}'.format(i) for i in range(3)]
    indices = ['i{}'.format(i) for i in range(8)]
    df = pd.DataFrame(np.random.random((8, 3)),
                      columns=columns,
                      index=indices)
    print(df)
    rows = [i+2 for i in range(4)]
    df = get_subdf(df, rows)
    print(df)
    df[['d', 'e']] = \
        df.apply(lambda row: process_row(row['c1'], row['c2']),
                 axis=1,
                 result_type='expand')
    print(df)
$ python test_copywarn.py
c0 c1 c2
i0 0.588995 0.706887 0.684446
i1 0.142972 0.481663 0.318174
i2 0.669792 0.869648 0.439205
i3 0.663541 0.951182 0.062734
i4 0.084048 0.089704 0.264744
i5 0.952133 0.087036 0.796757
i6 0.180122 0.819766 0.949701
i7 0.761599 0.772481 0.559961
c0 c1 c2
i2 0.669792 0.869648 0.439205
i3 0.663541 0.951182 0.062734
i4 0.084048 0.089704 0.264744
i5 0.952133 0.087036 0.796757
c0 c1 c2 d e
i2 0.669792 0.869648 0.439205 1.308853 0.430444
i3 0.663541 0.951182 0.062734 1.013916 0.888447
i4 0.084048 0.089704 0.264744 0.354449 -0.175040
i5 0.952133 0.087036 0.796757 0.883793 -0.709720
$
</code></pre>
</div>
<p>This behavior is interesting and worth noting.</p>
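<p>As an aside that goes beyond the original discussion, <code>DataFrame.assign()</code> offers a third idiom: it always returns a brand-new <code>DataFrame</code>, so no chained-assignment ambiguity can arise regardless of how the input frame was produced. A minimal sketch (with made-up data, not the random frame from the post):</p>

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3),
                  columns=['c0', 'c1', 'c2'])
# assign() copies the frame and adds the columns to the copy, so no
# SettingWithCopyWarning can arise even though df.iloc[1:3] is a slice
df2 = df.iloc[1:3].assign(d=lambda x: x['c1'] + x['c2'],
                          e=lambda x: x['c1'] - x['c2'])
print(df2['d'].tolist())  # [9.0, 15.0]
```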
<p></p>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-69243617732018272442023-01-18T15:54:00.004-05:002023-01-18T15:56:18.267-05:00More Space Needed on Root File System When installing CUDA Kit<p>Following the instructions on <a href="https://developer.nvidia.com/cuda-downloads" target="_blank">Nvidia's site</a>, I was setting up the CUDA Toolkit on a Fedora Linux host and ran into a problem: the installation failed due to
not enough free space on the root file system, as indicated by the error message below:</p>
<div style="overflow-x: auto;">
<pre><code>
$ sudo dnf -y install cuda
...
Running transaction check
Transaction check succeeded.
Running transaction test
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Transaction test error:
installing package cuda-nvcc-12-0-12.0.76-1.x86_64 needs 67MB more space on the / filesystem
installing package cuda-gdb-12-0-12.0.90-1.x86_64 needs 84MB more space on the / filesystem
installing package cuda-driver-devel-12-0-12.0.107-1.x86_64 needs 85MB more space on the / filesystem
installing package cuda-libraries-devel-12-0-12.0.0-1.x86_64 needs 85MB more space on the / filesystem
installing package cuda-visual-tools-12-0-12.0.0-1.x86_64 needs 85MB more space on the / filesystem
installing package cuda-documentation-12-0-12.0.76-1.x86_64 needs 85MB more space on the / filesystem
installing package cuda-demo-suite-12-0-12.0.76-1.x86_64 needs 98MB more space on the / filesystem
installing package cuda-cuxxfilt-12-0-12.0.76-1.x86_64 needs 99MB more space on the / filesystem
installing package cuda-cupti-12-0-12.0.90-1.x86_64 needs 210MB more space on the / filesystem
installing package cuda-cuobjdump-12-0-12.0.76-1.x86_64 needs 210MB more space on the / filesystem
installing package cuda-compiler-12-0-12.0.0-1.x86_64 needs 210MB more space on the / filesystem
installing package cuda-sanitizer-12-0-12.0.90-1.x86_64 needs 248MB more space on the / filesystem
installing package cuda-command-line-tools-12-0-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
installing package cuda-tools-12-0-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
installing package cuda-toolkit-12-0-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
installing package cuda-12-0-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
installing package cuda-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
Error Summary
-------------
Disk Requirements:
At least 248MB more space needed on the / filesystem.
...
$
</code></pre></div>
<p>It turns out that CUDA is installed under the <code>/usr/local</code> directory, and indeed, the free space on / was low.
The solution is to bind-mount a directory on a file system that has sufficient disk space at <code>/usr/local</code>.
The following steps illustrate this solution, provided that the file system mounted at <code>/disks/disk1</code>
has sufficient space:</p>
<div style="overflow-x: auto;">
<pre><code>
sudo mkdir /disks/disk1/local
sudo rsync -azv /usr/local/ /disks/disk1/local/
sudo rm -r /usr/local
sudo mkdir /usr/local
sudo mount --bind /disks/disk1/local /usr/local
sudo cp /etc/fstab /etc/fstab.bu
su -c "echo \
'/disks/disk1/local /usr/local none defaults,bind,nofail,x-systemd.device-timeout=2 0 0' \
>> /etc/fstab"
</code></pre>
</div>
<p></p>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-10452342071802037962023-01-17T12:08:00.002-05:002023-01-17T12:08:20.774-05:00Installing Missing LaTeX Packages? <p>I recently discovered that I can easily install missing LaTeX packages on Fedora Linux, for example via</p>
<pre><code>
sudo dnf install 'tex(beamer.cls)'
sudo dnf install 'tex(hyperref.sty)'
</code></pre>
<p>Can we do something similar on Debian/Ubuntu distributions?</p>
<h4>Reference</h4>
<ol>
<li>
https://docs.fedoraproject.org/en-US/neurofedora/latex/
</li>
</ol>Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0tag:blogger.com,1999:blog-348859355826305978.post-67066744335737789522023-01-16T16:50:00.007-05:002023-01-16T16:53:39.457-05:00Creating and Starting KVM Virtual Machine: Basic Steps<p>This is just a note documenting the basic steps to create and start KVM virtual machines on Linux systems.</p>
<ol>
<li>
Make a plan for virtual machine resources. For this, we should query host resources.
<pre><code>
# show available disk space
df -h
# show available memory
free -m
# show CPU information
lscpu
</code></pre>
</li>
<li>
Assume we are installing an Ubuntu server system. We shall download the ISO image for the system, e.g.,
<pre><code>
wget \
https://releases.ubuntu.com/22.04.1/ubuntu-22.04.1-live-server-amd64.iso \
-O /var/lib/libvirt/images/ubuntu-22.04.1-live-server-amd64.iso
</code></pre>
</li>
<li>
Create a virtual disk for the virtual machine, e.g.,
<pre><code>
sudo truncate --size=10240M /var/lib/libvirt/images/officeservice.img
</code></pre>
</li>
<li>
Decide how we should configure the virtual machine network. First, we query the
existing virtual networks:
<pre><code>
virsh --connect qemu:///system net-list --all
</code></pre>
</li>
<li>
Now create a virtual machine and set up Ubuntu Linux on it, e.g.,
<pre><code>
sudo virt-install --name ubuntu \
--description 'Ubuntu Server LTS' \
--ram 4096 \
--vcpus 2 \
--disk path=/var/lib/libvirt/images/officeservice.img,size=10 \
--osinfo detect=on,name=ubuntu-lts-latest \
--network network=default \
--graphics vnc,listen=127.0.0.1,port=5901 \
--cdrom /var/lib/libvirt/images/ubuntu-22.04.1-live-server-amd64.iso \
--noautoconsole \
--connect qemu:///system
</code></pre>
</li>
<li>
Suppose that we connect to the Linux host via <code>ssh</code> from a Windows host. We cannot directly access
the console of the virtual machine (which is at 127.0.0.1:5901 via VNC). In this case,
we create an SSH tunnel to the Linux host (assume its host name is LinuxHost) from the Windows host:
<pre><code>
ssh -L 15901:localhost:5901 LinuxHost
</code></pre>
</li>
<li>
We can now access the console via a VNC viewer on the Windows host at localhost:15901.
</li>
<li>
Once the Ubuntu installation is over, we lose the VNC connectivity. However, we can
list the virtual machines created:
<pre><code>
sudo virsh --connect qemu:///system list --all
</code></pre>
</li>
<li>
To start the virtual machine, we run
<pre><code>
sudo virsh --connect qemu:///system start ubuntu
</code></pre>
</li>
<li>
To make the virtual machine start when the host boots, set the
virtual machine to <code>autostart</code>, e.g.,
<pre><code>
virsh --connect qemu:///system autostart ubuntu
</code></pre>
</li>
</ol>
<h4>References</h4>
<ol>
<li>https://docs.fedoraproject.org/en-US/quick-docs/getting-started-with-virtualization/</li>
<li>https://ubuntu.com/blog/kvm-hyphervisor</li>
<li>https://askubuntu.com/questions/160152/virt-install-says-name-is-in-use-but-virsh-list-all-is-empty-where-is-virt-i</li>
<li>https://www.cyberciti.biz/faq/rhel-centos-linux-kvm-virtualization-start-virtual-machine-guest/</li>
<li>https://www.cyberciti.biz/faq/howto-linux-delete-a-running-vm-guest-on-kvm/</li>
</ol>
Gray Chenhttp://www.blogger.com/profile/01008542219994144917noreply@blogger.com0