Sunday, January 29, 2023

Resetting Network Stack on Windows

Sometimes I need to reset the network stack on Windows. Intel has good documentation for this, and I copy the steps below:

Resetting the network stack


ipconfig /release
ipconfig /flushdns
ipconfig /renew
netsh int ip reset
netsh winsock reset

Quick Note on WireGuard Configuration Files

Assume that we set up a VPN server and that a number of clients are peers of the server. Below are example configuration files:

  1. Server Configuration
    
    [Interface]
    Address = 10.188.0.1/32
    PrivateKey = (Private key of the server, generated via: wg genkey > server.private)
    ListenPort = 51820
    
    
    
    [Peer]
    PublicKey = (Public key of the client, generated via: wg genkey | tee client.2.private | wg pubkey)
    AllowedIPs = 10.188.0.2/32
    
    [Peer]
    PublicKey = (Public key of the client, generated via: wg genkey | tee client.3.private | wg pubkey)
    AllowedIPs = 10.188.0.3/32
    
    [Peer]
    PublicKey = (Public key of the client, generated via: wg genkey | tee client.4.private | wg pubkey)
    AllowedIPs = 10.188.0.4/32
    
    [Peer]
    PublicKey = (Public key of the client, generated via: wg genkey | tee client.5.private | wg pubkey)
    AllowedIPs = 10.188.0.5/32  
    
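    • On the server, the AllowedIPs of each Peer section effectively assigns the tunnel IP address to that client: the server accepts packets from the peer only with this source address and routes packets for this address to the peer.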
  2. Client Configuration
    
    [Interface]
    Address = 10.188.0.5/32
    PrivateKey = (Private key of the client, e.g., the content of client.5.private)
    DNS = 192.168.1.1,1.1.1.1,8.8.8.8
    
    
    
    [Peer]
    PublicKey = (Public key of the server, generated via: cat server.private | wg pubkey)
    AllowedIPs = 10.188.0.1/32,10.188.0.5/32
    Endpoint = Server_Public_IP_OR_Hostname:51820
    
    
    • AllowedIPs on the client controls which part of the network the client can reach through the tunnel. My experience is that it must include the server's tunnel IP address, 10.188.0.1; otherwise, there is a reachability problem.
    • Since it is a client, we should also include the Endpoint.
    • Numerous examples on the Web use AllowedIPs = 0.0.0.0/0,::/0 in the client configuration. Although further investigation is needed to confirm it, my experience is that this can be a problematic setup for Windows clients, in particular when both the server and the client reside in private networks with the same network prefix, e.g., 192.168.1.0/24. Windows does not appear to set up proper routes and appears to be confused about which private network it should reach when given an IP address like 192.168.1.1. When this happens, Ping on Windows seems to report "General failure."

Running WireGuard Windows GUI Client as Non-administrator User

As indicated in this document, and also referenced in several places, we can run the WireGuard Windows GUI client as a non-administrator user, with the functionality limited to toggling existing VPN tunnel configurations on or off.

This generally involves two steps as an administrator on the Windows host:

  1. Create a registry value, as specified in the command below
    
        reg add HKLM\Software\WireGuard /v LimitedOperatorUI /t REG_DWORD /d 1 /f
        
  2. Add the non-administrator user who should be able to toggle the tunnel on and off to the Network Configuration Operators builtin group. We can do this by invoking the lusrmgr command.

Friday, January 27, 2023

Mysterious bash while read var behavior understood!

This is a note about a mysterious behavior of while read var in the Bash shell. To understand it, let's consider the following problem:

Given a text file called example.txt as follows, write a Bash shell script called join_lines.sh to join the lines that belong together, where each joined line starts at a line beginning with BEGIN.


BEGIN Line 1 Line 1
Line 1 Line 1
BEGIN Line 2 Line 2
Line 2 Line 2
Line 2 Line 2
Line 2
BEGIN Line 3 Line 3 Line 3
Line 3
Line 3

The output should be 3 lines, as illustrated in the example below:


$ ./join_lines.sh
Joined Line: BEGIN Line 1 Line 1 Line 1 Line 1
Joined Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joined Line: BEGIN Line 3 Line 3 Line 3 Line 3 Line 3

Our first implementation of join_lines.sh is as follows:


#!/bin/bash

joined=""
cat example.txt | \
    while read line; do
        echo ${line} | grep -E -q "^BEGIN"
        if [ $? -eq 0 ]; then
            if [ "${joined}" != "" ]; then
                echo "Joind Line: ${joined}"
                joined=""
            fi
        fi
        joined="${joined} ${line}"
    done
echo "Joind Line: ${joined}"

Unfortunately, the output is actually the following:


$ ./join_lines.sh
Joind Line:  BEGIN Line 1 Line 1 Line 1 Line 1
Joind Line:  BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joind Line:
$

Why does the variable joined lose its value? That is a mystery, isn't it? To understand this, let's revise the script to print out the process IDs of the shell. The revised version is as follows:


#!/bin/bash

joined=""
cat example.txt | \
    while read line; do
        echo ${line} | grep -E -q "^BEGIN"
        if [ $? -eq 0 ]; then
            if [ "${joined}" != "" ]; then
                echo "In $$ $BASHPID: Joind Line: ${joined}"
                joined=""
            fi
        fi
        joined="${joined} ${line}"
    done
echo "In $$ $BASHPID: Joind Line: ${joined}"

If we run this revised script, we shall get something like the following:


$ ./join_lines.sh
In 7065 7067: Joind Line:  BEGIN Line 1 Line 1 Line 1 Line 1
In 7065 7067: Joind Line:  BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7065 7065: Joind Line:
$

By carefully examining the output, we can see that $$ and $BASHPID have different values in the first two lines. So, what is the difference between $$ and $BASHPID, and why are they different?

The Bash manual page states this:


$ man bash
...
 BASHPID
              Expands  to  the  process  ID of the current bash process.  This
              differs from $$ under certain circumstances, such  as  subshells
              that  do  not require bash to be re-initialized.  Assignments to
              BASHPID have no effect.  If BASHPID is unset, it loses its  spe‐
              cial properties, even if it is subsequently reset.
 ...
$

The above experiment reveals that the while read loop runs in a subshell, because it is part of a pipeline and Bash executes each command of a pipeline in its own subshell. In fact, there are two variables, both called joined: one lives in the parent Bash process and the other in the child. A simple fix to the script is to put the while read loop and the last echo command in the same subshell, e.g., as follows:


#!/bin/bash

joined=""
cat example.txt | \
	( \
    while read line; do
        echo ${line} | grep -E -q "^BEGIN"
        if [ $? -eq 0 ]; then
            if [ "${joined}" != "" ]; then
                echo "In $$ $BASHPID: Joind Line: ${joined}"
                joined=""
            fi
        fi
        joined="${joined} ${line}"
    done
echo "In $$ $BASHPID: Joind Line: ${joined}" \
	)

Let's run this revised script. We shall get:


$ ./join_lines.sh
In 7119 7121: Joind Line:  BEGIN Line 1 Line 1 Line 1 Line 1
In 7119 7121: Joind Line:  BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7119 7121: Joind Line:  BEGIN Line 3 Line 3 Line 3 Line 3 Line 3

The mystery is solved!
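Another way to avoid the subshell, shown here only as a sketch, is to drop the pipeline and feed the loop through input redirection, so the loop runs in the current shell and joined keeps its value after done. The function name join_lines and the temporary file are assumptions made for this illustration; they are not part of the original script.

```shell
#!/bin/bash
# Sketch: the loop reads its input via "< file" instead of "cat file |",
# so no subshell is created and "joined" is still set after "done".
join_lines() {
    local input="$1" joined="" line
    while IFS= read -r line; do
        if [[ "${line}" == BEGIN* ]]; then
            # A new group starts: print the previous one, if any.
            if [ -n "${joined}" ]; then
                echo "Joined Line: ${joined}"
            fi
            joined="${line}"
        else
            joined="${joined} ${line}"
        fi
    done < "${input}"
    # Print the last group, which is still visible here.
    if [ -n "${joined}" ]; then
        echo "Joined Line: ${joined}"
    fi
}

# Demo on a temporary copy of the sample data (hypothetical file location).
tmp=$(mktemp)
cat > "${tmp}" <<'EOF'
BEGIN Line 1 Line 1
Line 1 Line 1
BEGIN Line 2 Line 2
Line 2 Line 2
Line 2 Line 2
Line 2
BEGIN Line 3 Line 3 Line 3
Line 3
Line 3
EOF
join_lines "${tmp}"
rm -f "${tmp}"
```

With the sample data, this should print three joined lines, including the one for Line 3.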

Wednesday, January 25, 2023

Disabling Linux Boot Splash Window

Most Linux systems use Plymouth (plymouthd) to display the splash screen during boot. If you are running the computer as a server and do not log in from the console, Plymouth can sometimes bring more trouble than it is worth. For one, to display the splash screen, plymouthd needs to interact with the driver of the graphics adapter in the system, and if there is an issue here, the system may not boot successfully. Since the server's console may not be conveniently accessed, this can be a real inconvenience.

To remove it on Linux systems like Fedora and Red Hat, we can do the following:


sudo grubby --update-kernel=ALL --remove-args="quiet"
sudo grubby --update-kernel=ALL --remove-args="rhgb"
# directly edit /etc/default/grub and add "rd.plymouth=0 plymouth.enable=0" to GRUB_CMDLINE_LINUX
sudo vi /etc/default/grub
sudo grub2-mkconfig -o /etc/grub2.cfg
sudo dnf remove plymouth

Saturday, January 21, 2023

Verifying CUDA Installation

For a full CUDA installation, we can verify it via the following steps:


  # check driver is installed
  cat /proc/driver/nvidia/version
  
  # check the version of the CUDA Toolkit
  CUDA_PATH=/usr/local/cuda
  ${CUDA_PATH}/bin/nvcc --version
  
  # run deviceQuery demo program
  ${CUDA_PATH}/extras/demo_suite/deviceQuery
  
  # run bandwidthTest demo program
  ${CUDA_PATH}/extras/demo_suite/bandwidthTest
  
  # run busGrind demo program
  ${CUDA_PATH}/extras/demo_suite/busGrind
  
  # run vectorAdd demo program
  ${CUDA_PATH}/extras/demo_suite/vectorAdd
  
  # finally, run sample programs from Nvidia
  git clone https://github.com/NVIDIA/cuda-samples
  cd cuda-samples
  make
  

Thursday, January 19, 2023

Removing Pandas SettingWithCopyWarning in Python Programs

Pandas can issue SettingWithCopyWarning messages. Although the messages can be false positives, they are more often than not an indicator of a bug or potential bug in our Python program. However, it is sometimes not straightforward to remove them until we have worked through a few thorny cases. This is a note to document a scenario in which such a warning message manifests. First, let's take a look at the following Python program:


"""
test_copywarn.py
"""
import numpy as np
import pandas as pd


def get_subdf(df, rows):
    return df.iloc[rows]

def process_row(c1, c2):
    return c1+c2, c1-c2

if __name__ == '__main__':
    columns = ['c{}'.format(i) for i in range(3)]
    indices = ['i{}'.format(i) for i in range(8)]
    df = pd.DataFrame(np.random.random((8, 3)),
                      columns=columns,
                      index=indices)
    print(df)

    rows = [i+2 for i in range(4)]
    df2 = get_subdf(df, rows)
    print(df2)


    df2[['d', 'e']] = \
            df2.apply(lambda row: process_row(row['c1'], row['c2']),
                      axis=1,
                      result_type='expand')

    print(df2)

In the program, we use the pandas.DataFrame.apply() function to compute new columns from existing columns.

For reproducibility, we document the versions of Python and the two packages imported:


$ python --version
Python 3.9.15
$ python -c "import pandas as pd; print(pd.__version__)"
1.5.2
$ python -c "import numpy as np; print(np.__version__)"
1.23.5
$

Now let's run the Python program:


$ python test_copywarn.py
          c0        c1        c2
i0  0.989495  0.071666  0.767847
i1  0.728875  0.881395  0.878282
i2  0.620991  0.391125  0.758265
i3  0.344082  0.971074  0.666805
i4  0.794103  0.554744  0.687492
i5  0.037881  0.790503  0.175453
i6  0.545525  0.493586  0.859064
i7  0.797247  0.271426  0.995042
          c0        c1        c2
i2  0.620991  0.391125  0.758265
i3  0.344082  0.971074  0.666805
i4  0.794103  0.554744  0.687492
i5  0.037881  0.790503  0.175453
test_copywarn.py:25: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2[['d', 'e']] = \
test_copywarn.py:25: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2[['d', 'e']] = \
          c0        c1        c2         d         e
i2  0.620991  0.391125  0.758265  1.149390 -0.367141
i3  0.344082  0.971074  0.666805  1.637879  0.304269
i4  0.794103  0.554744  0.687492  1.242236 -0.132747
i5  0.037881  0.790503  0.175453  0.965956  0.615050
$

Pandas complains about the line where we compute new columns from existing columns via the apply function, and suggests that we should use .loc[row_indexer,col_indexer] instead. The result appears to be correct despite the warning messages. However, we shall see that blindly following the suggestion given here can have disastrous results. In the following, we replace:


df2[['d', 'e']] = \
            df2.apply(lambda row: process_row(row['c1'], row['c2']),
                      axis=1,
                      result_type='expand')

with


df2.loc[:, ['d', 'e']] = \
            df2.apply(lambda row: process_row(row['c1'], row['c2']),
                      axis=1,
                      result_type='expand')

and run it again:


$ python test_copywarn.py
...
test_copywarn.py:25: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2.loc[:, ['d', 'e']] = \
          c0        c1        c2   d   e
i2  0.182985  0.635170  0.476586 NaN NaN
i3  0.157991  0.587269  0.498907 NaN NaN
i4  0.576238  0.669497  0.622658 NaN NaN
i5  0.304192  0.539268  0.618814 NaN NaN
$

We observe that columns d and e now have incorrect values. Two lessons here are:

  1. If we want to add new columns to a DataFrame, it is wrong to use .loc, because .loc selects a slice of the DataFrame by existing labels; when the columns do not exist yet, the result can be incorrect.
  2. The error may not be at the line where the SettingWithCopyWarning is issued.

For this particular example, after a closer examination, we realize that the warning results from a chained assignment, as follows:


	df.iloc[rows][['d', 'e']] = df.iloc[rows].apply(...)

because df2 is returned from get_subdf. The Pandas designers are asking us: do we want to change the original DataFrame df? Having understood this, we have two ways to fix it.

We can make a deep copy of the slice, so that it becomes a new DataFrame, as below:


    ...
    df2 = get_subdf(df, rows).copy()
    ...
    df2[['d', 'e']] = \
            df2.apply(lambda row: process_row(row['c1'], row['c2']),
                      axis=1,
                      result_type='expand')
    ...

Alternatively, if we never use the original DataFrame again, we can rename df2 to df. This also gets rid of the warning: whether we want to change the original DataFrame df becomes irrelevant, since we lose access to it once we do df = get_subdf(df, rows). Just to emphasize this point, the complete program with this revision is below:


$ cat test_copywarn.py
import numpy as np
import pandas as pd

def get_subdf(df, rows):
    return df.iloc[rows]

def process_row(c1, c2):
    return c1+c2, c1-c2


if __name__ == '__main__':

    columns = ['c{}'.format(i) for i in range(3)]
    indices = ['i{}'.format(i) for i in range(8)]
    df = pd.DataFrame(np.random.random((8, 3)),
                      columns=columns,
                      index=indices)
    print(df)

    rows = [i+2 for i in range(4)]
    df = get_subdf(df, rows).copy()
    print(df)


    df[['d', 'e']] = \
            df.apply(lambda row: process_row(row['c1'], row['c2']),
                      axis=1,
                      result_type='expand')

    print(df)
$ python test_copywarn.py
          c0        c1        c2
i0  0.588995  0.706887  0.684446
i1  0.142972  0.481663  0.318174
i2  0.669792  0.869648  0.439205
i3  0.663541  0.951182  0.062734
i4  0.084048  0.089704  0.264744
i5  0.952133  0.087036  0.796757
i6  0.180122  0.819766  0.949701
i7  0.761599  0.772481  0.559961
          c0        c1        c2
i2  0.669792  0.869648  0.439205
i3  0.663541  0.951182  0.062734
i4  0.084048  0.089704  0.264744
i5  0.952133  0.087036  0.796757
          c0        c1        c2         d         e
i2  0.669792  0.869648  0.439205  1.308853  0.430444
i3  0.663541  0.951182  0.062734  1.013916  0.888447
i4  0.084048  0.089704  0.264744  0.354449 -0.175040
i5  0.952133  0.087036  0.796757  0.883793 -0.709720
$

This is interesting and worth noting.

Wednesday, January 18, 2023

More Space Needed on Root File System When Installing CUDA Toolkit

Following the instructions on Nvidia's site, I was setting up the CUDA Toolkit on a Fedora Linux host when the installation failed because there was not enough free space on the root file system, as indicated by the error message below:


$ sudo dnf -y install cuda
...
Running transaction check
Transaction check succeeded.
Running transaction test
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Transaction test error:
  installing package cuda-nvcc-12-0-12.0.76-1.x86_64 needs 67MB more space on the / filesystem
  installing package cuda-gdb-12-0-12.0.90-1.x86_64 needs 84MB more space on the / filesystem
  installing package cuda-driver-devel-12-0-12.0.107-1.x86_64 needs 85MB more space on the / filesystem
  installing package cuda-libraries-devel-12-0-12.0.0-1.x86_64 needs 85MB more space on the / filesystem
  installing package cuda-visual-tools-12-0-12.0.0-1.x86_64 needs 85MB more space on the / filesystem
  installing package cuda-documentation-12-0-12.0.76-1.x86_64 needs 85MB more space on the / filesystem
  installing package cuda-demo-suite-12-0-12.0.76-1.x86_64 needs 98MB more space on the / filesystem
  installing package cuda-cuxxfilt-12-0-12.0.76-1.x86_64 needs 99MB more space on the / filesystem
  installing package cuda-cupti-12-0-12.0.90-1.x86_64 needs 210MB more space on the / filesystem
  installing package cuda-cuobjdump-12-0-12.0.76-1.x86_64 needs 210MB more space on the / filesystem
  installing package cuda-compiler-12-0-12.0.0-1.x86_64 needs 210MB more space on the / filesystem
  installing package cuda-sanitizer-12-0-12.0.90-1.x86_64 needs 248MB more space on the / filesystem
  installing package cuda-command-line-tools-12-0-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
  installing package cuda-tools-12-0-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
  installing package cuda-toolkit-12-0-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
  installing package cuda-12-0-12.0.0-1.x86_64 needs 248MB more space on the / filesystem
  installing package cuda-12.0.0-1.x86_64 needs 248MB more space on the / filesystem

Error Summary
-------------
Disk Requirements:
   At least 248MB more space needed on the / filesystem.
...
$

It turns out that CUDA is installed under the /usr/local directory, and indeed, the free space on / is low. The solution is to move the content of /usr/local to a file system that has sufficient disk space and bind-mount it back at /usr/local. The following steps illustrate this solution, provided that the file system mounted at /disks/disk1 has sufficient space:


sudo mkdir /disks/disk1/local
sudo rsync -azv /usr/local/ /disks/disk1/local/
sudo rm -r /usr/local
sudo mkdir /usr/local
sudo mount --bind /disks/disk1/local /usr/local
sudo cp /etc/fstab /etc/fstab.bu
su -c "echo \
  '/disks/disk1/local /usr/local none defaults,bind,nofail,x-systemd.device-timeout=2 0 0' \
  >> /etc/fstab"
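Before moving anything, it may help to check how much space is actually free on the source and target file systems. The following is a minimal sketch of such a check; the function name has_free_mb and the 4096 MB threshold are assumptions chosen for illustration, not figures from the CUDA documentation.

```shell
#!/bin/bash
# Succeed if the file system holding PATH has at least MB megabytes available.
has_free_mb() {
    local path="$1" need_mb="$2" avail_kb
    # With -P -k, df prints one line per file system in a fixed format;
    # the fourth column is the available space in 1 KiB blocks.
    avail_kb=$(df -Pk "${path}" | awk 'NR == 2 { print $4 }')
    [ "${avail_kb}" -ge $(( need_mb * 1024 )) ]
}

# Hypothetical threshold: adjust it to the size of the packages to install.
if has_free_mb / 4096; then
    echo "/ has at least 4096 MB available"
else
    echo "/ has less than 4096 MB available"
fi
```

The same function can be pointed at /disks/disk1 to confirm that the target file system has room for the content of /usr/local.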

Tuesday, January 17, 2023

Installing Missing LaTeX Packages?

I recently discovered that I can easily install missing LaTeX packages on Fedora Linux, that is, via


sudo dnf install 'tex(beamer.cls)' 
sudo dnf install 'tex(hyperref.sty)' 

Can we do something similar on Debian/Ubuntu distributions?

Reference

  1. https://docs.fedoraproject.org/en-US/neurofedora/latex/

Monday, January 16, 2023

Creating and Starting KVM Virtual Machine: Basic Steps

This is just a note documenting the basic steps to create and start KVM virtual machines on Linux systems.

  1. Make a plan for virtual machine resources. For this, we should query host resources.
    
        # show available disk space
        df -h
        # show available memory
        free -m
        # CPUs
        lscpu
        
  2. Assume we are installing an Ubuntu server system. We shall download the ISO image for the system, e.g.,
    
        wget \
          https://releases.ubuntu.com/22.04.1/ubuntu-22.04.1-live-server-amd64.iso \
          -O /var/lib/libvirt/images/ubuntu-22.04.1-live-server-amd64.iso
        
  3. Create a virtual disk for the virtual machine, e.g.,
    
        sudo truncate --size=10240M /var/lib/libvirt/images/officeservice.img
        
  4. Decide how we should configure the virtual machine network. First, we query existing ones:
    
        virsh --connect qemu:///system  net-list --all
        
  5. Now create a virtual machine and set up Ubuntu Linux on it, e.g.,
    
        sudo virt-install --name ubuntu \
        --description 'Ubuntu Server LTS' \
        --ram 4096 \
        --vcpus 2 \
        --disk path=/var/lib/libvirt/images/officeservice.img,size=10 \
        --osinfo detect=on,name=ubuntu-lts-latest \
        --network network=default \
        --graphics vnc,listen=127.0.0.1,port=5901 \
        --cdrom /var/lib/libvirt/images/ubuntu-22.04.1-live-server-amd64.iso  \
        --noautoconsole \
        --connect qemu:///system
        
  6. Suppose that we connect to the Linux host via ssh from a Windows host. We cannot directly access the console of the virtual machine (which is at 127.0.0.1:5901 via VNC). In this case, we tunnel to the Linux host (assume its host name is LinuxHost) from the Windows host:
    
        ssh -L 15901:localhost:5901 LinuxHost
        
  7. We can now access the console via a VNC viewer on the Windows host at localhost:15901.
  8. Once Ubuntu installation is over, we would lose the VNC connectivity. But, we can list the virtual machine created.
    
        sudo virsh --connect qemu:///system list --all
        
  9. To start the virtual machine, we run
    
        sudo virsh --connect qemu:///system  start ubuntu
        
  10. To make the virtual machine start when we boot the host, set it to autostart, e.g.,
    
    	virsh --connect qemu:///system autostart ubuntu
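
A side note on step 3: truncate creates a sparse file, so the virtual disk takes little real disk space until the guest writes to it. The sketch below illustrates this on a small temporary file; the 10M size and the temporary path are assumptions for the demonstration, not the image used above.

```shell
#!/bin/bash
# Create a small sparse file and compare its apparent size with the
# number of blocks actually allocated on disk.
img=$(mktemp)
truncate --size=10M "${img}"

apparent=$(stat -c %s "${img}")   # apparent size in bytes
blocks=$(stat -c %b "${img}")     # number of blocks allocated on disk

echo "apparent size:    ${apparent} bytes"
echo "allocated blocks: ${blocks}"

rm -f "${img}"
```

On most Linux file systems the allocated block count stays far below what the apparent size would suggest, which is why creating a 10 GB image this way is nearly instantaneous.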
    	

References

  1. https://docs.fedoraproject.org/en-US/quick-docs/getting-started-with-virtualization/
  2. https://ubuntu.com/blog/kvm-hyphervisor
  3. https://askubuntu.com/questions/160152/virt-install-says-name-is-in-use-but-virsh-list-all-is-empty-where-is-virt-i
  4. https://www.cyberciti.biz/faq/rhel-centos-linux-kvm-virtualization-start-virtual-machine-guest/
  5. https://www.cyberciti.biz/faq/howto-linux-delete-a-running-vm-guest-on-kvm/

Listing Physical Disks behind Hardware RAID Controller on Linux

Without rebooting into the BIOS and the hardware RAID controller's firmware, can we figure out the disks controlled by the controller? The answer is generally yes. However, the method can vary from one RAID controller to another. To list the physical disks on Linux, we first need to figure out the RAID controller model, e.g.,


lspci | grep RAID

In my case, I have a MegaRAID, a popular RAID controller. To figure out the disks connected to the RAID controller, we can use smartctl as follows,


sudo smartctl -i -d megaraid,0 /dev/sdb

where "megaraid" is the controller type, "0" is the 0-th disk, and "/dev/sdb" is the Linux device for the disk array.

Having understood this, we can list all disks by a script as follows:


#!/bin/bash
device=/dev/sdb
disk=0
while [ 1 ]; do
   sudo smartctl -i -d megaraid,${disk} ${device}
   if [ $? -ne 0 ]; then
     break
   fi
   let disk=${disk}+1
done

Listing Bind Mounts on Linux Systems

To list bind mounts, we can use the findmnt command. For bind mounts, findmnt prints the source directory in a pair of square brackets, so we can filter on the brackets as follows:


findmnt | grep -E "\[.*\]"
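
If only the mount points are of interest, the same idea can be applied to findmnt's raw output. The sketch below assumes input lines of the form "TARGET SOURCE", such as those produced by findmnt -rn -o TARGET,SOURCE; the function name bind_targets and the sample mount table are made up for illustration. Note that a bracketed source may also show up for other mounts, such as btrfs subvolumes, so the result should be treated as a list of candidates.

```shell
#!/bin/bash
# Print the mount points whose SOURCE column ends with a "[/path]" suffix.
# Expected input: one "TARGET SOURCE" pair per line, for example from
#   findmnt -rn -o TARGET,SOURCE
bind_targets() {
    awk '$2 ~ /\[.*\]$/ { print $1 }'
}

# Illustrative input only (not taken from a real host).
printf '%s\n' \
    '/ /dev/sda2' \
    '/usr/local /dev/sdb1[/local]' \
    '/boot /dev/sda1' | bind_targets
```

On a real system, the function would be used as: findmnt -rn -o TARGET,SOURCE | bind_targets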

Thursday, January 12, 2023

Python failed to load a pickle: __randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given

When I tried to load, at host A, a Python pickle created at another host (host B), I encountered an error as follows:


__randomstate_ctor() takes from 0 to 1 positional arguments but 2 were given

It turns out that I had different versions of numpy at hosts A and B. To fix it, I went to host B, where the pickle was created, and figured out the version of numpy:


$ pip list --format=freeze | grep numpy
numpy==1.24.1

At host A, I installed the same version of numpy:


 pip install numpy==1.24.1

The problem went away!