- It lists the error codes and their meanings.
- It provides common steps to resolve the errors.
- When all else fails, it provides manual steps, such as examining the CBS log.
Sunday, December 14, 2025
Dealing with Windows Update Errors
Wednesday, September 24, 2025
Updating TexLive's CTAN Repository
I encountered an error when attempting to install a LaTeX package using tlmgr:
tlmgr install Package_Name
One solution is to change the default repository or to specify one on the command line.
Specifying the repository on the command line every time can be cumbersome. Instead, we can reset the default CTAN repository using the command
tlmgr option repository https://mirror.ctan.org/systems/texlive/tlnet
With this URL, a mirror is selected automatically (mirror.ctan.org redirects to a nearby mirror). However, if you wish to specify which mirror to use, we can use the command
tlmgr option repository https://mirrors.ibiblio.org/pub/mirrors/CTAN/systems/texlive/tlnet
The mirrors can be looked up from the CTAN mirror page.
Monday, April 21, 2025
Monitoring transient network traffic session
Sometimes there is a need to investigate network traffic that is transient. To make the problem clearer, let's examine this example. The firewall indicates some network traffic was blocked:
Block IPv4 link-local (1000000102) 192.168.99.99:35018 169.254.169.254:80 TCP:S
We want to figure out which process sent the packets. So, we would do something like
sudo netstat -anp | grep 35018
Unfortunately, this yields nothing because, at the time we issue the netstat command, port 35018 is no longer open. It turns out the network traffic is short-lived. How do we figure out which process sends the packets? Of course, we could try to capture the packets:
tcpdump -XX -i any host 169.254.169.254 and port 80
which indeed captures the packets and also shows the headers and contents of the captured packets. Sometimes, the packet headers and contents are sufficient for us to figure out which process sent the packets. However, what if they do not offer a clue?
It turns out that we can use sysdig for this. For instance, we can use it in this way:
sysdig -p '*%evt.num %evt.time %evt.cpu %proc.name (%thread.tid %proc.ppid) %evt.dir %evt.type %evt.info' fd.rip=169.254.169.254 and fd.rport=80
which tells us the process that sent the packets and the PID of its parent process. The process that sent the packets may have exited, but the parent process is often still around. This solves the problem because it gives us a lead for further investigation.
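Once sysdig reports a parent PID, the parent (if it is still alive) and its ancestors can be inspected to see what launched the short-lived process. A minimal sketch, assuming a Linux /proc filesystem; proc_ancestry is a hypothetical helper name, not part of sysdig:

```shell
#!/usr/bin/env bash
# proc_ancestry PID: print "PID NAME" for PID and each of its ancestors,
# following the PPid field of /proc/<pid>/status (Linux only).
proc_ancestry() {
    local pid=$1 name ppid
    while [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        # fields in /proc/<pid>/status are "Key:<TAB>value"
        name=$(awk -F'\t' '$1 == "Name:" {print $2}' "/proc/$pid/status")
        ppid=$(awk -F'\t' '$1 == "PPid:" {print $2}' "/proc/$pid/status")
        echo "$pid $name"
        pid=${ppid:-0}   # stop if the process vanished while we read it
    done
}

# Usage with the parent PID reported by sysdig (hypothetical value):
# proc_ancestry 12345
```

The walk stops when it reaches PID 0 or a process that no longer exists, so it is safe to run against a parent that may exit at any moment.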
Tuesday, March 11, 2025
Installing TexLive Packages Using tlmgr from a Non-default Repository
From time to time, the default TexLive repository does not work for me when I try to install a package using tlmgr. One method to get around this is to use a non-default repository, e.g., to install the listings package, we can run
tlmgr --repository https://mirrors.ibiblio.org/pub/mirrors/CTAN/systems/texlive/tlnet install listings
Perhaps the tricky part is not finding a mirror but writing the correct URL: the mirror's CTAN root followed by /systems/texlive/tlnet. This example serves as a template for that.
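Since the URL is always a mirror's CTAN root plus systems/texlive/tlnet, the path can be appended automatically. A small sketch; tlmgr_from_mirror is a hypothetical wrapper, and it assumes the mirror follows the standard CTAN layout:

```shell
#!/usr/bin/env bash
# tlmgr_from_mirror MIRROR_ROOT PACKAGE...: install packages from a specific
# CTAN mirror. MIRROR_ROOT is the mirror's CTAN root as listed on the CTAN
# mirror page; the TeX Live repository lives under systems/texlive/tlnet.
tlmgr_from_mirror() {
    local root=${1%/}   # drop one trailing slash so we don't produce "//"
    shift
    tlmgr --repository "$root/systems/texlive/tlnet" install "$@"
}

# Usage, equivalent to the command above:
# tlmgr_from_mirror https://mirrors.ibiblio.org/pub/mirrors/CTAN/ listings
```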
Monday, March 3, 2025
Selecting CUDA Devices
I observed that when I run a PyTorch program on a system with two GPUs, the PyTorch runtime dispatches the computational tasks to both GPUs. Since the program is not optimized for multiple GPUs, its performance on two GPUs is worse than on just one. A simple way to address this is to tell PyTorch which GPU to use via the environment variable CUDA_VISIBLE_DEVICES.
For instance, to run a task run_task.sh, we can
CUDA_VISIBLE_DEVICES=0 ./run_task.sh SEED=1234
which results in running the task on a single GPU.
For the non-optimized program, I got much better computational efficiency by running two tasks side by side, each on its own GPU, than by letting each run use both GPUs:
CUDA_VISIBLE_DEVICES=0 ./run_task.sh SEED=1234
CUDA_VISIBLE_DEVICES=1 ./run_task.sh SEED=4321
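The two commands above can also be launched from one script. A sketch, assuming the machine has at least as many GPUs as seeds and that run_task.sh accepts SEED=<n> as shown; run_per_gpu is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# run_per_gpu CMD SEED...: start one copy of CMD per seed in the background,
# pinning the i-th copy to GPU i with CUDA_VISIBLE_DEVICES, then wait.
run_per_gpu() {
    local cmd=$1 gpu=0 seed
    shift
    for seed in "$@"; do
        CUDA_VISIBLE_DEVICES=$gpu "$cmd" "SEED=$seed" &
        gpu=$((gpu + 1))
    done
    wait    # block until every background task has finished
}

# Usage, equivalent to the two commands above:
# run_per_gpu ./run_task.sh 1234 4321
```

Each copy sees only its own GPU, so PyTorch inside it treats that device as cuda:0.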
Friday, February 21, 2025
Enabling NAT and IP Masquerading on Rocky Linux 9
This is a note about enabling NAT (SNAT, more precisely) and IP masquerading on a Linux host that runs Rocky Linux 9. The host has two network interfaces: eth0 and wg0. Interface eth0 connects to the outside network and is assigned a public IP address, while interface wg0 is on a private network. The objective is to make the Linux host a router for the private network so that traffic originating from the private network can reach the outside network. The steps to achieve this objective using firewalld are as follows:
- Enable IPv4 forwarding
echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
- Assign interface eth0 to the external zone
firewall-cmd --permanent --zone=external --change-interface=eth0
- Assign interface wg0 to the internal zone
firewall-cmd --permanent --zone=internal --change-interface=wg0
- Set the zone target of the internal zone to ACCEPT
firewall-cmd --permanent --zone=internal --set-target=ACCEPT
- Finally, reload firewalld's configuration.
firewall-cmd --reload
There is no need to meddle with anything else, such as adding nftables rules or setting masquerading for the outward-facing network interface. This is because the external zone has masquerading enabled by default. This can be verified by
firewall-cmd --zone=external --query-masquerade
or by looking at the zone definition file at /usr/lib/firewalld/zones/external.xml.
In addition, the external zone is enabled to forward packets by default. We can examine this by looking at the zone definition file at /usr/lib/firewalld/zones/external.xml or by
firewall-cmd --zone=external --query-forward
The remaining issue lies in the zones' targets. First, let's view the external zone's configuration:
firewall-cmd --zone=external --list-all
Of course, we can also just check its target:
firewall-cmd --permanent --zone=external --get-target
Likewise, for the internal zone:
firewall-cmd --zone=internal --list-all
firewall-cmd --permanent --zone=internal --get-target
Originally, the targets of both the external and internal zones are default. For forwarded packets, the internal zone's default target effectively becomes REJECT, which prevents packets from being forwarded to the outside network. This is explained as follows:
For a forwarded packet that ingresses zoneA and egresses zoneB:
- if zoneA's target is ACCEPT, DROP, or REJECT then the packet is accepted, dropped, or rejected respectively.
- if zoneA's target is default, then the packet is accepted, dropped, or rejected based on zoneB's target. If zoneB's target is also default, then the packet will be rejected by firewalld's catchall reject.
Since both the ingress (internal) and egress (external) zones have the "default" target, packets forwarded from the internal zone are rejected. Setting the internal zone's target to ACCEPT, as in the steps above, resolves this.
One question I have in mind is why I did not simply assign the internal-facing interface to the trusted zone. That might be for another day.
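The quoted rule is small enough to model directly, which makes it easy to see why ACCEPT on the internal zone is the fix. This is an illustration of the rule as quoted, not firewalld code; forward_verdict is a hypothetical name:

```shell
#!/usr/bin/env bash
# forward_verdict INGRESS_TARGET EGRESS_TARGET: model of the quoted rule for
# a forwarded packet that enters zone A and leaves zone B. Targets are
# ACCEPT, DROP, REJECT, or default.
forward_verdict() {
    local ingress=$1 egress=$2
    case $ingress in
        ACCEPT|DROP|REJECT) echo "$ingress"; return ;;
    esac
    # the ingress zone's target is "default": defer to the egress zone
    case $egress in
        ACCEPT|DROP|REJECT) echo "$egress" ;;
        *) echo REJECT ;;   # both default: firewalld's catch-all reject
    esac
}

forward_verdict default default   # before the fix (internal -> external): prints REJECT
forward_verdict ACCEPT default    # after --set-target=ACCEPT on internal: prints ACCEPT
```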
Reference
This note benefited tremendously from the following resources:
- https://askubuntu.com/questions/1463093/what-is-target-default-of-a-zones-configuration-in-firewalld
- https://github.com/firewalld/firewalld/issues/590#issuecomment-605200548
- man firewall-cmd
- man firewalld.zone
- man firewalld
Wednesday, February 19, 2025
Running the dnf package manager on Linux with small memory
Running the dnf package manager can sometimes be difficult on Linux hosts with small memory. On a Rocky Linux 9 host with 1 GB of RAM, I observed that after enabling EPEL, dnf install would sometimes be killed by the OOM killer.
To address this issue, we can create and enable a swap space:
$ sudo dd if=/dev/zero of=/swapfile count=1024 bs=1MiB
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ sudo dnf update
Once done, we turn off the swap space:
$ sudo swapoff /swapfile
Reference
This idea comes from this Stack Overflow post.
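To confirm that the swap space is active before running dnf (and gone after swapoff), we can count the entries in /proc/swaps; `swapon --show` gives similar information. A minimal sketch; count_swaps is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# count_swaps [FILE]: print the number of active swap areas listed in
# /proc/swaps (or FILE). The first line of the file is a column header.
count_swaps() {
    local file=${1:-/proc/swaps}
    awk 'NR > 1 { n++ } END { print n + 0 }' "$file"
}

# Usage: run before "sudo dnf update" and again after "sudo swapoff /swapfile"
# count_swaps
```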
Thursday, December 12, 2024
Solution for problem: rootless Docker container cannot ping outside networks
I am running a rootless Docker container on an Ubuntu host (24.04 LTS). However, from the container I cannot ping the host where the container is running, nor the outside network. The workaround I came up with consists of two steps:
- Run the container with the --privileged option, as in docker container run --privileged
- On the host where the container is running, set the Linux kernel parameter `net.ipv4.ping_group_range` to include the group ID that runs the container. For instance,
if the group ID of the user that runs the container is 3000, we can set the parameter as follows (as root):
echo "3000 3000" > /proc/sys/net/ipv4/ping_group_range
If tests indicate that pings succeed in the container, we can set the kernel parameter through a configuration file so that the setting survives reboots, e.g.,
- On the host where the container is running, create a file, e.g., /etc/sysctl.d/99-ping-group-range.conf, as in:
echo "net.ipv4.ping_group_range=3000 3000" \
> /etc/sysctl.d/99-ping-group-range.conf
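Before testing inside the container, we can check on the host whether a given group ID actually falls within the configured range (the kernel's default is the empty range "1 0"). A sketch; gid_in_ping_range is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# gid_in_ping_range GID [FILE]: succeed if GID lies within the range in
# net.ipv4.ping_group_range (read from /proc by default, or from FILE).
gid_in_ping_range() {
    local gid=$1 file=${2:-/proc/sys/net/ipv4/ping_group_range} lo hi
    read -r lo hi < "$file"   # the file holds "min max"
    [ "$gid" -ge "$lo" ] && [ "$gid" -le "$hi" ]
}

# Usage, for the primary group of the user who runs the container:
# gid_in_ping_range "$(id -g)" && echo "ping allowed" || echo "ping not allowed"
```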
The idea of these is from
Wednesday, October 2, 2024
SSH Public Key Authentication Fails When Home is on NFS
As the title states, no matter what I tried, I couldn't get SSH public key authentication to work for a Linux host. It turns out that the Linux host that runs the SSH server has SELinux enabled, which by default prevents sshd from reading files such as ~/.ssh/authorized_keys in NFS-mounted home directories. To make public key authentication work for SSH, we simply need to configure SELinux, i.e.,
sudo setsebool -P use_nfs_home_dirs 1
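To tell whether this SELinux boolean is relevant on a given server, we can first check whether the home directory is on NFS at all. A sketch assuming GNU coreutils `stat`, whose file-system mode (`-f`) with `%T` prints the type name; is_on_nfs is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# is_on_nfs PATH: succeed if PATH is on an NFS file system
# (GNU "stat -f -c %T" prints the file system type, e.g. "nfs").
is_on_nfs() {
    [ "$(stat -f -c %T -- "$1" 2>/dev/null)" = nfs ]
}

# Usage on the SSH server:
# if is_on_nfs "$HOME"; then getsebool use_nfs_home_dirs; fi
```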
Thursday, May 16, 2024
Signing Git Commits: Inappropriate ioctl for device
When I attempted to sign a Git commit, I encountered the following error:
$ git commit -S -m "important change"
error: gpg failed to sign the data:
[GNUPG:] KEY_CONSIDERED AAAABBBBCCCCDDDD00000 2
[GNUPG:] BEGIN_SIGNING H8
[GNUPG:] PINENTRY_LAUNCHED 583247 curses 1.2.1 - xterm localhost:10.0 - 1000/2001 0
gpg: signing failed: Inappropriate ioctl for device
[GNUPG:] FAILURE sign 83918950
gpg: signing failed: Inappropriate ioctl for device
fatal: failed to write commit object
After some investigation, I learned that the problem could lie with gpg itself. To gauge whether gpg is the problem, we can sign a message with gpg, e.g.,
$ echo "test" | gpg --clearsign
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
test
gpg: signing failed: Inappropriate ioctl for device
gpg: [stdin]: clear-sign failed: Inappropriate ioctl for device
Clearly, we observe the same error message, so we can conclude that the issue lies with gpg. It turns out that one cause is that the TTY is not set properly for gpg. To fix this, we set the TTY for gpg using
export GPG_TTY=$(tty)
This line can be added to the shell's profile (e.g., ~/.bashrc) so that it takes effect in every session.
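A slightly more defensive profile snippet only sets GPG_TTY when the shell is attached to a terminal, because in non-interactive shells `tty` prints "not a tty". A sketch; set_gpg_tty is a hypothetical function name:

```shell
#!/usr/bin/env bash
# For ~/.bashrc: tell gpg (and its pinentry) which terminal to use.
# The -t 0 guard skips shells whose standard input is not a terminal.
set_gpg_tty() {
    if [ -t 0 ]; then
        GPG_TTY=$(tty)
        export GPG_TTY
    fi
}
set_gpg_tty
```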
Friday, May 3, 2024
Cannot start cmd.exe on Windows 10
Somehow I encountered the problem that I could not start the Windows Command Prompt (cmd.exe).
The solution turns out to be removing a value from the registry. A number of posts point to the removal of
HKCU\Software\Microsoft\Command Processor\AutoRun.
A complication comes from the fact that the user account
is a standard user account; however, regedit needs to run as an administrator, which means
that HKCU refers to the administrator's registry hive, not the standard user's.
To address this issue, we can perform the following steps:
- Figure out the user's SID by running, as the standard user,
whoami /user
The SID begins with S-, which we can easily recognize in the output.
- Open regedit and browse to HKEY_USERS, then to the user's SID, Software, Microsoft, and Command Processor; then locate AutoRun and remove it.
A Stack Overflow post suggests removing several more keys and values. This was not necessary in my case, but it is good to document them here, just in case:
reg delete "HKCU\Console" /f
reg delete "HKCU\Software\Microsoft\Command Processor" /v "AutoRun" /f
reg delete "HKLM\Software\Microsoft\Command Processor" /v "AutoRun" /f
reg delete "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\cmd.exe" /f
Wednesday, February 21, 2024
Installing Git and Other Tools on Linux Systems without Administrative Privilege
Sometimes I want to install software tools, such as Git, Screen, and others, on a Linux system, only to find myself without administrative privileges. The first method that comes to mind is to download the source code, then compile and set it up. This method can be challenging because numerous dependencies may also be missing on the system.
Recently it occurred to me that we can do this via conda. For instance, the following steps let me install both Git and Screen on a Linux system without administrative privileges:
- Download Miniconda.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- Set up Miniconda.
bash Miniconda3-latest-Linux-x86_64.sh
- Initialize conda. Exit the shell, log back in, and then run
conda init
- Install Git via conda.
conda install anaconda::git
- Install Screen via conda.
conda install conda-forge::screen
- Find and install others ...
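To decide what still needs installing, we can ask the shell which commands are missing from PATH. A sketch; missing_tools is a hypothetical helper, and the commented loop assumes the conda-forge package name matches the command name, which is not always the case:

```shell
#!/usr/bin/env bash
# missing_tools CMD...: print each command name that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage:
# missing_tools git screen tmux
# for t in $(missing_tools git screen); do conda install -y "conda-forge::$t"; done
```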
Some may think this method is overkill. However, it saves me tons of time that I would otherwise spend downloading and compiling tons of dependencies. Isn't our own time more valuable?
Sunday, January 14, 2024
Windows Update Failed with Error Code 0x80070643
A recent Windows 10 update resulted in error code 0x80070643. The cause was that the Windows Recovery partition was not big enough. Recreating a larger Windows Recovery partition solved the issue. The references that helped me are as follows:
Wednesday, September 20, 2023
Setting up Conda Virtual Environment for Tensorflow
These steps create a Python virtual environment for running TensorFlow on a GPU. The steps work on Fedora Linux 38 and Ubuntu 22.04 LTS:
To install Miniconda, we can download the installer and run it as a regular user:
curl -sO "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
bash Miniconda3-latest-Linux-x86_64.sh
Following that, we create a conda virtual environment for Python.
# create conda virtual environment
conda create -n tf213 python=3.11 pip
# activate the environment in order to install packages and libraries
conda activate tf213
#
# the following are from Tensorflow pip installation guide
#
# install CUDA Toolkit
conda install -c conda-forge cudatoolkit=11.8.0
# install python packages
pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.13.*
#
# setting up library and tool search paths
# scripts in activate.d shall be run when the environment
# is being activated
#
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
# get CUDNN_PATH
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# set LD_LIBRARY_PATH
echo 'export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
# set XLA_FLAGS (on some systems, omitting this leads to a 'libdevice not found at ./libdevice.10.bc' error)
echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
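The `echo ... >>` lines above append, so running them a second time duplicates every line in env_vars.sh. A sketch of an idempotent alternative that rewrites the file in one step with a quoted heredoc; write_tf_env_vars is a hypothetical name, and the three lines it writes are the same as above:

```shell
#!/usr/bin/env bash
# write_tf_env_vars: (re)create the activation hook in one step, so running
# it repeatedly leaves exactly one copy of each line.
write_tf_env_vars() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "activate the conda environment first" >&2
        return 1
    fi
    local dir="$CONDA_PREFIX/etc/conda/activate.d"
    mkdir -p "$dir"
    # The quoted 'EOF' keeps $(...) and $VAR unexpanded until activation time.
    cat > "$dir/env_vars.sh" <<'EOF'
CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))
export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX
EOF
}

# Usage, inside the activated environment:
# conda activate tf213 && write_tf_env_vars
```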
To test it, we can run
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Enjoy!
Monday, September 18, 2023
Mounting File Systems in a Disk Image on Linux
On Linux systems, we can create a disk image using the dd command.
This post lists the steps to mount file systems, in particular LVM volumes, in an image of a whole disk,
which is often created as follows:
dd if=/dev/sdb of=/mnt/disk1/sdb.img bs=1M status=progress
Assuming the disk has multiple partitions, how do we mount the file systems on these partitions? The following are the steps,
# 1. mount the disk where the disk image is
# we assume the partition is /dev/sdb1, and we mount
# it on directory win
sudo mount /dev/sdb1 win
# 2. map the partitions to loopback devices
# here we assume the disk image is win/disks/disk1.img;
# --show prints the loop device that was allocated
sudo losetup -f -P --show win/disks/disk1.img
# 3. list the LVM volumes (if they are not active yet,
# activate the volume group first: sudo vgchange -a y mylvm)
sudo lvdisplay
# 4. suppose from the output of the above command,
# the volume is shown as /dev/mylvm/lvol0,
# and we want it mounted on directory lvol0
sudo mount /dev/mylvm/lvol0 lvol0
# 5. do something we want ...
# 6. unmount the volume
sudo umount lvol0
# 7. deactivate LVM volume
# we can query, confirm the volume group by
# vgdisplay
sudo vgchange -a n mylvm
# 8. detach the loopback device
# assuming the device is /dev/loop0
sudo losetup -d /dev/loop0
# 9. umount the disk
sudo umount win
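A different technique, useful when only one partition is needed or `losetup -P` is unavailable, is to attach just that partition by byte offset: the partition's start sector (as reported by, e.g., `fdisk -l win/disks/disk1.img`) times the sector size. A sketch; part_offset is a hypothetical helper, and 512-byte sectors are assumed unless another size is given:

```shell
#!/usr/bin/env bash
# part_offset START_SECTOR [SECTOR_SIZE]: byte offset of a partition inside a
# disk image, for use with "losetup -o" (sector size defaults to 512 bytes).
part_offset() {
    echo $(( $1 * ${2:-512} ))
}

# Usage, assuming fdisk reports the partition starting at sector 2048:
# sudo losetup -f --show -o "$(part_offset 2048)" win/disks/disk1.img
```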
Sunday, September 17, 2023
Mounting ZFS Dataset as /home
The following steps work:
# list ZFS pools and datasets
zfs list
# Query current mount point for a ZFS dataset, e.g., mypool/mydataset
zfs get mountpoint mypool/mydataset
# Set new mountpoint to /home
zfs set mountpoint=/home mypool/mydataset
# Always verify
zfs list
zfs get mountpoint mypool/mydataset
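Before moving a dataset's mountpoint to /home, it may be worth checking that /home is empty or that its contents have been migrated: depending on the OpenZFS version and the dataset's overlay property, mounting over a non-empty directory is either refused or hides the existing files. A sketch; dir_is_empty is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# dir_is_empty DIR: succeed if DIR exists and has no entries (dotfiles count).
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A -- "$1")" ]
}

# Usage before "zfs set mountpoint=/home mypool/mydataset":
# dir_is_empty /home || echo "/home is not empty; migrate its contents first"
```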
Wednesday, August 16, 2023
Bus Error (Core Dumped)!
I was training a machine learning model written in PyTorch on a Linux system. During the training, I encountered "Bus error (core dumped)." This error produces no stack trace. Eventually, I figured out that it resulted from the exhaustion of shared memory, whose symptom is that "/dev/shm" is full.
To resolve this issue, I simply doubled the size of "/dev/shm", following the instructions given in this Stack Overflow post.
Basically, we edit the /etc/fstab file. If the file already has an entry for /dev/shm, we simply increase its size. If not, we add a line to the file, such as
none /dev/shm tmpfs defaults,size=32G 0 0
To bring it into effect, we remount the file system, as in,
sudo mount -o remount /dev/shm
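To spot the symptom before the crash, /dev/shm usage can be checked periodically during training. A sketch; shm_usage is a hypothetical helper, and it relies on `df -P` printing one header line followed by one line per file system:

```shell
#!/usr/bin/env bash
# shm_usage [MOUNT]: print df's "Use%" column for /dev/shm (or MOUNT).
# -P forces the POSIX format, so each file system stays on a single line.
shm_usage() {
    df -P "${1:-/dev/shm}" | awk 'NR == 2 { print $5 }'
}

# Usage, e.g. from a second terminal while training runs:
# shm_usage
```

A value approaching 100% during training matches the symptom described above.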
Wednesday, July 19, 2023
Terminal Multiplexer
This is just a note on the terminal multiplexers that are out there:
- screen
- tmux
- byobu
- tmuxinator
Some resources that I find useful:
- https://www.baeldung.com/linux/screen-command
- https://www.redhat.com/sysadmin/introduction-tmux-linux
- https://www.digitalocean.com/community/tutorials/how-to-install-and-use-byobu-for-terminal-management-on-ubuntu-16-04
- https://askubuntu.com/questions/136776/when-using-byobu-in-a-putty-session-cannot-create-new-windows
- https://stackoverflow.com/questions/18980222/should-do-i-use-screen-or-tmux-commands
- https://superuser.com/questions/236158/tmux-vs-screen
- https://superuser.com/questions/423310/byobu-vs-gnu-screen-vs-tmux-usefulness-and-transferability-of-skills
Thursday, March 30, 2023
Binding Process to TCP/UDP Port Failure on Windows
Windows has the concept of reserved (excluded) TCP/UDP ports. These ports cannot be bound by other applications, which can be annoying because the reserved ports
do not show up as in use when we query used ports with netstat. For instance, if we want to bind TCP port 23806 to an application, we may check its
availability using the netstat command, such as
C:> netstat -anp tcp | find ":23806"
C:>
The output is blank, which means that the port is unused. However, when we attempt to bind the port to a process of our choice, we encounter an error, such as
bind [127.0.0.1]:23806: Permission denied
This is annoying. The reason is that the port somehow becomes a reserved port. To see this, we can query reserved ports, e.g.,
C:> netsh int ipv4 show excludedportrange protocol=tcp
Protocol tcp Port Exclusion Ranges
Start Port End Port
---------- --------
1155 1254
... ...
23733 23832
23833 23932
50000 50059 *
* - Administered port exclusions.
C:>
which shows that 23806 now falls in a reserved range (23733 to 23832). What is really annoying is that Windows can update these ranges dynamically. There are several methods to deal with this.
- Method 1. Stop and start the Windows NAT Driver service (in an elevated Command Prompt).
net stop winnat
net start winnat
After this, query the reserved ports again. Often the reserved ranges are much more limited than before, e.g.,
C:>netsh int ipv4 show excludedportrange protocol=tcp
Protocol tcp Port Exclusion Ranges
Start Port End Port
---------- --------
2869 2869
5357 5357
50000 50059 *
* - Administered port exclusions.
C:>
- Method 2. If you don't wish to use this feature of Windows, we can disable reserved ports.
reg add HKLM\SYSTEM\CurrentControlSet\Services\hns\State /v EnableExcludedPortRange /d 0 /f
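Reading the exclusion list by eye is error-prone when there are many ranges. A sketch of a filter for the netsh output, written in shell like the other notes here and assuming the output reaches a POSIX shell with awk (e.g., saved to a file, or piped from netsh.exe under WSL); port_excluded is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# port_excluded PORT: read "netsh int ipv4 show excludedportrange" output on
# stdin and succeed if PORT falls inside one of the listed ranges.
port_excluded() {
    awk -v port="$1" '
        { sub(/\r$/, "") }    # tolerate Windows CRLF line endings
        # range lines start with two numbers: start port, end port
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ &&
            port + 0 >= $1 + 0 && port + 0 <= $2 + 0 { found = 1 }
        END { exit !found }
    '
}

# Usage (from WSL, assuming netsh.exe is reachable on PATH):
# netsh.exe int ipv4 show excludedportrange protocol=tcp | port_excluded 23806
```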