Monday, April 21, 2025

Monitoring transient network traffic session

Sometimes there is a need to investigate network traffic that is transient. To make the problem clearer, let's examine this example. The firewall indicates some network traffic was blocked:


Block IPv4 link-local (1000000102) 192.168.99.99:35018 169.254.169.254:80 TCP:S

We want to figure out which process sent the packets, so we might try something like


sudo netstat -anp | grep 35018

Unfortunately, this yields nothing because at the time we issue the netstat command port 35018 is not open. It turns out the network traffic is short-lived. How do we figure out which process sends out the packets? Of course, we could try to capture the packets:


tcpdump -XX -i any host 169.254.169.254 and port 80

which indeed captures the packets and shows their headers and contents. Sometimes the headers and contents are sufficient for us to figure out which process sent the packets. However, what if they offer no clue?

It turns out that we can use sysdig, for instance, in this way:


sysdig -p '*%evt.num  %evt.time   %evt.cpu   %proc.name   (%thread.tid %proc.ppid)   %evt.dir %evt.type %evt.info' fd.rip=169.254.169.254 and fd.rport=80

which tells us the process that sent the packets and the PID of its parent process. The sending process may be gone by then, but often the parent process is still around. This solves the problem because it offers a way to investigate further.
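With the parent PID from the sysdig output in hand, one way to investigate further is to walk up the process tree. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function name ancestry is made up for illustration.

```shell
#!/bin/sh
# Walk up the process tree from a given PID, printing "PID<TAB>command line"
# for the process itself and then for each ancestor.
# Assumption: Linux with /proc mounted; "ancestry" is a hypothetical helper name.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -d "/proc/$pid" ]; do
        # Arguments in /proc/PID/cmdline are NUL-separated; show them with spaces.
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null)
        printf '%s\t%s\n' "$pid" "$cmd"
        # The PPid field of /proc/PID/status holds the parent PID (0 for PID 1).
        pid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    done
}
```

For example, ancestry 4242 (4242 being a hypothetical value taken from the %proc.ppid column) prints that process followed by its ancestors, which often points to the service or script that spawned the short-lived sender.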

Tuesday, March 11, 2025

Installing TeX Live Packages Using tlmgr from a Non-default Repository

From time to time, the default TeX Live repository does not work for me when I try to install a package using tlmgr. One way around this is to use a non-default repository. For example, to install the listings package, we can run


tlmgr -repository https://mirrors.ibiblio.org/pub/mirrors/CTAN/systems/texlive/tlnet install listings

Perhaps the tricky part is not finding a mirror but writing the correct URL. This example serves as a template for that.
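To make the template explicit, here is a small sketch that builds the repository URL from the root of a CTAN mirror. It assumes the mirror follows the usual CTAN layout, in which the TeX Live network repository sits under systems/texlive/tlnet; the function name tlnet_url is made up for illustration.

```shell
#!/bin/sh
# Build the tlmgr repository URL from a CTAN mirror root.
# Assumption: the mirror uses the usual CTAN layout (systems/texlive/tlnet).
# "tlnet_url" is a hypothetical helper name.
tlnet_url() {
    # ${1%/} drops one trailing slash, if present, to avoid "//" in the result.
    printf '%s/systems/texlive/tlnet\n' "${1%/}"
}
```

It could be used as tlmgr -repository "$(tlnet_url https://mirrors.ibiblio.org/pub/mirrors/CTAN)" install listings, which expands to the same command as above.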

Monday, March 3, 2025

Selecting CUDA Devices

I observed that when I run a PyTorch program on a system with two GPUs, PyTorch dispatches the computational tasks to both GPUs. Since the program is not optimized for multiple GPUs, the performance on two GPUs is worse than on just one. A simple way to address this is to tell PyTorch to use a designated GPU via the environment variable CUDA_VISIBLE_DEVICES.

For instance, to run a task run_task.sh, we can run

CUDA_VISIBLE_DEVICES=0 ./run_task.sh SEED=1234

which results in running the task on a single GPU.

For the non-optimized program, I got much better computational efficiency by running two tasks side by side, each on its own GPU, than by letting each task run on both GPUs:

CUDA_VISIBLE_DEVICES=0 ./run_task.sh SEED=1234

CUDA_VISIBLE_DEVICES=1 ./run_task.sh SEED=4321
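The two commands above can be generalized into a small launcher that starts one background job per seed and pins job i to GPU i. This is only a sketch: the function name run_per_gpu is made up, and it assumes there are at least as many GPUs as seeds and that the task accepts SEED=... as its first argument, as run_task.sh does above.

```shell
#!/bin/sh
# Launch one background job per seed, pinning the i-th job to GPU i.
# Usage: run_per_gpu COMMAND SEED...
# Assumptions: at least as many GPUs as seeds; "run_per_gpu" is a hypothetical name.
run_per_gpu() {
    cmd=$1
    shift
    gpu=0
    for seed in "$@"; do
        # Each job sees only its own GPU through CUDA_VISIBLE_DEVICES.
        CUDA_VISIBLE_DEVICES=$gpu "$cmd" "SEED=$seed" &
        gpu=$((gpu + 1))
    done
    wait    # block until every background job has finished
}
```

With this, run_per_gpu ./run_task.sh 1234 4321 starts the same two runs as above in parallel and returns once both have completed.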

Friday, February 21, 2025

Enabling NAT and IP Masquerading on Rocky Linux 9

This is a note about enabling NAT (SNAT, more precisely) and IP masquerading on a Linux host that runs Rocky Linux 9. The host has two network interfaces: eth0 and wg0. Interface eth0 connects to the outside network and is assigned a public IP address, while interface wg0 is on a private network. The objective is to make the Linux host a router for the private network so that traffic originating from the private network can reach the outside network. The steps to achieve this objective using firewalld are as follows:

Enable IPv4 forwarding

      echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
      sudo sysctl -p

Assign interface eth0 to the external zone

     firewall-cmd --permanent --zone=external --change-interface=eth0

Assign interface wg0 to the internal zone

     firewall-cmd --permanent --zone=internal --change-interface=wg0

Set the zone target of the internal zone to ACCEPT

     firewall-cmd --permanent --zone=internal --set-target=ACCEPT

Finally, reload firewalld's configuration.
firewall-cmd --reload
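Note that rerunning the first step appends the same line to /etc/sysctl.conf again. A minimal sketch of an idempotent alternative follows; the helper name ensure_line is hypothetical, and it only appends the line when an identical line is not already present.

```shell
#!/bin/sh
# Append LINE to FILE only if FILE does not already contain that exact line.
# Usage: ensure_line LINE FILE
# "ensure_line" is a hypothetical helper name.
ensure_line() {
    line=$1
    file=$2
    # -x matches whole lines only, -F treats LINE as a fixed string,
    # -q suppresses output; "--" guards against LINE starting with a dash.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Run as root, ensure_line "net.ipv4.ip_forward = 1" /etc/sysctl.conf followed by sysctl -p has the same effect as the first step, but can be repeated safely.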

There is no need to meddle with anything else, such as adding nftables rules or setting masquerading on the outward-facing network interface. This is because the external zone has masquerading enabled by default. This can be verified by

firewall-cmd --zone=external --query-masquerade

or by looking at the zone definition file at /usr/lib/firewalld/zones/external.xml.

In addition, the external zone is also set to forward packets. We can confirm this by looking at the zone definition file at /usr/lib/firewalld/zones/external.xml or by

firewall-cmd --zone=external --query-forward

The issue seems to lie in the zones' targets. First, let's view the zones' configuration, starting with the external zone:

firewall-cmd --zone=external --list-all

Of course, we can also just check the target:

firewall-cmd --permanent --zone=external --get-target

Likewise for the internal zone:

firewall-cmd --zone=internal --list-all

and its target:

firewall-cmd --permanent --zone=internal --get-target

The targets of both the external and internal zones are originally default. The internal zone's default target is in effect interpreted as reject, thus preventing packets from being forwarded to the outside network. This is explained as

For a forwarded packet that ingresses zoneA and egresses zoneB:

if zoneA's target is ACCEPT, DROP, or REJECT then the packet is accepted, dropped, or rejected respectively.

if zoneA's target is default, then the packet is accepted, dropped, or rejected based on zoneB's target. If zoneB's target is also default, then the packet will be rejected by firewalld's catchall reject.

Since the targets of both the ingress (internal) and egress (external) zones are "default", the result is that the internal zone's target effectively becomes "REJECT".

One question I have in mind is: why not assign the internal-facing interface to the trusted zone? That might be for another day.
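The quoted rule can be written down as a small decision function. This sketch models only the two quoted sentences, not firewalld's actual rule generation; the function name forward_verdict is made up, and the targets are spelled as they appear in the quote.

```shell
#!/bin/sh
# Verdict for a forwarded packet, following only the rule quoted above.
# Usage: forward_verdict INGRESS_TARGET EGRESS_TARGET
# Targets: ACCEPT, DROP, REJECT, or default. "forward_verdict" is a hypothetical name.
forward_verdict() {
    case $1 in
        ACCEPT|DROP|REJECT)
            # An explicit ingress target decides on its own.
            echo "$1" ;;
        default)
            case $2 in
                ACCEPT|DROP|REJECT)
                    # Ingress defers to the egress zone's target.
                    echo "$2" ;;
                default)
                    # Both default: the catch-all reject applies.
                    echo REJECT ;;
                *)
                    return 1 ;;
            esac ;;
        *)
            return 1 ;;
    esac
}
```

Here forward_verdict default default prints REJECT, which is the situation before the internal zone's target is changed, while forward_verdict ACCEPT default prints ACCEPT, which is the situation after setting the internal zone's target to ACCEPT.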

Reference

This note benefited tremendously from the following resources:

https://askubuntu.com/questions/1463093/what-is-target-default-of-a-zones-configuration-in-firewalld
https://github.com/firewalld/firewalld/issues/590#issuecomment-605200548
man firewall-cmd
man firewalld.zone
man firewalld

Wednesday, February 19, 2025

Running dnf package manager on Linux with small memory

Running the dnf package manager can sometimes be difficult on Linux hosts with little memory. On a Rocky Linux 9 host with 1 GB of RAM, I observed that after EPEL was enabled, dnf install would sometimes be killed due to OOM.

To address this issue, we can create and enable a swap space:

$ sudo dd if=/dev/zero of=/swapfile count=1024 bs=1MiB
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ sudo dnf update

Once done, we then turn off the swap space:

$ sudo swapoff /swapfile
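To decide whether the temporary swap is worth setting up on a given host, the total RAM can be read from /proc/meminfo. The following is a sketch only: the function names are made up, and the 2048 MiB default threshold is an arbitrary assumption, not a dnf requirement.

```shell
#!/bin/sh
# Print total RAM in MiB, read from a meminfo-style file (default: /proc/meminfo).
# "mem_total_mib" and "needs_swap" are hypothetical helper names.
mem_total_mib() {
    # MemTotal is reported in kB; convert to MiB and truncate.
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed when total RAM is below THRESHOLD MiB.
# Usage: needs_swap [THRESHOLD] [MEMINFO_FILE]
# The default of 2048 MiB is an arbitrary assumption.
needs_swap() {
    [ "$(mem_total_mib "$2")" -lt "${1:-2048}" ]
}
```

For example, needs_swap && sudo swapon /swapfile enables the swap file only on hosts with less than 2048 MiB of RAM.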

Reference

This idea comes from this Stack Overflow post.

Thursday, December 12, 2024

Solution for problem: rootless Docker container cannot ping outside networks

I am running a rootless Docker container on an Ubuntu host (24.04 LTS). However, from inside the container I can ping neither the host where the container is running nor the outside network. The workaround I came up with has two steps:

Run the container with the --privileged option, as in
```
docker container run --privileged 
```
On the host where the container is running, set the Linux kernel parameter `net.ipv4.ping_group_range` to include the group ID of the user that runs the container. For instance, if the group ID of the user that runs the container is 3000, we can set the parameter as follows:
```
echo "3000 3000" > /proc/sys/net/ipv4/ping_group_range
```

If tests show that pings succeed in the container, we can set the kernel parameter through a configuration file so that the setting survives a reboot, e.g.,

On the host where the container is running, create a file, e.g., /etc/sysctl.d/99-ping-group-range.conf, as in:
```
echo "net.ipv4.ping_group_range=3000 3000" \
       > /etc/sysctl.d/99-ping-group-range.conf
```
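Before changing the parameter, it can help to check whether the group ID is already covered by the current range. A minimal sketch follows; the function name gid_in_ping_range is made up, and the range is passed in as the two numbers read from /proc/sys/net/ipv4/ping_group_range.

```shell
#!/bin/sh
# Succeed when GID lies within the inclusive range "LOW HIGH".
# Usage: gid_in_ping_range GID "LOW HIGH"
# "gid_in_ping_range" is a hypothetical helper name.
gid_in_ping_range() {
    gid=$1
    # Word-split "LOW HIGH" (space- or tab-separated) into $1 and $2.
    set -- $2
    [ "$gid" -ge "$1" ] && [ "$gid" -le "$2" ]
}
```

For example, gid_in_ping_range "$(id -g)" "$(cat /proc/sys/net/ipv4/ping_group_range)" succeeds only when the current user's group is allowed. A range such as "1 0", where the lower bound exceeds the upper bound, matches no group at all.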

The idea of these is from

Wednesday, October 2, 2024

SSH Public Key Authentication Fails When Home is on NFS

As the title states, no matter what I tried, I couldn't get SSH public key authentication to work for a Linux host. It turns out that the Linux host that runs the SSH server has SELinux enabled. To make public key authentication work for SSH, we simply need to configure SELinux to allow the use of NFS home directories, i.e.,


sudo setsebool -P use_nfs_home_dirs 1
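To confirm the setting took effect, the boolean can be read back with getsebool, which prints a line of the form "use_nfs_home_dirs --> on". The following sketch checks such a line; the function name sebool_on is made up for illustration.

```shell
#!/bin/sh
# Succeed when a line of getsebool output reports the boolean as on.
# Usage: sebool_on "NAME --> on"
# "sebool_on" is a hypothetical helper name.
sebool_on() {
    case $1 in
        *' --> on') return 0 ;;
        *)          return 1 ;;
    esac
}
```

On the SSH server, sebool_on "$(getsebool use_nfs_home_dirs)" && echo enabled prints enabled once the boolean has been set.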