
Monday, March 3, 2025

Selecting CUDA Devices

I observed that when I run a PyTorch program on a system with two GPUs, PyTorch dispatches the computational tasks to both GPUs. Since the program is not optimized for multiple GPUs, performance with two GPUs is worse than with just one. A simple way to address this turns out to be telling PyTorch to use a designated GPU via the environment variable CUDA_VISIBLE_DEVICES.

For instance, to run a task run_task.sh, we can 

 CUDA_VISIBLE_DEVICES=0 ./run_task.sh SEED=1234

which results in running the task on a single GPU. 
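
To confirm that the restriction took effect, a minimal check (a sketch assuming PyTorch is installed in the environment) is to ask PyTorch how many devices the process can see:

  import os
  import torch

  # With CUDA_VISIBLE_DEVICES=0, the process sees exactly one GPU,
  # and cuda:0 refers to that physical device.
  print(os.environ.get("CUDA_VISIBLE_DEVICES"))   # e.g. "0"
  print(torch.cuda.device_count())                # expected: 1
  if torch.cuda.is_available():
      print(torch.cuda.get_device_name(0))        # name of the selected GPU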

For the non-optimized program, I got much better computational efficiency by running each task on its own GPU than by letting each run spread over both GPUs:

 CUDA_VISIBLE_DEVICES=0 ./run_task.sh SEED=1234

 CUDA_VISIBLE_DEVICES=1 ./run_task.sh SEED=4321
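
The variable can also be set from inside the Python program itself; this is a sketch that assumes the assignment runs before anything initializes CUDA, since it has no effect afterwards:

  import os
  # Must be set before torch initializes CUDA.
  os.environ.setdefault("CUDA_VISIBLE_DEVICES", "1")

  import torch
  print(torch.cuda.device_count())  # expected: 1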

Sunday, March 26, 2023

Installing GPU Driver for PyTorch and TensorFlow

To use a GPU with PyTorch and TensorFlow, a method I have grown fond of is to install the GPU driver from RPM Fusion, in particular on Fedora systems, whose default repositories include only free packages. With this method we install only the driver from RPM Fusion and use a Python virtual environment to bring in the CUDA libraries.

  1. Configure the RPM Fusion repositories by following the instructions [1], e.g.:
    
        sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm
        
  2. Install the NVIDIA driver, e.g.,
    
        sudo dnf install akmod-nvidia
        
  3. Add CUDA support, i.e.,
    
        sudo dnf install xorg-x11-drv-nvidia-cuda
        
  4. Check the driver by running nvidia-smi. If it complains about not being able to connect to the driver, reboot the system.

If we only use PyTorch or TensorFlow, there is no need to install the CUDA Toolkit from NVIDIA; the CUDA libraries come with the Python packages in the virtual environment, as the check below illustrates.
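
As a quick sanity check inside the virtual environment (a sketch assuming PyTorch was installed there with pip install torch), we can confirm that the wheel ships its own CUDA runtime and that the RPM Fusion driver is usable:

  import torch

  print(torch.__version__)          # PyTorch version
  print(torch.version.cuda)         # CUDA version bundled with the wheel
  print(torch.cuda.is_available())  # True if the installed driver works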

Reference

  1. https://rpmfusion.org/Configuration

Saturday, January 21, 2023

Verifying CUDA Installation

For a full CUDA installation, we can verify it via the following steps:


  # check driver is installed
  cat /proc/driver/nvidia/version
  
  # check the version of CUDA Kit
  CUDA_PATH=/usr/local/cuda
  ${CUDA_PATH}/bin/nvcc --version
  
  # run deviceQuery demo program
  ${CUDA_PATH}/extras/demo_suite/deviceQuery
  
  # run bandwidthTest demo program
  ${CUDA_PATH}/extras/demo_suite/bandwidthTest
  
  # run busGrind demo program
  ${CUDA_PATH}/extras/demo_suite/busGrind
  
  # run vectorAdd demo program
  ${CUDA_PATH}/extras/demo_suite/vectorAdd
  
  # finally, run sample programs from Nvidia
  git clone https://github.com/NVIDIA/cuda-samples
  cd cuda-samples
  make
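
In addition, if PyTorch happens to be installed, a small Python-level query (a sketch, not part of the CUDA demo suite) reports roughly the same information as deviceQuery:

  import torch

  # Print basic properties of each visible GPU, similar to deviceQuery.
  for i in range(torch.cuda.device_count()):
      props = torch.cuda.get_device_properties(i)
      print(f"GPU {i}: {props.name}, "
            f"{props.total_memory / 1024**3:.1f} GiB, "
            f"compute capability {props.major}.{props.minor}")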