I was training a machine learning model written in PyTorch on a Linux system. During training, I encountered "Bus error (core dumped)." This error produces no stack trace. Eventually, I figured out that it resulted from the exhaustion of shared memory, the symptom of which is that "/dev/shm" is full.
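One way to confirm the symptom is to check how full "/dev/shm" is while training runs. The following is a minimal sketch using only the Python standard library; the helper name shm_usage is my own and is not part of PyTorch.

```python
import shutil


def shm_usage(path="/dev/shm"):
    """Return (used_bytes, total_bytes, used_fraction) for the file system at path."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized file system to avoid dividing by zero.
    fraction = usage.used / usage.total if usage.total else 0.0
    return usage.used, usage.total, fraction
```

Calling shm_usage() from a second terminal during training and seeing the fraction approach 1.0 would suggest that shared memory is being exhausted.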
To resolve this issue, I simply doubled the size of "/dev/shm", following the instructions given in this Stack Overflow post.
Basically, the fix is to edit the /etc/fstab file. If the file already has an entry for /dev/shm, we simply increase its size. If not, we add a line to the file, such as
none /dev/shm tmpfs defaults,size=32G 0 0
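If you are not sure what size to put in that line, the sketch below reads the current size of "/dev/shm", doubles it, and formats a matching fstab entry. The helper names doubled_size_gib and fstab_entry are hypothetical, and the script only builds the text; it does not modify /etc/fstab.

```python
import math
import shutil


def doubled_size_gib(path="/dev/shm"):
    """Return twice the current size of the file system at path, rounded up to whole GiB."""
    total_bytes = shutil.disk_usage(path).total
    return max(1, math.ceil(2 * total_bytes / 2**30))


def fstab_entry(size_gib, mount_point="/dev/shm"):
    """Format a tmpfs line for /etc/fstab with the given size in GiB."""
    return f"none {mount_point} tmpfs defaults,size={size_gib}G 0 0"
```

For example, fstab_entry(doubled_size_gib()) produces a line in the same format as the one above, which you can then paste into /etc/fstab by hand.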
To put the change into effect, we remount the file system:
sudo mount -o remount /dev/shm