Deeplearning: How to install tensorflow gpu for deeplearning processes in BrainVISA
Below are the steps I followed to use tensorflow-gpu 2.6.2 in a BrainVISA development container.
The procedure was carried out on a host running Ubuntu 16.04 with an NVIDIA RTX A5000 GPU.
This page assumes that a driver supporting the GPU is already installed (in my case, I had to download driver 470.94 manually from the NVIDIA website and install it, because I did not find a repository distributing it for Ubuntu 16.04).
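Before going further, it can help to confirm that the driver is actually installed and which version it reports. The sketch below is an addition, not part of the original procedure: the function name `show_gpu_driver` is hypothetical, and it relies only on the `--query-gpu` and `--format` options of nvidia-smi.

```shell
# Hypothetical helper: print "<GPU name>, <driver version>" for each GPU,
# or explain on stderr why that is not possible.
show_gpu_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the NVIDIA driver does not seem to be installed" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

Calling `show_gpu_driver` on the host should report the installed driver version (470.94 in my case).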
On Ubuntu 16.04 with driver 470.94, /dev/nvidia-uvm was missing
To check that the NVIDIA kernel modules load and to create the device nodes, run the following script in a bash shell:
cat >/tmp/check_cuda_node <<EOF
#!/bin/bash
/sbin/modprobe nvidia
if [ "\$?" -eq 0 ]; then
    # Count the number of NVIDIA controllers found.
    NVDEVS=\`lspci | grep -i NVIDIA\`
    N3D=\`echo "\$NVDEVS" | grep "3D controller" | wc -l\`
    NVGA=\`echo "\$NVDEVS" | grep "VGA compatible controller" | wc -l\`
    N=\`expr \$N3D + \$NVGA - 1\`
    for i in \`seq 0 \$N\`; do
        mknod -m 666 /dev/nvidia\$i c 195 \$i
    done
    mknod -m 666 /dev/nvidiactl c 195 255
else
    exit 1
fi
/sbin/modprobe nvidia-uvm
if [ "\$?" -eq 0 ]; then
    # Find out the major device number used by the nvidia-uvm driver
    D=\`grep nvidia-uvm /proc/devices | awk '{print \$1}'\`
    mknod -m 666 /dev/nvidia-uvm c \$D 0
else
    exit 1
fi
EOF
chmod +x /tmp/check_cuda_node
sudo /tmp/check_cuda_node
rm -f /tmp/check_cuda_node
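After running the script, the device nodes it creates should be present under /dev. The following is a minimal sketch for checking this, not part of the original procedure: the function name `missing_nvidia_nodes` is hypothetical, and it only looks at the first GPU (`nvidia0`) plus the two control nodes.

```shell
# Hypothetical helper: list which of the expected NVIDIA device nodes are
# missing from a directory (default: /dev). Only GPU 0 is checked here.
# Returns 0 when all nodes exist, 1 otherwise.
missing_nvidia_nodes() {
    local dev_dir="${1:-/dev}"
    local status=0
    local node
    for node in nvidia0 nvidiactl nvidia-uvm; do
        if [ ! -e "$dev_dir/$node" ]; then
            echo "missing: $dev_dir/$node"
            status=1
        fi
    done
    return "$status"
}
```

Running `missing_nvidia_nodes` without argument inspects /dev; the optional directory argument is only there to make the function easy to try on a test directory.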
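The nvidia-container-cli command is used by bv bash to properly set the GPU options of the Singularity container. Installing the nvidia-container-toolkit package from the NVIDIA repository makes it available: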
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install nvidia-container-toolkit
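Once the package is installed, a quick way to see whether nvidia-container-cli is usable is to ask it for driver and device information. This is a sketch under assumptions, not part of the original procedure: the function name `container_cli_info` is hypothetical, it uses the `info` subcommand of nvidia-container-cli, and depending on the setup the command may need additional privileges.

```shell
# Hypothetical helper: run "nvidia-container-cli info" if the command is
# available, otherwise report on stderr that it is missing.
container_cli_info() {
    if ! command -v nvidia-container-cli >/dev/null 2>&1; then
        echo "nvidia-container-cli not found in PATH" >&2
        return 1
    fi
    nvidia-container-cli info
}
```

If `container_cli_info` fails on the host, bv bash will probably not be able to configure the GPU options of the container either.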
In some cases, TensorFlow seems to fail during initialization when persistence mode is disabled. To enable it:
sudo nvidia-smi -pm 1
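To see whether persistence mode is currently enabled (for instance after a reboot), the `persistence_mode` field can be queried from nvidia-smi. The sketch below is an addition to the original procedure; the function name `persistence_mode_enabled` is hypothetical, and it assumes that nvidia-smi prints one line per GPU containing `Enabled` or `Disabled`.

```shell
# Hypothetical helper: succeed only if nvidia-smi reports at least one GPU
# and every reported line contains "Enabled".
persistence_mode_enabled() {
    local modes
    modes=$(nvidia-smi --query-gpu=persistence_mode --format=csv,noheader) || return 1
    # grep -qv succeeds when some line does NOT contain "Enabled"; negate it.
    [ -n "$modes" ] && ! printf '%s\n' "$modes" | grep -qv 'Enabled'
}
```

A possible use is `persistence_mode_enabled || sudo nvidia-smi -pm 1`, which only changes the setting when needed.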
See the page "How to setup brainvisa distro with writable image".
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo tee /etc/apt/sources.list.d/cuda.list >/dev/null <<EOF
deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /
EOF
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt update
sudo apt install --no-install-recommends \
    cuda-libraries-11-4 \
    libcudnn8=8.2.4.15-1+cuda11.4 \
    libnvinfer8=8.2.3-1+cuda11.4 \
    libnvinfer-plugin8=8.2.3-1+cuda11.4
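Because the library versions are pinned explicitly, it may be worth checking afterwards that the installed versions are the requested ones. The following sketch is not part of the original procedure: the function name `check_pkg_version` is hypothetical, and it relies on `dpkg-query -W -f='${Version}'` to read the installed version.

```shell
# Hypothetical helper: compare the installed version of a Debian package
# with the expected one. Returns 0 on match, 1 otherwise.
check_pkg_version() {
    local pkg="$1" expected="$2" installed
    installed=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null) || installed="not installed"
    if [ "$installed" = "$expected" ]; then
        echo "$pkg: OK ($installed)"
    else
        echo "$pkg: expected $expected, found $installed"
        return 1
    fi
}
```

For example, `check_pkg_version libcudnn8 8.2.4.15-1+cuda11.4` checks the cuDNN package installed above; the same call can be repeated for libnvinfer8 and libnvinfer-plugin8.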
pip3 install --user Keras==2.6.0 # !!! Be aware that Keras version must match tensorflow.keras version !!!
pip3 install --user tensorflow-gpu==2.6.2
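To confirm that the installation works, TensorFlow can be asked how many GPUs it sees. This is a minimal sketch, not part of the original procedure: the function names `tf_gpu_count` and `tf_has_gpu` are hypothetical, the check uses `tf.config.list_physical_devices("GPU")`, and it is meant to be run in the same environment where the pip3 commands above were executed.

```shell
# Hypothetical helper: print the number of GPUs visible to TensorFlow.
# TensorFlow log messages go to stderr, so stdout only carries the count.
tf_gpu_count() {
    python3 -c 'import tensorflow as tf; print(len(tf.config.list_physical_devices("GPU")))'
}

# Hypothetical helper: succeed if TensorFlow sees at least one GPU.
tf_has_gpu() {
    local count
    count=$(tf_gpu_count 2>/dev/null) || return 1
    [ "${count:-0}" -ge 1 ]
}
```

For example, `tf_has_gpu && echo "GPU available"` prints the message only when at least one GPU is detected. If the count is 0, the TensorFlow messages on stderr (visible when calling `tf_gpu_count` directly) usually name the library that could not be loaded.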