
Deeplearning: How to install tensorflow gpu for deeplearning processes in BrainVISA

Nicolas Souedet edited this page Jul 1, 2022 · 3 revisions

You can find below the steps I followed to be able to use tensorflow-gpu 2.6.2 in a BrainVISA development container.

It was created using a host running Ubuntu 16.04 with an NVidia RTX A5000 GPU.

In this wiki, we assume that a driver supporting the GPU card is already installed (in my case I had to manually download and install driver 470.94 from the NVidia website, because I did not find a repository for Ubuntu 16.04 that distributed it).

1. On the host

1.1. Check that all /dev/nvidia* are correctly created

On Ubuntu 16.04 with driver 470.94, /dev/nvidia-uvm was missing

To check, run the following script in a bash shell:

cat >/tmp/check_cuda_node <<EOF
#!/bin/bash

/sbin/modprobe nvidia

if [ "\$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=\`lspci | grep -i NVIDIA\`
  N3D=\`echo "\$NVDEVS" | grep "3D controller" | wc -l\`
  NVGA=\`echo "\$NVDEVS" | grep "VGA compatible controller" | wc -l\`

  N=\`expr \$N3D + \$NVGA - 1\`
  for i in \`seq 0 \$N\`; do
    mknod -m 666 /dev/nvidia\$i c 195 \$i
  done

  mknod -m 666 /dev/nvidiactl c 195 255

else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "\$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=\`grep nvidia-uvm /proc/devices | awk '{print \$1}'\`

  mknod -m 666 /dev/nvidia-uvm c \$D 0
else
  exit 1
fi
EOF

chmod +x /tmp/check_cuda_node
sudo /tmp/check_cuda_node
rm -f /tmp/check_cuda_node
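To confirm the script did its job, you can check that the expected device nodes now exist. The `check_nodes` helper below is a hypothetical sketch; the node list in the usage comment assumes a single-GPU host:

```shell
# check_nodes: report whether each given device node exists;
# fail if any is missing (hypothetical helper).
check_nodes() {
  local missing=0 node
  for node in "$@"; do
    if [ -e "$node" ]; then
      echo "OK: $node"
    else
      echo "MISSING: $node"
      missing=1
    fi
  done
  return "$missing"
}

# On a single-GPU host, after running the script above:
# check_nodes /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
```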

1.2. Set up nvidia-container-toolkit to get the nvidia-container-cli command

The nvidia-container-cli command is used by bv bash to properly set GPU options in the singularity container.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install nvidia-container-toolkit
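Once installed, bv expects nvidia-container-cli to be available on the PATH. The `require_cmd` helper below is a hypothetical sketch for checking that:

```shell
# require_cmd: fail with a message if a command is not on PATH
# (hypothetical helper).
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found" >&2
    return 1
  fi
}

# After installing the toolkit:
# require_cmd nvidia-container-cli && nvidia-container-cli info
```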

1.3. Optionally, enable CUDA persistence mode

In some cases, tensorflow seems to fail during initialization when persistence mode is disabled.

sudo nvidia-smi -pm 1
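You can then verify the setting with nvidia-smi. The `all_enabled` helper below is a hypothetical sketch for checking the query output:

```shell
# all_enabled: succeed only if every line on stdin reads "Enabled".
all_enabled() {
  ! grep -qv '^Enabled$'
}

# On the host (requires the NVIDIA driver):
# nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_enabled \
#   && echo "persistence mode enabled on all GPUs"
```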

1.4. Set up a new brainvisa distro based on a singularity writable image

cf How to setup brainvisa distro with writable image

2. In the singularity writable image (running in writable mode with root privileges)

2.1. Install CUDA libraries

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
cat >/etc/apt/sources.list.d/cuda.list <<EOF
deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /
EOF

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600

sudo apt update

sudo apt install --no-install-recommends \
	cuda-libraries-11-4 \
	libcudnn8=8.2.4.15-1+cuda11.4 \
	libnvinfer8=8.2.3-1+cuda11.4 \
	libnvinfer-plugin8=8.2.3-1+cuda11.4
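After the install, it is worth checking that the cuDNN and TensorRT runtime libraries are visible to the dynamic linker inside the image, since tensorflow loads them at runtime. The `has_libs` helper below is a hypothetical sketch:

```shell
# has_libs: succeed if every named library appears in the dynamic
# linker cache (hypothetical helper; run it inside the image).
has_libs() {
  local cache lib
  cache=$(ldconfig -p) || return 1
  for lib in "$@"; do
    echo "$cache" | grep -q "$lib" || { echo "missing: $lib"; return 1; }
  done
  echo "all libraries found"
}

# After the apt install above:
# has_libs libcudnn.so.8 libnvinfer.so.8
```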

2.2. Install tensorflow-gpu using pip3

pip3 install --user Keras==2.6.0 # !!! Be aware that the Keras version must match the tensorflow.keras version bundled with tensorflow-gpu !!!
pip3 install --user tensorflow-gpu==2.6.2
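Since the Keras pin must match the version bundled with tensorflow, a quick sanity check after installation helps. The `versions_match` helper below is a hypothetical sketch that compares two version strings on their major.minor part; the commented commands require the GPU-enabled container with the packages installed:

```shell
# versions_match: succeed if two version strings share the same
# major.minor prefix (hypothetical helper).
versions_match() {
  [ "${1%.*}" = "${2%.*}" ]
}

# Inside the container, print the installed versions and check that
# tensorflow can see the GPU:
# python3 -c 'import tensorflow as tf; print(tf.__version__, tf.keras.__version__)'
# python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
# Here tensorflow-gpu 2.6.2 and Keras 2.6.0 both match on 2.6.
```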