These are my notes on installing the NVIDIA driver (version 450), the CUDA Toolkit for CUDA 11, and the cuDNN SDK 8.0.4 on Ubuntu 20.04 LTS. The purpose is to run TensorFlow.
https://www.tensorflow.org/install/gpu
nouveau
Immediately after a fresh OS installation, the open-source nouveau driver is loaded.
$ lsmod | grep nouveau
nouveau 1949696 1
mxm_wmi 16384 1 nouveau
video 49152 1 nouveau
ttm 106496 2 drm_vram_helper,nouveau
drm_kms_helper 184320 4 ast,nouveau
i2c_algo_bit 16384 2 ast,nouveau
drm 491520 8 drm_kms_helper,drm_vram_helper,ast,ttm,nouveau
wmi 32768 2 mxm_wmi,nouveau
CUDA requires the NVIDIA driver, so blacklist nouveau and regenerate the initramfs so that it no longer includes the module and the NVIDIA driver can be used. After rebooting, make sure nouveau is no longer loaded.
$ echo "blacklist nouveau" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf > /dev/null
$ echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf > /dev/null
$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.8.0-36-generic
$ sudo reboot
$ lsmod | grep nouveau
$
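An empty grep result is easy to misread, so the check can be made explicit. The following is a minimal sketch that reads /proc/modules, the kernel's own module list that lsmod formats. The function name `module_loaded` and its optional second argument (a file to read instead of /proc/modules) are my own additions for illustration.

```shell
#!/bin/sh
# Return 0 if the named kernel module appears in the module list, 1 otherwise.
# $1: module name, $2: module list file (defaults to /proc/modules).
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if module_loaded nouveau; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
```

Matching on the first column only avoids false positives from lines where nouveau merely appears as a dependency of another module.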
Next, check the driver versions distributed by Ubuntu.
$ ubuntu-drivers devices
== /sys/devices/pci0000:5d/0000:5d:00.0/0000:5e:00.0 ==
modalias : pci:v000010DEd00001DB4sv000010DEsd00001214bc03sc02i00
vendor : NVIDIA Corporation
model : GV100GL [Tesla V100 PCIe 16GB]
driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-460 - distro non-free recommended
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-440-server - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
$
$
$ sudo apt info nvidia-driver-450 | grep -i version
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Version: 450.102.04-0ubuntu0.20.04.1
$
$
$ sudo apt info nvidia-driver-450-server | grep -i version
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Version: 450.80.02-0ubuntu0.20.04.3
$
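The warning in the output above appears because apt is meant for interactive use. As a sketch of a script-friendly alternative, the same version information can be read from `apt-cache policy`, which needs no sudo. The helper name `candidate_version` is hypothetical; it simply extracts the "Candidate:" field.

```shell
# Print the candidate version from `apt-cache policy <package>` output on stdin.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# Only query apt when it is available on this machine.
if command -v apt-cache >/dev/null 2>&1; then
    for pkg in nvidia-driver-450 nvidia-driver-450-server; do
        printf '%s %s\n' "$pkg" "$(apt-cache policy "$pkg" | candidate_version)"
    done
fi
```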
On the NVIDIA website, 450.80.02 is the version being distributed, so I decided to install nvidia-driver-450-server, which provides that version.
Install the driver, reboot, and confirm with nvidia-smi that the driver has loaded.
$ sudo apt install nvidia-driver-450-server
$ sudo reboot
$
$ lsmod | grep nvidia
nvidia_uvm 1003520 0
nvidia_drm 49152 0
nvidia_modeset 1183744 1 nvidia_drm
nvidia 19718144 2 nvidia_uvm,nvidia_modeset
drm_kms_helper 217088 5 drm_vram_helper,ast,nvidia_drm
drm 552960 7 drm_kms_helper,drm_vram_helper,ast,drm_ttm_helper,nvidia_drm,ttm
$
$ nvidia-smi
Fri Jan 8 16:11:05 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:5E:00.0 Off | 0 |
| N/A 33C P0 37W / 250W | 0MiB / 16160MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
$
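The TensorFlow GPU guide linked at the top states that CUDA 11.0 requires driver 450.x or higher, so the driver version shown by nvidia-smi can also be checked mechanically. This is a rough sketch: `version_ge` is a hypothetical helper that relies on the version sort (`-V`) of GNU sort, and the query uses nvidia-smi's `--query-gpu` option.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Query the driver version only when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$driver" 450; then
        echo "driver $driver is new enough for CUDA 11.0"
    else
        echo "driver $driver is too old for CUDA 11.0"
    fi
fi
```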
CUDA Toolkit
Next, install the CUDA Toolkit. Ubuntu's own repository does not distribute 11.0 yet.
$ sudo apt info nvidia-cuda-toolkit | grep -i version
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Version: 10.1.243-3
$
Go to the NVIDIA website, select Ubuntu 20.04, and follow the installation steps that appear.
The install command must name the version explicitly, as in cuda-11-0. When I ran it without the version, CUDA 11.2 was installed.
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt-get update
$ sudo apt-get install cuda-11-0
$ sudo reboot
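The unversioned cuda meta-package follows the newest release in NVIDIA's repository, which is why it pulled in 11.2. One way to avoid surprises is to simulate the install first. The sketch below assumes the NVIDIA repository has already been added as above; `cuda_pkg` is a hypothetical helper that only spells out the naming convention, and `apt-get -s` performs a dry run that changes nothing.

```shell
# Map a CUDA release such as "11.0" to its versioned meta-package name.
cuda_pkg() {
    printf 'cuda-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

pkg=$(cuda_pkg 11.0)
echo "$pkg"

# Simulate the install (-s changes nothing) and list the cuda-* packages it would pull in.
if command -v apt-get >/dev/null 2>&1; then
    apt-get -s install "$pkg" 2>/dev/null | grep '^Inst cuda-' || true
fi
```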
After rebooting, check that the CUDA Toolkit is installed properly with the nvcc -V command. The shell reports that nvcc is not installed, but it is simply not on the PATH, so add it to the PATH.
$ nvcc -V
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
$ /usr/local/cuda/bin/nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
$
$ echo 'export PATH="/usr/local/cuda/bin:$PATH"' | sudo tee -a /etc/bash.bashrc > /dev/null
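Appending an export line to /etc/bash.bashrc works, but running it twice adds the same directory twice. As a sketch of a more defensive variant, the snippet below only prepends the directory when it exists and is not already listed. The function name `path_prepend` is my own; the snippet could live in a drop-in file such as /etc/profile.d/cuda.sh (a hypothetical file name).

```shell
# Prepend a directory to PATH unless it is missing or already listed.
path_prepend() {
    [ -d "$1" ] || return 0
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend /usr/local/cuda/bin
export PATH
```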
cuDNN
Download the cuDNN SDK 8.0.4 from the NVIDIA website. An NVIDIA developer account (free registration) is required for the download.
There is no download for Ubuntu 20.04, so download cuDNN Library for Linux (x86_64). Extracting the archive yields two folders containing header files and libraries, but nothing says where to copy them. The txt file is only the license agreement.
$ ls
include lib64 NVIDIA_SLA_cuDNN_Support.txt
$
$ ls include/
cudnn_adv_infer.h cudnn_cnn_infer.h cudnn_ops_infer.h
cudnn_adv_train.h cudnn_cnn_train.h cudnn_ops_train.h
cudnn_backend.h cudnn.h cudnn_version.h
$
$ ls lib64/
libcudnn_adv_infer.so libcudnn_cnn_train.so.8.0.4
libcudnn_adv_infer.so.8 libcudnn_ops_infer.so
libcudnn_adv_infer.so.8.0.4 libcudnn_ops_infer.so.8
libcudnn_adv_train.so libcudnn_ops_infer.so.8.0.4
libcudnn_adv_train.so.8 libcudnn_ops_train.so
libcudnn_adv_train.so.8.0.4 libcudnn_ops_train.so.8
libcudnn_cnn_infer.so libcudnn_ops_train.so.8.0.4
libcudnn_cnn_infer.so.8 libcudnn.so
libcudnn_cnn_infer.so.8.0.4 libcudnn.so.8
libcudnn_cnn_train.so libcudnn.so.8.0.4
libcudnn_cnn_train.so.8 libcudnn_static.a
$
A quick search turned up the official documentation, which specifies the copy destinations and says to make the files readable. I also add the library directory to the library search path.
$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
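$ echo 'export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"' | sudo tee -a /etc/bash.bashrc > /dev/null
$ echo 'export LD_LIBRARY_PATH="/usr/lib/cuda/include:$LD_LIBRARY_PATH"' | sudo tee -a /etc/bash.bashrc > /dev/null
The second line has no effect on library lookup, because an include directory contains no shared libraries.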
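Before building anything, the copied header can confirm which cuDNN version ended up in /usr/local/cuda. cuDNN 8 keeps its version macros in cudnn_version.h, which is among the headers listed above. The following sketch reads them with awk; the function name `cudnn_header_version` is hypothetical.

```shell
# Print the cuDNN version as MAJOR.MINOR.PATCH, read from cudnn_version.h.
# $1: path to the header (defaults to the copy destination used above).
cudnn_header_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major == "") exit 1; print major "." minor "." patch }' \
        "${1:-/usr/local/cuda/include/cudnn_version.h}"
}

header=/usr/local/cuda/include/cudnn_version.h
if [ -r "$header" ]; then
    cudnn_header_version "$header"
fi
```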
To check that the installation succeeded, the documentation suggests compiling the mnistCUDNN sample. The necessary files are in the "cuDNN Code Samples and User Guide ~" download, so download the deb file for Ubuntu 18.04 and extract it.
$ mkdir libcudnn8-samples
$ dpkg-deb -x libcudnn8-samples_8.0.4.30-1+cuda11.0_amd64.deb libcudnn8-samples
$
$ cd libcudnn8-samples/usr/src/cudnn_samples_v8/mnistCUDNN
$ make clean && make
$ ./mnistCUDNN
mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8004 , CUDNN_VERSION from cudnn.h : 8004 (8.0.4)
Host compiler version : GCC 9.3.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 80 Capabilities 7.0, SmClock 1380.0 Mhz, MemSize (Mb) 16160, MemClock 877.0 Mhz, Ecc=1, boardGroupID=0
Using device 0
...
...
0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
$
The sample compiled and its test passed. This completes the CUDA setup.
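Since the goal is TensorFlow, one last check is whether TensorFlow itself sees the GPU. The sketch below assumes TensorFlow 2.x has been installed separately for python3 (that step is not covered in these notes); `tf_gpu_check` is a hypothetical wrapper around `tf.config.list_physical_devices`.

```shell
# Print the GPUs TensorFlow can see. Returns 2 when python3 or TensorFlow is missing.
tf_gpu_check() {
    if ! python3 -c 'import tensorflow' >/dev/null 2>&1; then
        echo "TensorFlow is not importable by python3" >&2
        return 2
    fi
    python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}

tf_gpu_check || echo "skipped the TensorFlow GPU check"
```

An empty list in the output means TensorFlow did not find a usable GPU, in which case the library path settings above are the first thing to recheck.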