Introduction

This time I set up TensorFlow to use the GPU, so I'm leaving the steps here as a memo. It was a bit difficult... I hope this helps anyone in the same situation.

Environment

Ubuntu: 18.04.5 LTS
Graphics: GeForce RTX 2070 SUPER
nvidia-driver: 455.45.01
CUDA: 10.0
cuDNN: 7.4.2
tensorflow-gpu: 2.0.0

Procedure

At first, I planned to install the driver and then install CUDA and cuDNN. However, when I installed the NVIDIA driver first and then CUDA (10.0 or 10.1), the driver was not recognized. I chose CUDA 10.0 or 10.1 rather than the latest version because I want to use the GPU with TensorFlow, and the newest CUDA versions confirmed in TensorFlow's tested builds at that time were around 10.0 and 10.1.

So I tried the opposite order: CUDA first, then nvidia-driver. That didn't work either. After installing nvidia-driver in this order and rebooting, the mouse and keyboard could not be used.

What I did in the end:

➀ Install nvidia-driver
➁ Remove nvidia-driver once
➂ Install CUDA
➃ Install nvidia-driver again
➄ Install cuDNN

There is probably a much easier way, but this worked for me for now.

- TensorFlow GPU version compatibility table
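Because the versions must match exactly, it can help to write down the part of the table you rely on as data. The following Python sketch is only an illustration: the rows in TESTED_BUILDS are a partial excerpt of TensorFlow's published tested build configurations for Linux GPU, and TESTED_BUILDS, required_versions and is_compatible are hypothetical helper names, so always check the official table before installing anything.

```python
from typing import Tuple

# Partial excerpt of TensorFlow's tested build configurations (Linux GPU):
# tensorflow-gpu version -> (CUDA, cuDNN). The official table is authoritative.
TESTED_BUILDS = {
    "1.15.0": ("10.0", "7.4"),
    "2.0.0": ("10.0", "7.4"),
    "2.1.0": ("10.1", "7.6"),
    "2.2.0": ("10.1", "7.6"),
    "2.3.0": ("10.1", "7.6"),
}


def major_minor(version: str) -> str:
    """Reduce e.g. '10.0.130' to '10.0' or '7.4.2' to '7.4'."""
    return ".".join(version.split(".")[:2])


def required_versions(tf_version: str) -> Tuple[str, str]:
    """Return the (CUDA, cuDNN) pair listed for a tensorflow-gpu release."""
    if tf_version not in TESTED_BUILDS:
        raise KeyError(f"{tf_version} is not in this excerpt; check the official table")
    return TESTED_BUILDS[tf_version]


def is_compatible(tf_version: str, cuda: str, cudnn: str) -> bool:
    """Compare installed CUDA/cuDNN versions with the table on major.minor only."""
    want_cuda, want_cudnn = required_versions(tf_version)
    return major_minor(cuda) == want_cuda and major_minor(cudnn) == want_cudnn


print(required_versions("2.0.0"))                   # ('10.0', '7.4')
print(is_compatible("2.0.0", "10.0.130", "7.4.2"))  # True
```

With the versions used in this article (CUDA 10.0.130, cuDNN 7.4.2, tensorflow-gpu 2.0.0) the check passes, while CUDA 10.1 would not.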

1. Before installing the NVIDIA driver

Install vim (just my personal preference). Update the package lists first, then upgrade.

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install vim

I want to use jj to leave insert mode in vim, so edit ~/.vimrc.

$ vim ~/.vimrc

`~/.vimrc`


set number
inoremap <silent> jj <ESC>

(1) Disable Nouveau

First, disable Nouveau. For NVIDIA graphics cards, Ubuntu uses the open-source Nouveau driver by default, and it conflicts with the proprietary NVIDIA driver, so add Nouveau to the blacklist.

Create the blacklist file

$ sudo vim /etc/modprobe.d/blacklist-nouveau.conf

`/etc/modprobe.d/blacklist-nouveau.conf`


blacklist nouveau
options nouveau modeset=0

Run the following commands to regenerate the initramfs and reboot, then confirm that nouveau is disabled.

It's OK if the display resolution is low after rebooting

$ sudo update-initramfs -u
$ sudo reboot

(2) Pinning the kernel version

It seems that if you do not pin the kernel version, a kernel upgrade can break the dependency between the kernel and the nvidia driver. So pin the kernel version.

Install aptitude

$ sudo apt install aptitude

Check the kernel version

$ aptitude show linux-generic

Write the version confirmed above into the following file (change only the version numbers)

$ cd /etc/apt/preferences.d
$ sudo vim linux-kernel.pref

`linux-kernel.pref`


Package: linux-generic
Pin: version 4.15.0.128.115
Pin-Priority: 1001

Package: linux-headers-generic
Pin: version 4.15.0.128.115
Pin-Priority: 1001

Package: linux-image-generic
Pin: version 4.15.0.128.115
Pin-Priority: 1001

That's all for pinning the kernel.
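The pinning file can also be generated from the output of aptitude show. This Python sketch assumes the Version: field format shown in the abbreviated sample below; parse_version and render_pref are hypothetical helper names, and if aptitude lists several versions the regex simply takes the first one.

```python
import re

# The three kernel meta-packages pinned in linux-kernel.pref above.
PINNED_PACKAGES = ("linux-generic", "linux-headers-generic", "linux-image-generic")


def parse_version(aptitude_show_output: str) -> str:
    """Return the first 'Version:' field from `aptitude show linux-generic` output."""
    match = re.search(r"^Version:\s*(\S+)", aptitude_show_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Version:' field found")
    return match.group(1)


def render_pref(version: str, priority: int = 1001) -> str:
    """Build the preferences text: one stanza per package, blank line between."""
    stanzas = [
        f"Package: {pkg}\nPin: version {version}\nPin-Priority: {priority}\n"
        for pkg in PINNED_PACKAGES
    ]
    return "\n".join(stanzas)


# Abbreviated, illustrative `aptitude show linux-generic` output.
sample = "Package: linux-generic\nVersion: 4.15.0.128.115\nState: installed\n"
print(render_pref(parse_version(sample)), end="")
```

The priority 1001 is what makes the pin stick: according to apt_preferences, a priority of 1000 or more makes apt install the pinned version even if that means a downgrade, so newer kernel meta-packages are not pulled in.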

2. Install the NVIDIA driver

Check that nouveau is disabled (OK if nothing is displayed)

$ lsmod | grep -i nouveau
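lsmod formats the contents of /proc/modules, so the same check can be done from Python. A minimal sketch, assuming a Linux system; unlike grep -i, which matches any line containing the string, nouveau_loaded (a hypothetical helper) looks for a module named exactly nouveau.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Module names from /proc/modules text (the first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nouveau_loaded(proc_modules_text: str) -> bool:
    """True if a module named exactly 'nouveau' is currently loaded."""
    return "nouveau" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():  # only present on Linux
        print("nouveau loaded:", nouveau_loaded(proc_modules.read_text()))
```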

Install the basic build tools such as gcc and make

$ sudo apt install build-essential

Add the graphics-drivers PPA repository

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update

Show the detected devices and the drivers available for them

$ ubuntu-drivers devices
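When several drivers are listed, ubuntu-drivers marks one of them as recommended. As an illustration, the following Python sketch picks that line out; the sample output is made up for this example (the real list depends on the GPU and on the PPA), and recommended_driver is a hypothetical helper.

```python
from typing import Optional


def recommended_driver(ubuntu_drivers_output: str) -> Optional[str]:
    """Return the package on the 'driver :' line tagged 'recommended', if any."""
    for line in ubuntu_drivers_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "driver" and "recommended" in value.split():
            # value looks like " nvidia-driver-455 - third-party free recommended"
            return value.split()[0]
    return None


# Illustrative output only; not captured from the machine in this article.
SAMPLE = """\
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
vendor   : NVIDIA Corporation
model    : TU104 [GeForce RTX 2070 SUPER]
driver   : nvidia-driver-450 - third-party free
driver   : nvidia-driver-455 - third-party free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""

print(recommended_driver(SAMPLE))  # nvidia-driver-455
```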

Choose the driver to install (here, nvidia-driver-455) and install it

$ sudo apt install nvidia-driver-455
$ sudo reboot

Check that the driver is installed

$ nvidia-smi

3. Remove the NVIDIA driver

List the installed NVIDIA packages (all of them will be removed).

$ dpkg -l | grep nvidia

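Remove them (the patterns are quoted so that the shell passes them to apt-get instead of expanding them as file names)

$ sudo apt-get --purge remove 'nvidia-*'
$ sudo apt-get --purge remove 'libnvidia-*'
$ sudo apt-get --purge remove libnvidia-compute-455:i386
$ sudo apt-get --purge remove libnvidia-fbc1-455:i386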

If the following command displays nothing, all NVIDIA packages have been removed

$ dpkg -l | grep nvidia
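What grep filters here is the dpkg -l table, whose first column is the package status (ii = installed, rc = removed but configuration files remain). The Python sketch below does the same filtering on the package-name column only; nvidia_packages is a hypothetical helper and the sample rows are illustrative, not output from this machine.

```python
from typing import List, Tuple


def nvidia_packages(dpkg_list_output: str) -> List[Tuple[str, str]]:
    """Return (status, name) for `dpkg -l` rows whose package name contains 'nvidia'."""
    rows = []
    for line in dpkg_list_output.splitlines():
        fields = line.split()
        # Package rows start with a status such as 'ii' or 'rc'; header lines
        # start with 'Desired=', '|', '|/', '||/' or '+++-' and are skipped.
        if len(fields) < 2 or not fields[0].isalpha():
            continue
        status, name = fields[0], fields[1]
        if "nvidia" in name:
            rows.append((status, name))
    return rows


SAMPLE = """\
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                        Version    Architecture Description
+++-===========================-==========-============-====================
ii  build-essential             12.4       amd64        build tools
ii  libnvidia-compute-455:i386  455.45.01  i386         NVIDIA libcompute package
rc  nvidia-driver-455           455.45.01  amd64        NVIDIA driver metapackage
"""

for status, name in nvidia_packages(SAMPLE):
    print(status, name)
```

After the --purge removals, the list should be empty, because purging also clears the rc (config-files-only) entries.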

4. Install CUDA

Download CUDA from NVIDIA's website and install it (here, the local .deb package of CUDA 10.0 for Ubuntu 18.04). TensorFlow is strict about which CUDA version it supports, so check the compatibility table carefully.

$ sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda

Add the CUDA paths to ~/.bashrc

`~/.bashrc`



export PATH="/usr/local/cuda-10.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"

$ source ~/.bashrc
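To confirm that the export lines took effect, a quick look at the environment is enough. A sketch assuming CUDA is installed under /usr/local/cuda-10.0 as above; cuda_paths_configured is a hypothetical helper that splits both variables on ':' and looks for the bin and lib64 directories.

```python
import os
from typing import Mapping

CUDA_HOME = "/usr/local/cuda-10.0"  # same directory as in the export lines


def cuda_paths_configured(env: Mapping[str, str], cuda_home: str = CUDA_HOME) -> bool:
    """True if PATH has <cuda_home>/bin and LD_LIBRARY_PATH has <cuda_home>/lib64."""
    path_dirs = env.get("PATH", "").split(":")
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    return f"{cuda_home}/bin" in path_dirs and f"{cuda_home}/lib64" in lib_dirs


# Checks the environment of the shell that launched Python (run after `source ~/.bashrc`).
print(cuda_paths_configured(os.environ))
```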

* Installing CUDA also installs an NVIDIA driver (in this case, version 410), so remove it.

$ sudo apt-get --purge remove 'nvidia-*'
$ sudo apt-get --purge remove 'libnvidia-*'
$ sudo apt-get --purge remove libnvidia-compute-410:i386
$ sudo apt-get --purge remove libnvidia-fbc1-410:i386

If the following command displays nothing, all NVIDIA packages have been removed

$ dpkg -l | grep nvidia

5. Install the NVIDIA driver again

Now reinstall the nvidia driver

$ sudo apt install nvidia-driver-455
$ sudo reboot

Check the driver and the CUDA toolkit

$ nvidia-smi
$ nvcc -V

nvidia-smi shows CUDA 11.1, but be careful: that is the highest CUDA version the installed driver supports, not the installed toolkit. The version shown by nvcc -V (here, 10.0) is the one actually installed. (This was so confusing that I got stuck here...)
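The two numbers come from different places: nvcc -V reports the toolkit's "release X.Y", while the nvidia-smi header reports the newest CUDA version the driver can run. The following Python sketch, with hypothetical helper names and sample lines modeled on this environment, extracts both and compares them; the toolkit version should not exceed the driver's.

```python
import re
from typing import Tuple


def _find_version(pattern: str, text: str) -> Tuple[int, int]:
    """Return (major, minor) from the first 'X.Y' captured by pattern."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found")
    major, minor = match.group(1).split(".")
    return int(major), int(minor)


def toolkit_version(nvcc_output: str) -> Tuple[int, int]:
    """Installed CUDA toolkit, from the 'release X.Y' part of `nvcc -V`."""
    return _find_version(r"release\s+(\d+\.\d+)", nvcc_output)


def driver_cuda_version(smi_output: str) -> Tuple[int, int]:
    """Highest CUDA version the driver supports, from the `nvidia-smi` header."""
    return _find_version(r"CUDA Version:\s*(\d+\.\d+)", smi_output)


NVCC = "Cuda compilation tools, release 10.0, V10.0.130"
SMI = "| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |"

toolkit, driver_max = toolkit_version(NVCC), driver_cuda_version(SMI)
print(toolkit, driver_max)    # (10, 0) (11, 1)
print(toolkit <= driver_max)  # True: the 455 driver can run the CUDA 10.0 toolkit
```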

6. Install cuDNN

Downloading cuDNN requires registering for the NVIDIA Developer Program. Download the cuDNN files that match your CUDA version (here, cuDNN 7.4.2 for CUDA 10.0) and install them.

$ sudo dpkg -i libcudnn7_7.4.2.24-1+cuda10.0_amd64.deb 
$ sudo dpkg -i libcudnn7-dev_7.4.2.24-1+cuda10.0_amd64.deb 
$ sudo dpkg -i libcudnn7-doc_7.4.2.24-1+cuda10.0_amd64.deb
$ tar xvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
$ sudo cp -a cuda/include/cudnn.h /usr/local/cuda/include/
$ sudo cp -a cuda/lib64/libcudnn* /usr/local/cuda/lib64/
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
$ sudo reboot
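Before running the samples, the installed cuDNN version can also be read from the header copied above. In cuDNN 7, cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL; from cuDNN 8 on these macros moved to cudnn_version.h. The Python sketch below assumes the header path used in this article, and cudnn_version is a hypothetical helper.

```python
import re
from pathlib import Path


def cudnn_version(header_text: str) -> str:
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from cudnn.h text and join them."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"{name} not found (cuDNN 8+ keeps it in cudnn_version.h)")
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    header = Path("/usr/local/cuda/include/cudnn.h")
    if header.exists():
        print(cudnn_version(header.read_text()))  # the setup above should give 7.4.2
```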

Verification

$ cp -r /usr/src/cudnn_samples_v7/ $HOME
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
$ make clean && make
$ ./mnistCUDNN

It's OK if "Test passed!" is displayed

Extra

Install TensorFlow

It can be installed with pip

$ pip install tensorflow-gpu==2.0.0

Check that the GPU is available

The following code lists the devices (CPU and GPU) that TensorFlow recognizes

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

If the following code prints True, it is okay

import tensorflow as tf
print(tf.test.is_gpu_available())

- Check for recommended drivers

GPU settings with Docker

There also seems to be a way to set up the GPU using Docker. I haven't tried it yet, but it looks easier.

- How to build a deep learning GPU learning environment with Docker

Conclusion

Setting up the GPU for TensorFlow really is troublesome, isn't it? I hope this will be helpful for anyone who does it in the future.

NVIDIA-driver (GeForce RTX 2070 SUPER), cuda 10.0, cudnn 7.4.2 settings on Ubuntu 18.04.5 LTS (tensorflow-gpu: 2.0.0 version)