I wanted to try building a GPU container, but buying a GPU was not an option, so I created a GPU instance in the cloud and built the container there. Some steps took longer than expected, so I am summarizing the procedure and commands here as a memo for myself.
- Create a Google Cloud account and open the GCP console.
- Upgrade the account. (While on the free trial, you cannot apply for GPU quota.)
[Host OS]
・ Ubuntu 20.04 LTS (on GCP)
・ nvidia-driver 460
・ Docker 19.03.14
・ nvidia-docker2 2.5.0
When creating a VM instance with a GPU, you need to apply for GPU quota the first time only.
First, select IAM & Admin > Quotas from the GCP console.
You need to apply for two quotas: [Number of allocations in total] and [Number of allocations for each GPU region].
**[Number of allocations in total]** Enter GPUs in the filter and select the quota for GPUs across all regions.
Check the Global entry and click "Edit Quotas". Enter the new upper limit and the reason, then proceed to the next screen.
On the next screen, enter your [Name], [Email Address], and [Telephone Number] and send the request. Approval arrives in about 5 minutes, after which you can create a GPU instance.
**[Number of allocations for each GPU region]**
Filter appropriately and choose the GPU model, NVIDIA T4 this time.
Since this quota is per region, select the region where you plan to create the GPU instance, then specify the upper limit on the number of GPUs. The value specified here is only an upper limit, so no charge is incurred at this point. This time I set it to 1, since the purpose is just to build a GPU container.
Select Compute Engine > VM Instances from the GCP console.
Click [Create] and create an instance with the following specifications.
● Machine configuration
Machine family: GPU
Series: N1
Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
● Firewall
Allow HTTP traffic
Allow HTTPS traffic
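The same specifications can probably also be expressed with the gcloud CLI. The following is a minimal sketch, assuming gcloud is installed and authenticated. The instance name `gpu-container-test` and the zone `us-central1-a` are hypothetical placeholders that do not come from the steps above, and the Ubuntu 20.04 image family is assumed from the host OS listed earlier. The sketch only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: build the gcloud command for a T4 instance matching the console settings.
# INSTANCE_NAME and ZONE are hypothetical placeholders; replace them with your own values.
INSTANCE_NAME="${INSTANCE_NAME:-gpu-container-test}"
ZONE="${ZONE:-us-central1-a}"

create_gpu_instance_cmd() {
  # --maintenance-policy=TERMINATE: GPU instances cannot be live-migrated.
  # --tags: correspond to the "Allow HTTP/HTTPS traffic" checkboxes.
  echo gcloud compute instances create "$INSTANCE_NAME" \
    --zone="$ZONE" \
    --machine-type=n1-standard-1 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --tags=http-server,https-server
}

# Print the command for review instead of executing it.
create_gpu_instance_cmd
```

To actually create the instance, copy the printed command into a shell where gcloud is configured for your project.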
It took a few minutes to start.
**Check server environment**
Check the OS
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.1 LTS"
Check the GPU (NVIDIA)
$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
Subsystem: NVIDIA Corporation TU104GL [Tesla T4]
Kernel modules: nvidiafb
**Disable nouveau graphics driver**
The nouveau graphics driver that ships with Linux by default apparently needs to be disabled. First check whether it is loaded; if the command prints nothing, nouveau is not loaded.
$ lsmod | grep -i nouveau
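The check above can be wrapped into a small script. This is a minimal sketch: the function name `nouveau_loaded` is made up for illustration, and the blacklist procedure in the comments is one common approach that is assumed here, not something taken from the steps above (on this instance the check printed nothing, so it was not needed).

```shell
#!/usr/bin/env bash
# Sketch: report whether the nouveau module appears in a module listing.
# Reads lsmod-style text on stdin so it can be tried without a GPU machine.
nouveau_loaded() {
  # The module name is the first column of lsmod output.
  awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}

if command -v lsmod >/dev/null 2>&1; then
  if lsmod | nouveau_loaded; then
    echo "nouveau is loaded; disable it before installing the NVIDIA driver"
    # One common way to disable it (assumption, not from the steps above):
    #   printf 'blacklist nouveau\noptions nouveau modeset=0\n' \
    #     | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
    #   sudo update-initramfs -u && sudo reboot
  else
    echo "nouveau is not loaded"
  fi
fi
```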
**Package management tool update**
$ sudo apt update
$ sudo apt upgrade
You can check the available NVIDIA driver versions on this site. https://www.nvidia.co.jp/Download/index.aspx?lang=jp
**Add repository and update**
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
**Check the recommended installation driver**
$ sudo apt install ubuntu-drivers-common
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:04.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor : NVIDIA Corporation
model : TU104GL [Tesla T4]
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-460 - distro non-free recommended
driver : nvidia-driver-440-server - distro non-free
driver : nvidia-driver-450 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
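If you want to pick the recommended package out of that listing automatically, something like the following might work. It is a minimal sketch that assumes the output format shown above (a `driver :` line ending in `recommended`); the function name `recommended_driver` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: extract the package marked "recommended" from `ubuntu-drivers devices` output.
recommended_driver() {
  # Lines look like: "driver : nvidia-driver-460 - distro non-free recommended"
  # so the package name is the third whitespace-separated field.
  awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

if command -v ubuntu-drivers >/dev/null 2>&1; then
  ubuntu-drivers devices | recommended_driver
fi
```

With the listing above, this picks `nvidia-driver-460`, the package installed in the next step.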
**Driver installation**
$ sudo apt install nvidia-driver-460
**Reboot and check if the installation was successful**
Check the GPU (NVIDIA)
$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
Subsystem: NVIDIA Corporation TU104GL [Tesla T4]
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nvidia_drm, nvidia
Check that the nvidia-smi command works
$ nvidia-smi
To install Docker, refer to step 1 of the article [Systemctl cannot be used on Ubuntu in Docker container]. The version installed this time is:
docker-ce=5:19.03.14~3-0~ubuntu-focal
Install the packages required to create and launch a GPU container.
**Add repository and update** https://nvidia.github.io/nvidia-docker
#GPG key registration
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
#Add repository
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt update
**Install nvidia-docker2**
Check the installable package versions
$ apt-cache madison nvidia-docker2
#Install the latest version (execute this time)
$ sudo apt -y install nvidia-docker2
#When installing by specifying the version, it looks like this
$ sudo apt -y install nvidia-docker2=2.0.3+docker18.09.7-3
#Older versions may require you to install other packages in advance. (Reference below)
$ sudo apt install nvidia-container-runtime-hook
$ sudo apt install nvidia-container-runtime=2.0.0+docker18.09.7-3
When nvidia-docker2 is installed, nvidia-container-toolkit is pulled in as a dependency and installed together with it. (nvidia-container-toolkit is the newer package.)
If you install only the nvidia-container-toolkit package, you use the --gpus option that Docker supports natively, and the --runtime=nvidia option and the nvidia-docker command are not available.
**Reload docker daemon settings**
$ sudo pkill -SIGHUP dockerd
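After the reload, it can be useful to confirm that the `nvidia` runtime is actually registered. The following is a minimal sketch, assuming nvidia-docker2 wrote its runtime entry to `/etc/docker/daemon.json` (the usual location, but an assumption here); the function name `has_nvidia_runtime` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check whether a Docker daemon config registers a runtime named "nvidia".
# The default path is an assumption about where nvidia-docker2 writes its config.
has_nvidia_runtime() {
  local config="${1:-/etc/docker/daemon.json}"
  # Look for a JSON key called "nvidia" (not just the string inside a path value).
  [ -f "$config" ] && grep -q '"nvidia"[[:space:]]*:' "$config"
}

if has_nvidia_runtime; then
  echo "nvidia runtime is registered in daemon.json"
else
  echo "nvidia runtime not found in daemon.json"
fi
```

The runtimes known to the running daemon are also listed in the output of `docker info`.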
$ sudo nvidia-docker version
NVIDIA Docker: 2.5.0
Client: Docker Engine - Community
Version: 20.10.2
API version: 1.40
Go version: go1.13.15
Git commit: 2291f61
Built: Mon Dec 28 16:17:43 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 19.03.14
・ ・ ・ ・
・ ・ ・ ・
For details on the CUDA images, see https://hub.docker.com/r/nvidia/cuda/
$ docker image pull nvidia/cuda:11.1-base-ubuntu20.04
Current command (when using the --gpus option, supported since Docker 19.03)
# When using all GPUs
$ docker container run --gpus all --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
# If you want to specify the GPUs to use
$ docker container run --gpus '"device=0,1"' --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
Note: nvidia-smi could not be executed, perhaps because options (environment variables?) such as devices were not set sufficiently.
Old command (before Docker 19.03, with the nvidia-docker2 package installed)
$ docker container run --runtime=nvidia --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
# or
$ nvidia-docker run --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
# If you want to specify the GPUs to use
$ docker container run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi
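Since the right option depends on the Docker version, a small helper can choose between the two forms. This is a minimal sketch based only on the 19.03 boundary described above; the function name `gpu_flag_for` is made up for illustration, and the old form still requires the nvidia-docker2 package.

```shell
#!/usr/bin/env bash
# Sketch: choose the GPU option from a Docker server version string.
# 19.03 and later support --gpus; older versions need --runtime=nvidia (nvidia-docker2).
gpu_flag_for() {
  local version="$1" major minor
  major="${version%%.*}"                          # "19.03.14" -> "19"
  minor="${version#*.}"; minor="${minor%%.*}"     # "19.03.14" -> "03"
  if [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }; then
    echo "--gpus all"
  else
    echo "--runtime=nvidia"
  fi
}

if command -v docker >/dev/null 2>&1; then
  server_version="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
  if [ -n "$server_version" ]; then
    # Print the command to run rather than executing it.
    echo "docker container run $(gpu_flag_for "$server_version") --rm nvidia/cuda:11.1-base-ubuntu20.04 nvidia-smi"
  fi
fi
```

On the server used here (19.03.14), the helper selects `--gpus all`.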