For my graduation research, I needed to run code written in TensorFlow.
The program consumes a lot of memory, and the lab computer ground to a halt, so I ended up (tearfully) having to build an execution environment on GCP.
On top of that, there was a real worry that I would not finish in time for graduation without using GCP, so I built a Python execution environment that can also use GCP.
At first I had no idea how to do any of it and spent days struggling (and of course my research wasn't progressing in the meantime)!!
I'm writing this up in the hope that it helps, even a little, anyone stuck on something similar...!
This time, we will build the environment with the following settings.
Note that TensorFlow must be matched to the CUDA version it was built against. If the versions are out of sync, the GPU may not be recognized properly, so we recommend checking the compatibility first. (I got this wrong a few times myself, yes.)
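As a rough sketch, that compatibility check can be expressed as a lookup table. The table below is illustrative and partial; always confirm against TensorFlow's official tested-build-configurations page.

```python
# Illustrative (partial) mapping of tensorflow-gpu versions to the CUDA /
# cuDNN versions they were built against. Confirm against the official
# "tested build configurations" table before relying on it.
TF_GPU_COMPAT = {
    "1.12.0": {"cuda": "9.0", "cudnn": "7"},
    "1.13.1": {"cuda": "10.0", "cudnn": "7.4"},
}

def required_cuda(tf_version):
    """Return the CUDA version expected for a given tensorflow-gpu version."""
    entry = TF_GPU_COMPAT.get(tf_version)
    return entry["cuda"] if entry else None

print(required_cuda("1.12.0"))  # -> 9.0
```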
- Account registration with GCP
- Enable billing for VM instances
- Create a VM instance
Reference site: How to do deep learning (NVIDIA DIGITS) using GCP GPU (NVIDIA Tesla K80) for free
Execute the following command on the VM instance to install CUDA and driver
$ curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda-9-0
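To confirm that the toolkit actually installed matches the version you intended, you can parse the output of `nvcc --version`. A minimal sketch (the sample string below is illustrative of CUDA 9.0 output):

```python
import re

def cuda_release(nvcc_output):
    """Extract the CUDA release number (e.g. '9.0') from `nvcc --version` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    return m.group(1) if m else None

# Illustrative sample of what `nvcc --version` prints for CUDA 9.0:
sample = "Cuda compilation tools, release 9.0, V9.0.176"
print(cuda_release(sample))  # -> 9.0
```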
In addition, run the following command to enable persistence mode and optimize GPU performance.
$ sudo nvidia-smi -pm 1
Create a developer account on the NVIDIA site and download the following three cuDNN files from the cuDNN download page.
version: for Ubuntu 16.04 / CUDA 9.0
Once the download is complete, upload the three files to Cloud Storage.
Here, the bucket name is cuda_9
(change it to your liking!).
When the upload is complete, use the gsutil command to copy the files to the instance. Choose the directory to copy them into.
$ cd {UP_LOAD_PATH}
$ gsutil cp gs://cuda_9/libcudnn7_7.6.4.38-1+cuda9.0_amd64.deb .
$ gsutil cp gs://cuda_9/libcudnn7-dev_7.6.4.38-1+cuda9.0_amd64.deb .
$ gsutil cp gs://cuda_9/libcudnn7-doc_7.6.4.38-1+cuda9.0_amd64.deb .
After the transfer is complete, install the packages.
$ sudo dpkg -i *.deb
If you do not have a swap file, the program may run out of memory. When you create a Linux virtual machine on GCE, whether Ubuntu or CentOS, it is created without a swap file... apparently... (I had no idea about this, so I got stuck here.)
So, first check the existence of swap with the free command.
$ free -m
If it looks like the following, Swap: is zero, so you need to create a swap file.
total used free shared buff/cache available
Mem: 581 148 90 0 342 336
Swap: 0 0 0
Create the swap file. Its size is up to you (10G this time).
$ sudo fallocate -l 10G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
Check the swap file:
$ free -m
total used free shared buff/cache available
Mem: 581 148 88 0 344 336
Swap: 10239 0 10239
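If you want to check this from a script instead of by eye, parsing the `free -m` output is straightforward. A small sketch, using the sample output shown earlier:

```python
def swap_total_mb(free_output):
    """Return the Swap total (in MB) from the text output of `free -m`."""
    for line in free_output.splitlines():
        if line.startswith("Swap:"):
            return int(line.split()[1])
    return None

# Illustrative sample of `free -m` output before creating a swap file:
sample = """\
              total        used        free      shared  buff/cache   available
Mem:            581         148          90           0         342         336
Swap:             0           0           0"""
print(swap_total_mb(sample))  # -> 0, so a swap file is needed
```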
**Tips**: To mount the swap file automatically on reboot, add the following line to /etc/fstab.
/swapfile none swap sw 0 0
We will set up CUDA.
$ echo "export PATH=/usr/local/cuda-9.0/bin\${PATH:+:\${PATH}}" >> ~/.bashrc
$ source ~/.bashrc
$ sudo /usr/bin/nvidia-persistenced
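To double-check that the CUDA bin directory really ended up on PATH after `source ~/.bashrc`, a quick sketch:

```python
import os

def cuda_on_path(cuda_bin="/usr/local/cuda-9.0/bin"):
    """Return True if the CUDA bin directory is on the current PATH."""
    return cuda_bin in os.environ.get("PATH", "").split(os.pathsep)
```

Note that TensorFlow also needs the CUDA libraries on `LD_LIBRARY_PATH` (typically `/usr/local/cuda-9.0/lib64`); if `import tensorflow` later fails with a missing `libcublas` error, add that export to `~/.bashrc` as well.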
Then check if the GPU is recognized.
$ nvidia-smi
If you get the following response, GPU setting is complete!
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72 Driver Version: 410.72 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 42C P0 65W / 149W | 0MiB / 11441MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Finally, we will build a Python environment with Anaconda. (I usually use Anaconda, and the program didn't work with other setups, so I chose Anaconda this time.)
Download Anaconda with wget:
$ wget https://repo.anaconda.com/archive/Anaconda3-5.3.1-Linux-x86_64.sh
$ sh ./Anaconda3-5.3.1-Linux-x86_64.sh
$ echo ". /home/{USER_NAME}/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
$ source ~/.bashrc
Next, create an Anaconda virtual environment. The Python version and `ENV_NAME` are up to you. (This time I want to use `tensorflow==1.12.0`, so `Python 3.6.5`.)
$ conda create -n {ENV_NAME} python=3.6.5
$ conda activate {ENV_NAME}
Install TensorFlow from conda (I always feel relieved once I get to this point...):
$ conda install tensorflow-gpu==1.12.0
Run the following program; if a GPU device appears in the output, tensorflow-gpu recognizes the GPU.
test.py
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 2319180638018740093
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11324325888
locality {
bus_id: 1
}
incarnation: 13854674477442207273
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"
]
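If you want to check for a GPU programmatically instead of reading the list by hand, you can filter on `device_type`. A minimal sketch (the helper name is mine, not a TensorFlow API; the stand-in objects below just mimic the shape of what `device_lib.list_local_devices()` returns):

```python
from collections import namedtuple

def gpu_device_names(devices):
    """Given the list returned by device_lib.list_local_devices(),
    return the names of the GPU devices."""
    return [d.name for d in devices if d.device_type == "GPU"]

# Illustrative stand-in for the DeviceAttributes objects TensorFlow returns:
Dev = namedtuple("Dev", ["name", "device_type"])
devices = [Dev("/device:CPU:0", "CPU"), Dev("/device:GPU:0", "GPU")]
print(gpu_device_names(devices))  # -> ['/device:GPU:0']
```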
In addition, install any other libraries you need from conda, like this:
$ conda install numpy==1.15.4
$ conda install scipy==1.1.0
$ conda install scikit-learn==0.20.0
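To make sure the environment ended up with the exact versions you pinned, a quick sketch comparing `__version__` attributes (the module names and versions in the usage comment are just the ones installed above):

```python
def version_mismatches(modules, expected):
    """Return the names of modules whose __version__ differs from `expected`.

    modules:  {"numpy": numpy_module, ...}
    expected: {"numpy": "1.15.4", ...}
    """
    return [name for name, mod in modules.items()
            if getattr(mod, "__version__", None) != expected.get(name)]

# Usage sketch:
# import numpy, scipy, sklearn
# version_mismatches({"numpy": numpy, "scipy": scipy, "sklearn": sklearn},
#                    {"numpy": "1.15.4", "scipy": "1.1.0", "sklearn": "0.20.0"})
```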
You should now be able to run your Python program on the GPU of a Compute Engine instance...! Thanks for your hard work...!!
If you spot any mistakes, please leave a comment :bow_tone2: