In this article, I'll talk about the causes of and countermeasures for a problem where the Dockerfile and docker-compose files I wrote for machine learning did not support the GPU. However, please note that I have only been studying Docker for about a month, so some of my knowledge may be incorrect. m(_ _)m
- People who want to make Docker GPU compatible
- People who want to know whether their setup can support the GPU
This time I was using a sample project from a certain book, so I can't include all of it, but I will describe just the minimal parts.
Dockerfile

```dockerfile
FROM continuumio/miniconda3:latest

COPY environment.yml /tmp/
RUN conda update -y -n base conda \
    && conda env create -f /tmp/environment.yml \
    && conda clean -y -t \
    && rm /tmp/environment.yml
ENV PATH /opt/conda/envs/tf/bin:$PATH
```
docker-compose.yml

```yaml
version: "3"
services:
  tf:
    build:
      context: ../
      dockerfile: ./docker/Dockerfile
    container_name: tf
    image: tf
    ports:
      - "8888:8888"
    volumes:
      - ../:/tf
    command: /opt/conda/envs/tf/bin/jupyter notebook --ip='0.0.0.0' --port=8888 --no-browser
```
What I want to do is create a Python environment from environment.yml using miniconda and launch Jupyter Notebook with docker-compose. environment.yml contains tensorflow 2.1, jupyter, and so on.
However, as it stands, this setup does not recognize the GPU.
There are several ways to check whether the GPU can be used, so I'll summarize them here. By running these inside the Docker container, you can see whether the GPU is usable.

nvidia-smi
This checks whether the GPU is recognized as a physical device. If the GPU is usable from Docker, this command will basically succeed.

tf.config.list_physical_devices('GPU')
This one runs in Python; use it after importing tensorflow. It returns a list of the available GPUs. If the GPU cannot be used, various warnings appear at this point, which you can use as clues to identify the cause.

tf.test.is_gpu_available()
This is also a Python function. It is the same as the function above, except that it returns True/False.

※ lspci | grep -i nvidia
This method often comes up when searching, but I could not use this command inside Docker. Perhaps because of the minimal image configuration...?
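For reference, here is a minimal sketch of the Python-side checks above, assuming tensorflow 2.1 is importable inside the container:

```python
# Minimal sketch of the TensorFlow-side GPU checks described above.
import tensorflow as tf

# Returns a list such as [PhysicalDevice(name='/physical_device:GPU:0', ...)],
# or an empty list if no GPU is visible.
print(tf.config.list_physical_devices('GPU'))

# Same check, but returns True/False.
# (Deprecated; newer TF suggests tf.config.list_physical_devices instead.)
print(tf.test.is_gpu_available())
```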
When I ran the above checks in the Docker container I wanted to use, none of them recognized the GPU. After a lot of research, I found that there are three main causes.
- docker-compose is not fully GPU compatible
- CUDA and cuDNN are not included in the Docker container
- tensorflow does not recognize CUDA, etc.
Let's take a closer look one by one.
I will talk on the assumption that nvidia-docker (nvidia-container-toolkit) is already installed on the host OS.
Docker supports the gpus option since version 19.03; by passing --gpus all, the container will recognize the GPU as a physical device and nvidia-smi can be used.
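For example, with plain docker run it looks like this (the nvidia/cuda:10.1-base image tag is just an example):

```bash
# Expose all GPUs to the container and check them with nvidia-smi.
docker run --rm --gpus all nvidia/cuda:10.1-base nvidia-smi
```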
However, docker-compose does not support the gpus key, so use the runtime key instead to make it GPU compatible.
If you search for "docker-compose GPU" you will find many articles on GPU support with docker-compose, such as this article, but each one does it slightly differently, which I think makes it hard to follow. The main points to check are as follows (sketches of the resulting files come after the list).
- Set docker-compose to version 1.19 or higher
  - The runtime key becomes available
- Check the version key of docker-compose.yml
  - It may not be usable depending on the docker-compose version; the latest version you can use should be manageable
- Check the contents of /etc/docker/daemon.json
  - No problem if nvidia-related settings are written there
- Write runtime: nvidia in docker-compose.yml
- Write NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=all under the environment key
  - NVIDIA_VISIBLE_DEVICES specifies which GPU devices to use, and NVIDIA_DRIVER_CAPABILITIES specifies how the GPU may be used (which driver capabilities are exposed). Minimal settings such as compute,utility are probably fine too.
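For reference, the nvidia-related part of /etc/docker/daemon.json set up by nvidia-docker typically looks something like this:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

And here is a sketch of how the docker-compose.yml from earlier might look after these changes. Note that the runtime key is only accepted by compose file format 2.3/2.4, not 3.x, so the version key changes as well (this version choice is my assumption; check what your docker-compose accepts):

```yaml
version: "2.3"  # runtime: is supported in file format 2.3/2.4
services:
  tf:
    build:
      context: ../
      dockerfile: ./docker/Dockerfile
    container_name: tf
    image: tf
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
    ports:
      - "8888:8888"
    volumes:
      - ../:/tf
    command: /opt/conda/envs/tf/bin/jupyter notebook --ip='0.0.0.0' --port=8888 --no-browser
```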
After making these checks and changes, if you run nvidia-smi in a container launched by docker-compose, hopefully it will now work.
Investigating the problem from here on was very difficult. GPU support in docker and docker-compose only makes the GPU visible as a physical device; it does not mean that GPU computation becomes possible. For that, CUDA and cuDNN must be installed inside the Docker container. Most samples use images such as nvidia/cuda or the official tensorflow images, and those containers include CUDA. However, I think most images like miniconda's do not support the GPU.
Honestly, I think the countermeasure here depends on the base Docker image. In my environment, I stopped basing the image on miniconda, based it on nvidia/cuda instead, and installed miniconda with a RUN instruction (see the sketch below). If your base image has a tag or a related image that includes CUDA or cuDNN, I think you should pick that one on Docker Hub. If you cannot base your image on nvidia/cuda and there is no variant with CUDA, I think the only way is to install CUDA and cuDNN yourself in the Dockerfile. I don't know how to write that, so ask someone who does... If you do it naively, caches and the like may remain and the image will not be lightweight.
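Here is a sketch of the approach I described: base on nvidia/cuda and install miniconda with RUN. The image tag and the miniconda installer URL are assumptions on my part; pick the CUDA/cuDNN versions that match your tensorflow.

```dockerfile
# Sketch: nvidia/cuda base (CUDA 10.1 + cuDNN 7 matches tensorflow 2.1),
# with miniconda installed on top. Tag and installer URL are examples.
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends wget ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install miniconda under /opt/conda (installer URL may change; check the official site).
RUN wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh \
    && bash /tmp/miniconda.sh -b -p /opt/conda \
    && rm /tmp/miniconda.sh
ENV PATH /opt/conda/bin:$PATH

# Same environment setup as the original Dockerfile.
COPY environment.yml /tmp/
RUN conda update -y -n base conda \
    && conda env create -f /tmp/environment.yml \
    && conda clean -y -t \
    && rm /tmp/environment.yml
ENV PATH /opt/conda/envs/tf/bin:$PATH
```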
It is unfortunate that CUDA and cuDNN are versioned this way, but tensorflow requires the exactly corresponding CUDA and cuDNN versions to be installed; even a slightly different version may not work. For example, tensorflow 2.1 expects CUDA 10.1 and cuDNN 7.6. Make sure your Docker image contains the CUDA version corresponding to your tensorflow. (Reference: Tensorflow tested build configurations)
Furthermore, even with CUDA installed, in some cases tensorflow may still fail to find it. This can happen because the environment variable LD_LIBRARY_PATH does not contain the path to the CUDA-related files.
You can locate files such as libcudart and libcurand by searching with find / -name 'libcu*'. Add the folders containing them to LD_LIBRARY_PATH.
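Something like this, assuming the libraries turn out to live under /usr/local/cuda-10.1/lib64 (that path is an example; use whatever find reports):

```bash
# Search the whole filesystem for CUDA-related shared libraries.
find / -name 'libcu*' 2>/dev/null

# If they were found under /usr/local/cuda-10.1/lib64, add that folder to the search path.
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH
```

To make this permanent, set it with an ENV instruction in the Dockerfile instead of exporting it by hand.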
If tensorflow now recognizes the GPU with the same checks as before, it's a success! Congratulations.
As in my environment, some warnings may appear when importing tensorflow, stating that TensorRT cannot be used. This does not mean that tensorflow is unusable; it just means that TensorRT, which makes GPU computation even faster, is not included. So even if that warning appears, I think the GPU will be recognized without any problem.
This may not have been very helpful, since I just wrote up my experience in a hurry... In the first place, if you base your image on the GPU-enabled tensorflow Docker image from the beginning, this kind of problem is unlikely to occur and the GPU will be recognized without all this trouble, so if possible I recommend starting from such an image.