Things to watch out for when setting up Docker for deep learning

Introduction

Docker seems to be designed mainly with running web services in mind. As a result, there are a few subtle points that matter for deep learning but are rarely written about, so I am taking this opportunity to write them down.

GPU integration

Plain Docker does not support NVIDIA GPUs, so you need to install nvidia-docker. For installation, the article by Mr. Sasaki of NVIDIA, [What's happening with NVIDIA Docker now?](https://medium.com/nvidiajapan/nvidia-docker-%E3%81%A3%E3%81%A6%E4%BB%8A%E3%81%A9%E3%81%86%E3%81%AA%E3%81%A3%E3%81%A6%E3%82%8B%E3%81%AE-20-09-%E7%89%88-558fae883f44), seems to keep up with the latest information. According to the latest information there, installing nvidia-docker2 is enough.

Image

Basically, use the official PyTorch or TensorFlow image.

Python version problem

If the official image is not enough for reproducibility, you will need to build Python yourself. I remember having a hard time with this and meant to write notes on it, but an easy-to-follow page has since been published. Building with pyenv also seems easy. https://www.python.jp/install/build_python_unix.html

About the nvidia-smi output in Docker

nvidia-smi shows a CUDA version, but this seems to be only the CUDA version supported by the driver. It has nothing to do with the CUDA version actually available inside the container, so it tells you nothing about the container. This can be confusing, because one of the motivations for using nvidia-docker is to run an older CUDA library. To check the version inside the container, the common approach seems to be running nvcc -V or looking at the installation path.

https://stackoverflow.com/questions/53422407/different-cuda-versions-shown-by-nvcc-and-nvidia-smi
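The check described above can be scripted. The following is a minimal sketch, assuming nvcc is installed in the container (images that ship only the CUDA runtime libraries may not include it). The function names are hypothetical, and the only thing parsed is the "release X.Y" part of the nvcc -V output.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA toolkit release (e.g. "11.0") found in `nvcc -V` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def toolkit_cuda_version() -> Optional[str]:
    """Run `nvcc -V` in the current environment; return None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "-V"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # Run this inside the container, not on the host.
    print("CUDA toolkit in this environment:", toolkit_cuda_version())
```

Comparing this value with the version printed by nvidia-smi makes the difference between the driver-side and container-side CUDA versions visible.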

About building with CUDA

By default, CUDA and related libraries are not available during docker build. This is fine if you only use PyTorch, TensorFlow and the like, but it is inconvenient if you want to build your own library. You can solve this by setting the default runtime in /etc/docker/daemon.json. https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
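If /etc/docker/daemon.json already contains other settings, the keys above have to be merged into it rather than replacing the file. The following is a minimal sketch of that merge. The function name and the "log-driver" entry are hypothetical examples, and the runtime path is simply the one shown in the configuration above.

```python
import json
from typing import Any, Dict


def with_nvidia_default_runtime(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a daemon.json config with nvidia as the default runtime."""
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    # Keep an existing "nvidia" entry if there is one; otherwise add the one above.
    runtimes.setdefault(
        "nvidia",
        {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []},
    )
    updated["runtimes"] = runtimes
    updated["default-runtime"] = "nvidia"
    return updated


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}  # hypothetical existing settings
    print(json.dumps(with_nvidia_default_runtime(existing), indent=4))
```

The sketch only prints the merged configuration. Writing it back to /etc/docker/daemon.json requires root, and the Docker daemon has to be restarted before the change takes effect.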

Using docker-compose

To be honest, compose may not be necessary for deep learning. Compose is designed for situations where several containers have to be run together, so it feels like overkill if you just want a single container for training, and it does not work well with the nvidia runtime.

By the way, compose uses a fairly distinctive format in its yaml files, and I remember being confused when I first used it. If you have never worked with yaml files, it is a good idea to check the syntax once.

Almost everything compose does can also be done from the command line, so in some situations it may be better to write a suitable shell script instead. With that in mind, to use the GPU from compose, add the following to the service definition:

    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all

If you want to use only some of the GPUs on a machine that has several, write it like this:

    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0,1
      - NVIDIA_DRIVER_CAPABILITIES=all

Here 0,1 are the indices of the GPUs to expose to the container.
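Since the same settings can be given on the command line, here is a minimal sketch of a helper that builds the equivalent docker run invocation. It assumes the nvidia runtime is registered with Docker as in the compose fragments above. The function name, the image name "my-training-image" and train.py are hypothetical placeholders.

```python
import shlex
from typing import List, Sequence


def gpu_run_command(
    image: str, command: Sequence[str], devices: str = "all"
) -> List[str]:
    """Build a `docker run` argument list mirroring the compose GPU settings."""
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",
        "-e", f"NVIDIA_VISIBLE_DEVICES={devices}",
        "-e", "NVIDIA_DRIVER_CAPABILITIES=all",
        image,
        *command,
    ]


if __name__ == "__main__":
    # Hypothetical image and script names, restricted to GPUs 0 and 1.
    args = gpu_run_command("my-training-image", ["python", "train.py"], devices="0,1")
    print(shlex.join(args))
```

The helper only prints the command. Passing the list to subprocess.run would execute it, which keeps the script easy to inspect before anything is started.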

Volume mount

How to mount the volume

If you use Docker for training, you will want to write the results outside the container. You may also want to mount the dataset. Both require mounting a volume. There are two ways to mount a volume in Docker: the -v/--volume option and the --mount option. The --mount option can do everything the -v option can. I also find it easier to read, much like docker-compose, so I recommend it. The --mount option is briefly described below. It supports three mount types: bind, volume and tmpfs.

bind

bind is probably the most convenient type for casual use such as machine learning. It mounts a path on the host directly into the container. You can use it both to mount the dataset and to mount the results folder.

volume

Data written to a volume is managed by Docker, so it is troublesome to use from outside. It is also unsuitable for tasks such as referencing data that lives outside the container. However, it seems to be the recommended option when you want to share data between containers.

tmpfs

tmpfs creates a temporary directory in memory and cannot be used for persistent storage.

Read the official documentation for more details. The three pages are very similar and very hard to tell apart...
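As an illustration of the bind type, here is a minimal sketch that builds --mount arguments for a read-only dataset and a writable results folder. The function name, the host paths, the container paths and the image name are all hypothetical.

```python
import os
import shlex
from typing import List


def bind_mount(source: str, target: str, readonly: bool = False) -> List[str]:
    """Build a `--mount type=bind,...` argument pair for `docker run`."""
    # Use an absolute host path so the mount does not depend on the working directory.
    spec = f"type=bind,source={os.path.abspath(source)},target={target}"
    if readonly:
        spec += ",readonly"
    return ["--mount", spec]


if __name__ == "__main__":
    args = ["docker", "run", "--rm"]
    # The dataset is mounted read-only so training cannot modify it by accident.
    args += bind_mount("/data/imagenet", "/workspace/dataset", readonly=True)
    # The results folder stays writable so outputs survive after the container exits.
    args += bind_mount("./results", "/workspace/results")
    args += ["my-training-image", "python", "train.py"]
    print(shlex.join(args))
```

Note that with --mount the source path of a bind mount must already exist on the host; Docker reports an error rather than creating it.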

Owner problem

However, when you enter a container with Docker, you are basically logged in as root. Files created in that state are owned by root, which is very inconvenient when you want to read and write the data from outside the container. I would therefore like to create a user in the Dockerfile and log in as that user, but hard-coding the login user name in the Dockerfile reduces reusability.

The article "File owner problem when mounting volume with docker" was helpful for this problem. The easiest way is to mount /etc/group and /etc/passwd:

    -v /etc/group:/etc/group:ro -v /etc/passwd:/etc/passwd:ro

Mounting /etc/passwd may sound suspicious, as if password-related files were being exposed, but recent distributions usually do not store passwords in /etc/passwd, so this is not a particular problem. The file is needed at login, so it is readable by all users anyway.
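The two -v flags make the host's user and group names visible inside the container, but the container still has to be started as that user. A minimal sketch follows, assuming this is done with docker run's -u option and the host user's numeric IDs. The -u part is an assumption that is not in the text above, and the function and image names are hypothetical.

```python
import os
import shlex
from typing import List


def run_as_host_user_flags(uid: int, gid: int) -> List[str]:
    """Flags that run the container as uid:gid and expose host user/group names."""
    return [
        "-u", f"{uid}:{gid}",  # assumption: run as the host user's numeric IDs
        "-v", "/etc/group:/etc/group:ro",
        "-v", "/etc/passwd:/etc/passwd:ro",
    ]


if __name__ == "__main__":
    # os.getuid/os.getgid are available on Unix-like hosts only.
    flags = run_as_host_user_flags(os.getuid(), os.getgid())
    print(shlex.join(["docker", "run", "--rm", *flags, "my-training-image", "bash"]))
```

Because no user name is written into the Dockerfile, the same image can be reused by different users on the host.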

What to develop with

VSCode lets you connect directly to containers running on a remote machine (e.g. a compute server). However, it also transfers its own program into the container's home directory and runs it there. If you start it once as root and later try to connect as a regular user, the update fails and you can no longer connect, so be careful. The article "Connect to docker container of ssh connection destination with Remote Development of VS Code" may be helpful.

Other

MuJoCo cannot be used because of its license, so Docker is difficult to use for deep reinforcement learning. I only use Docker casually for machine learning, so I may have written down some bad practices. I would appreciate it if you pointed them out.
