I have built a Docker environment that provides PyTorch, a popular deep-learning framework, together with JupyterLab (the successor to Jupyter Notebook), which is widely used for data analysis in Python. I rebuilt the environment and revised the article accordingly (2019-12-14).
I referred to an existing article. I used to install the graphics driver, CUDA, and cuDNN directly on my Linux machine, but I struggled whenever their versions did not match the deep-learning framework. This setup is much easier than that.
Register the driver repository with apt.
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
Install the recommended driver.
$ sudo apt -y install ubuntu-drivers-common
$ sudo ubuntu-drivers autoinstall
Install the NVIDIA Container Toolkit, which includes the runtime required to use NVIDIA GPUs with Docker. First, register the repository with apt.
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$(. /etc/os-release;echo $ID$VERSION_ID)/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
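The `$(. /etc/os-release;echo $ID$VERSION_ID)` part expands to a distribution identifier such as `ubuntu18.04`, which selects the matching repository list. You can check what it expands to on your machine before running the command:

```shell
# Print the distribution ID string used to pick the nvidia-docker repository list.
# /etc/os-release defines ID (e.g. "ubuntu") and VERSION_ID (e.g. "18.04").
. /etc/os-release
echo "$ID$VERSION_ID"
```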
$ sudo apt update
Then install the toolkit.
$ sudo apt -y install nvidia-container-toolkit
Reboot the machine once.
$ sudo shutdown -r now
After that, you can check if the GPU is recognized by the command below.
$ nvidia-container-cli info
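As an additional smoke test (assuming Docker is already installed and the `nvidia/cuda` image tag below is still available on Docker Hub), you can run `nvidia-smi` inside a throwaway container:

```shell
# If the NVIDIA runtime is working, this prints the usual nvidia-smi table
# showing the driver version and the GPUs visible to the container.
docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi
```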
Clone Jupyter's docker-stacks repository from GitHub to get the base Dockerfile.
$ git clone https://github.com/jupyter/docker-stacks.git
File to use: base-notebook/Dockerfile
Change the base image in base-notebook/Dockerfile to NVIDIA's CUDA image. In the listing below, the lines prefixed with # are the original description, commented out and disabled; the line that follows is the one enabled instead. I opened base-notebook/Dockerfile with a text editor and changed the beginning as follows. Refer to NVIDIA's Docker Hub page and choose the version that matches your deep-learning framework.
#ARG BASE_CONTAINER=ubuntu:bionic-20191029@sha256:6e9f67fa63b0323e9a1e587fd71c561ba48a034504fb804fd26fd8800039835d
#FROM $BASE_CONTAINER
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
Build a Docker image in the base-notebook directory with a command like the one below. You can name the image freely after -t.
$ docker image build ./ -t experiments/base-notebook
Display the Docker Image with the following command and check if it was created.
$ docker images
Clone the official PyTorch GitHub with the following command in the directory you want to save.
$ git clone https://github.com/pytorch/pytorch.git
Copy docker/pytorch/Dockerfile to docker/pytorch-notebook/Dockerfile and make the necessary changes. Open docker/pytorch-notebook/Dockerfile with a text editor and change the beginning as follows, so that it is based on the JupyterLab Docker image created in the previous step.
#FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
FROM experiments/base-notebook:latest
The original Dockerfile installs Miniconda (a lightweight version of Anaconda) before installing PyTorch. Since conda is already installed by the JupyterLab base image, comment that part out and start from the section that installs the other libraries and PyTorch; prefix the line you want to enable with RUN. I also added the packages below, which are needed to run the PyTorch tutorial programs.
# Install PyTorch
#RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
# chmod +x ~/miniconda.sh && \
# ~/miniconda.sh -b -p /opt/conda && \
# rm ~/miniconda.sh && \
RUN /opt/conda/bin/conda install -y python=$PYTHON_VERSION numpy pyyaml scipy ipython mkl mkl-include ninja cython typing \
ipykernel pandas matplotlib scikit-learn pillow seaborn tqdm openpyxl ipywidgets && \
/opt/conda/bin/conda install -y -c pytorch magma-cuda100 && \
/opt/conda/bin/conda install -y -c conda-forge opencv pyside2 && \
/opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/bin:$PATH
Postscript: I got the following error when importing OpenCV: "ImportError: libGL.so.1: cannot open shared object file: No such file or directory". I added libgl1-mesa-dev to the apt-get install line in the Dockerfile. (See the referenced article.)
To match the JupyterLab user environment of the Docker image, I commented out the following lines at the end:
#WORKDIR /workspace
#RUN chmod -R a+w .
Instead, I added the lines below.
RUN chown -R $NB_UID:$NB_GID /home/$NB_USER
WORKDIR /home/$NB_USER
# Switch back to jovyan to avoid accidental container runs as root
USER $NB_UID
RUN echo 'export PATH=/opt/conda/bin:$PATH'>> ~/.bashrc
Build the Docker image with a command like the one below, run from the root directory of the PyTorch repository cloned from GitHub (be careful here, as this is easy to get wrong: the build needs the submodules, CMake files, and so on to be in place). In this example, the resulting image is named "experiments/pytorch-notebook".
$ docker build -t experiments/pytorch-notebook -f docker/pytorch-notebook/Dockerfile .
Note that the cmake process for caffe2 takes a lot of time.
Create and run a container from the Docker image you built. First, generate the password used when accessing JupyterLab from a browser. I referred to an existing article.
docker run \
--rm -it \
--user root \
--name pytorch-notebook \
experiments/pytorch-notebook:latest \
/bin/bash -c \
"python -c 'from notebook.auth import passwd;print(passwd())'"
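If you prefer not to start a container just to see what the helper produces, the output format can be reproduced with the standard library alone. This is a sketch of the legacy `notebook.auth.passwd` scheme (`algorithm:salt:digest`, with the digest computed over passphrase + salt); treat the exact scheme as an assumption and use the official helper, as above, for real deployments:

```python
import hashlib
import secrets

def sha1_passwd(passphrase: str) -> str:
    """Hash a passphrase in the legacy 'sha1:salt:digest' notebook format.

    NOTE: this mimics the classic notebook.auth.passwd output for
    illustration; prefer the real helper (newer Jupyter versions have
    moved to stronger hashes such as argon2).
    """
    salt = secrets.token_hex(6)  # 12 hex characters, like the original helper
    digest = hashlib.sha1((passphrase + salt).encode("utf-8")).hexdigest()
    return f"sha1:{salt}:{digest}"

print(sha1_passwd("my-secret"))
```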
You will be prompted to enter the password twice. The hashed password value (sha1:xxxxxxxxxxxxxxxxxxxxxxxx) is then printed, so record it.
Enter password:
Verify password:
sha1:xxxxxxxxxxxxxxxxxxxxxxxx
Start Jupyter Lab with the hashed password (specified with --NotebookApp.password=).
docker run \
--rm \
--user root -e NB_UID=$UID \
-p 58888:8888 -p 50022:22 -p 56006:6006 \
-v ~/:/home/jovyan/work \
--name pytorch-notebook \
--gpus all \
--ipc=host \
experiments/pytorch-notebook:latest \
start.sh jupyter lab --NotebookApp.password="sha1:xxxxxxxxxxxxxxxxxxxxxxxx"
You can use JupyterLab by accessing localhost:58888 (with the port mapping in the example above) in a web browser.
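Once JupyterLab is up, a quick way to confirm that PyTorch sees the GPU is to run the following in a notebook cell inside the container (it will report False if, for example, the container was started without --gpus all):

```python
import torch

# True if the CUDA runtime and a GPU are visible to PyTorch
print(torch.cuda.is_available())

# Name of the first GPU, if one is available
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```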
When using a GPU with PyTorch, you apparently need to provide shared memory with options such as --ipc=host or --shm-size=16G. This is because DataLoader exchanges data between worker processes through shared memory when num_workers is set to 1 or more (multi-process loading) while creating mini-batches. [Reference article on Qiita](https://qiita.com/sakaia/items/671c843966133cd8e63c#docker%E3%81%A7%E3%81%AEdataloader%E5%88%A9%E7%94%A8%E3%81%AE%E6%B3%A8%E6%84%8F)
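If sharing the host's IPC namespace via --ipc=host is undesirable, enlarging the container's own /dev/shm is an alternative. The sketch below repeats the launch command above with --shm-size instead; the flag is standard docker run, but the 16G size is just an example value to adjust for your workload:

```shell
# Same launch as before, but with a dedicated 16 GB /dev/shm
# (--shm-size) instead of sharing the host IPC namespace (--ipc=host).
docker run \
    --rm \
    --user root -e NB_UID=$UID \
    -p 58888:8888 -p 50022:22 -p 56006:6006 \
    -v ~/:/home/jovyan/work \
    --name pytorch-notebook \
    --gpus all \
    --shm-size=16G \
    experiments/pytorch-notebook:latest \
    start.sh jupyter lab --NotebookApp.password="sha1:xxxxxxxxxxxxxxxxxxxxxxxx"
```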
If you want to run a Python file from a notebook, use %run.
%run -i sample.py
References
[1] PyTorch GitHub
[2] Jupyter Lab Dockerfile
[3] Using GPU in Docker container with NVIDIA Container Toolkit
[4] Building an environment for Jupyter Lab with Docker