About this article

This is a memo for myself, written in case I forget how to use Docker, now that I have moved my machine learning study environment to Docker.

Environment

This article assumes a Mac, so commands for Windows are not covered.

macOS Mojave

GitHub

The Dockerfile introduced in this article is published on GitHub.

GitHub:/kuboshu/pythonml

What this article covers

How to build a Docker image with machine learning libraries installed, and how to experiment with machine learning by using the Jupyter notebook running in the container from the host.

Libraries installed in the Docker image

Since I use Python, I installed Python libraries related to machine learning.

python 3.8.2
scikit-learn 0.23.2
pandas 1.1.2
numpy 1.18.5
jupyterlab 2.2.8
pycaret 2.1.2
lightgbm 3.0.0
xgboost 1.2.0
scipy 1.5.2
matplotlib 3.3.2
tensorflow 2.3.1
pytorch 1.6.0
pyocr 0.7.2
opencv-python 4.4.0.44
optuna 2.1.0
mecab

Contents of the created Dockerfile

The image is based on Ubuntu 20.04. It basically just installs Python packages with pip, so nothing special is going on.

FROM ubuntu:20.04
LABEL maintainer="kuboshu83"
ENV DEBIAN_FRONTEND noninteractive
ARG INSTALLDIR_PYOCR="/app/ocr"
RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get install -y git \
                       make \
                       cmake \
                       gcc \
                       g++ \
                       wget \
                       zip \
                       curl && \
    # ~~~~~Python installation~~~~~
    apt-get install -y python3 python3-pip && \
    ln -s $(which python3) $(dirname $(which python3))/python  && \
    ln -s $(which pip3) $(dirname $(which python3))/pip && \
    # ~~~~~Installation of ML related libraries for Python~~~~~
    # TensorFlow and PyTorch are large, so comment them out if you don't need them.
    # Estimated size: tensorflow = 1.2 GB, pytorch = 2 GB.
    # The ML libraries other than TensorFlow and PyTorch take about 2 GB in total.
    pip install pystache \
                numpy==1.18.5 \
                pandas \
                scikit-learn \
                matplotlib \
                jupyterlab \
                pycaret \
                lightgbm \
                alembic==1.4.1 \
                sqlalchemy==1.3.13 \
                optuna && \
    pip install tensorflow && \
    pip install torch torchvision && \
    # ~~~~~OpenCV installation~~~~~
    pip install opencv-python && \
    apt-get install -y libgl1-mesa-dev && \
    # ~~~~Install Tesseract~~~~~
    apt-get install -y libleptonica-dev tesseract-ocr && \
    # ~~~~Install PyOCR~~~~~
    pip install pyocr && \
    mkdir -p /usr/local/share/tessdata/ && \
    curl https://raw.githubusercontent.com/tesseract-ocr/tessdata_best/master/jpn.traineddata -sS -L -o /usr/share/tesseract-ocr/4.00/tessdata/jpn.traineddata && \
    # ~~~~Install MeCab~~~~
    apt-get install -y mecab libmecab-dev mecab-ipadic && \
    pip install --no-binary :all: mecab-python3 && \
    pip install neologdn && \
    #~~~~Creating a working directory~~~~
    mkdir -p /home/share

#Launch Python shell by default
CMD ["python"]

How to build a Docker image

You can build the Docker image with the following command. The same command is also written in build.sh on GitHub, so you can build the image by running build.sh instead.

docker build -t <image name>:<version> <location of Dockerfile>
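
As a concrete illustration, here is a minimal sketch that fills in those placeholders. It assumes the image name pythonml and the version v0.1.0 used later in this article, and that the Dockerfile is in the current directory. The variable names and the DRY_RUN switch are hypothetical and are not taken from build.sh.

```shell
# Sketch only: IMAGE_NAME, IMAGE_VERSION, BUILD_CONTEXT and DRY_RUN are
# hypothetical names for illustration, not taken from build.sh.
IMAGE_NAME="pythonml"
IMAGE_VERSION="v0.1.0"
BUILD_CONTEXT="."   # directory that contains the Dockerfile (assumed)

# Image reference in name:version form.
IMAGE_REF="${IMAGE_NAME}:${IMAGE_VERSION}"

# With DRY_RUN=1 (the default here) the command is only printed.
# Set DRY_RUN=0 to actually build the image.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "docker build -t ${IMAGE_REF} ${BUILD_CONTEXT}"
else
  docker build -t "${IMAGE_REF}" "${BUILD_CONTEXT}"
fi
```

Keeping the name and version in variables makes it easy to reuse the same tag when starting the container.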

How to launch Jupyter-notebook

If you create share/ in the current directory and run the following command, Jupyter notebook starts inside the container. After that, you can use Jupyter notebook by opening the displayed URL in a browser. Specify the version of the Docker image as appropriate; the example below uses v0.1.0.

# Create the directory to share with /home/share in the container.
> mkdir share

# Start the container.
# --rm: Delete the container as soon as it stops.
# -it : Required to use a terminal in the container.
#       It is unnecessary if you only use Jupyter notebook as in this case, but I included it anyway.
# -p  : Map host port 8888 to container port 8888.
# -v  : Mount the host's (current directory)/share/ on /home/share/ in the container.
# -w  : Set the container's working directory at startup to /home/share/.
# JupyterLab is started on port 8888.
> docker run --rm -it -p 8888:8888 -w /home/share -v $(pwd)/share:/home/share pythonml:v0.1.0 /usr/local/bin/jupyter lab --ip=0.0.0.0 --port 8888 --allow-root

When the container starts, /home/share/, which was prepared as the working directory, becomes the current directory, so sharing it with a directory on the host makes it convenient to use.
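
The -p option takes the form host port:container port, so if port 8888 is already in use on the host, only the left side needs to change. The following is a minimal sketch of that case, assuming host port 8889 is free. The variable names and the docker_cmd helper are hypothetical; by default the helper only prints the command so that the sketch can be checked without starting a container.

```shell
# Sketch only: the variable names and the docker_cmd helper are hypothetical.
HOST_PORT=8889                 # port opened on the host (assumed to be free)
CONTAINER_PORT=8888            # port JupyterLab listens on inside the container
HOST_SHARE="$(pwd)/share"      # host directory shared with the container
CONTAINER_SHARE="/home/share"  # working directory created in the Dockerfile
IMAGE_REF="pythonml:v0.1.0"

# Prints the docker command when DRY_RUN=1 (default), runs it when DRY_RUN=0.
docker_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker $*"
  else
    docker "$@"
  fi
}

docker_cmd run --rm -it \
  -p "${HOST_PORT}:${CONTAINER_PORT}" \
  -v "${HOST_SHARE}:${CONTAINER_SHARE}" \
  -w "${CONTAINER_SHARE}" \
  "${IMAGE_REF}" \
  /usr/local/bin/jupyter lab --ip=0.0.0.0 --port "${CONTAINER_PORT}" --allow-root
```

With this mapping, the URL that Jupyter displays still shows port 8888, because that is the port inside the container. Replace it with 8889 when opening the URL in the host's browser.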

What I looked up when writing a Dockerfile

Avoid interactive installation

- Reference: What is DEBIAN_FRONTEND=noninteractive (qiita @udzura)

I want the Docker image to build fully automatically, so I don't want to be prompted for manual input while packages are being installed. To disable interactive configuration during installation, I set the following environment variable.

ENV DEBIAN_FRONTEND noninteractive
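
Note that a variable set with ENV also stays set in containers started from the image. As a possible alternative, the variable can be limited to a single command by writing it as a prefix. The sketch below demonstrates only that scoping rule in plain sh, with printf standing in for apt-get so that it runs without Docker; the names inside and outside are hypothetical.

```shell
# Sketch only: printf stands in for apt-get to show how the prefix is scoped.
unset DEBIAN_FRONTEND

# The VAR=value prefix sets the variable only for this one command.
inside="$(DEBIAN_FRONTEND=noninteractive sh -c 'printf "%s" "${DEBIAN_FRONTEND:-unset}"')"

# After the command finishes, the variable is not set in the calling shell.
outside="${DEBIAN_FRONTEND:-unset}"

echo "inside=${inside} outside=${outside}"
```

In a Dockerfile, the same idea would look like RUN DEBIAN_FRONTEND=noninteractive apt-get install -y followed by the package names. When commands are chained with &&, the prefix applies only to the command it is attached to.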

Reduce the number of times RUN is used

- Reference: Tutorial aiming to understand Docker image (qiita @zembutsu)

At first I used the RUN instruction freely without giving it any thought, but when I checked the images with docker image ls -a, intermediate-layer images had piled up as shown below. Apparently, Docker creates an intermediate layer each time an instruction in the Dockerfile is executed, and combines those layers to produce the final image. I therefore reduced the number of instructions as much as possible.

I don't yet understand whether having many intermediate layers is actually a problem. Still, seeing so many <none> entries in the image list bothered me, so I wrote the Dockerfile to keep the number of intermediate layers down.

REPOSITORY          TAG       IMAGE ID      
pythonml            v0.1.0    xxxxxx        
<none>              <none>    xxxxxx        <=Like this
<none>              <none>    xxxxxx        <=Like this
<none>              <none>    xxxxxx        <=Like this
<none>              <none>    xxxxxx        <=Like this
<none>              <none>    xxxxxx        <=Like this
<none>              <none>    xxxxxx        <=Like this
<none>              <none>    xxxxxx        <=This too
ubuntu              20.04     xxxxxx
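
To make the idea concrete, the following sketch writes two made-up sample Dockerfiles to a temporary directory and counts the RUN instructions in each. The first uses one RUN per command, and the second chains the same commands with && in a single RUN, which is the approach taken in this article. The file names and variable names are hypothetical, and nothing is built.

```shell
# Sketch only: the two sample Dockerfiles below are made up for illustration.
TMP_DIR="$(mktemp -d)"

# Version 1: one RUN instruction per command.
cat > "${TMP_DIR}/Dockerfile.many" <<'EOF'
FROM ubuntu:20.04
RUN apt-get update
RUN apt-get install -y python3 python3-pip
RUN pip3 install numpy
EOF

# Version 2: the same commands chained with && in a single RUN instruction.
cat > "${TMP_DIR}/Dockerfile.one" <<'EOF'
FROM ubuntu:20.04
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    pip3 install numpy
EOF

# Count the RUN instructions in each file.
RUN_MANY="$(grep -c '^RUN ' "${TMP_DIR}/Dockerfile.many")"
RUN_ONE="$(grep -c '^RUN ' "${TMP_DIR}/Dockerfile.one")"
echo "separate RUNs: ${RUN_MANY}, chained RUN: ${RUN_ONE}"

# Remove the temporary files.
rm -r "${TMP_DIR}"
```

For an image that has already been built, docker history pythonml:v0.1.0 lists its layers together with the instruction that created each one.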

Summary

This time I simply built a Docker image with Python packages installed, and noted how to set up an environment for experimenting with machine learning through Jupyter notebook from the host. There are still libraries I want to try, so I plan to add them in the future.

Also, this time I prioritized the readability of the Dockerfile and installed every library with apt-get or pip, so some of them are older versions. If I have time, I would like to build them from source and install the latest versions.

References

- What is DEBIAN_FRONTEND=noninteractive: qiita @udzura
- Tutorial aiming to understand Docker image: qiita @zembutsu

[Introduction to Docker] Create a Docker image for machine learning and use Jupyter notebook