This is a memo for the times I forget how to use Docker, written because I moved my machine learning study environment to Docker.
It assumes a Mac, so commands for Windows are not covered.
It describes how to build a Docker image with machine learning libraries installed and how to play with machine learning from the host through Jupyter running in the container.
Since I use Python, the image includes Python machine learning libraries.
The image is based on Ubuntu 20.04. I basically just install the Python packages with pip, so nothing special is going on.
FROM ubuntu:20.04
LABEL maintainer="kuboshu83"
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && \
apt-get -y upgrade && \
apt-get install -y git \
make \
cmake \
gcc \
g++ \
wget \
zip \
curl && \
# ~~~~~Python installation~~~~~
apt-get install -y python3 python3-pip && \
ln -s $(which python3) $(dirname $(which python3))/python && \
ln -s $(which pip3) $(dirname $(which python3))/pip && \
# ~~~~~Installation of ML related libraries for Python~~~~~
# TensorFlow and PyTorch are large, so comment them out if you don't need them.
# Approximate sizes: tensorflow = 1.2GB, pytorch = 2GB.
# The ML libraries other than TensorFlow and PyTorch take about 2GB in total.
pip install pystache \
numpy==1.18.5 \
pandas \
scikit-learn \
matplotlib \
jupyterlab \
pycaret \
lightgbm \
alembic==1.4.1 \
sqlalchemy==1.3.13 \
optuna && \
pip install tensorflow && \
pip install torch torchvision && \
# ~~~~~OpenCV installation~~~~~
pip install opencv-python && \
apt-get install -y libgl1-mesa-dev && \
# ~~~~Install Tesseract~~~~~
apt-get install -y libleptonica-dev tesseract-ocr && \
# ~~~~Install PyOCR~~~~~
pip install pyocr && \
mkdir -p /usr/local/share/tessdata/ && \
curl -sS -L -o /usr/share/tesseract-ocr/4.00/tessdata/jpn.traineddata && \
# ~~~~Install MeCab~~~~
apt-get install -y mecab libmecab-dev mecab-ipadic && \
pip install --no-binary :all: mecab-python3 && \
pip install neologdn && \
#~~~~Creating a working directory~~~~
mkdir -p /home/share
#Launch Python shell by default
CMD ["python"]
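After the image from this Dockerfile has been built, one quick way to confirm that the libraries actually landed in it is to look them up from Python inside the container. The following is only a minimal sketch: the module list is my assumption about the import names (several differ from the pip package names), and the function name `check_modules` is hypothetical.

```python
import importlib.util

# Assumed import names for the packages installed by the Dockerfile.
# Some differ from the pip names:
#   scikit-learn -> sklearn, opencv-python -> cv2, mecab-python3 -> MeCab
MODULES = [
    "numpy", "pandas", "sklearn", "matplotlib", "lightgbm", "optuna",
    "tensorflow", "torch", "torchvision", "cv2", "pyocr", "MeCab", "neologdn",
]


def check_modules(names):
    """Return {module name: True/False} depending on whether it can be found."""
    result = {}
    for name in names:
        # find_spec locates a top-level module without importing it,
        # so heavy libraries such as tensorflow are not loaded here.
        result[name] = importlib.util.find_spec(name) is not None
    return result


if __name__ == "__main__":
    for name, found in check_modules(MODULES).items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

If you commented out TensorFlow or PyTorch in the Dockerfile, those entries should simply show up as MISSING. Note that this only checks that a module can be located, not that it imports without errors.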
You can build the Docker image with the following command. The same command is also recorded on GitHub, so you can build the image from there as well.
docker build -t <image name>:<version> <location of the Dockerfile>
If you create share/ in the current directory and run the following command, Jupyter starts inside the container. You can then use it by opening the displayed URL in a browser. Specify the version of the Docker image that you built; the example below uses v0.1.0.
# Create a directory to share with /home/share in the container.
> mkdir share
# Start the container.
# --rm: Delete the container as soon as it stops.
# -it : Required to use a terminal in the container.
#       Not needed if you only use Jupyter as in this example, but I included it anyway.
# -p  : Map host port 8888 to container port 8888.
# -v  : Mount (current directory)/share/ on the host to /home/share/ in the container.
# -w  : Set the container's working directory at startup to /home/share/.
# JupyterLab listens on port 8888.
> docker run --rm -it -p 8888:8888 -w /home/share -v $(pwd)/share:/home/share pythonml:v0.1.0 /usr/local/bin/jupyter lab --ip= --port 8888 --allow-root
When the container starts, /home/share/, which was prepared as the working directory, becomes the current directory, so sharing it with a directory on the host makes it easy to use.
I want the Docker image to build fully automatically, without being prompted for manual input while packages are installed. To disable interactive configuration during installation, I set the following environment variable.
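If the browser cannot open the displayed URL, it can help to first check from the host whether anything is listening on the published port. Below is a minimal sketch using only the standard library; the function name `port_is_open` and its defaults are my own assumptions, chosen to match the `-p 8888:8888` mapping used above.

```python
import socket


def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (refused, timeout, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open() else "not reachable"
    print(f"127.0.0.1:8888 is {state}")
```

A successful connection only shows that something accepts TCP connections on that port. It does not prove that the listener is Jupyter, so treat it as a first check rather than a full diagnosis.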
ENV DEBIAN_FRONTEND noninteractive
At first I used the RUN instruction many times without much thought, but when I checked the images with docker image ls -a, a large number of intermediate-layer images had been produced, as shown below. Apparently Docker creates an intermediate layer each time an instruction in the Dockerfile is executed and then combines those layers into the final image. I therefore reduced the number of instructions as much as possible.
I don't yet know whether having many intermediate layers is actually a problem. Still, seeing so many <none> entries in the image list bothered me, so I wrote the Dockerfile to reduce the intermediate layers.
pythonml v0.1.0 xxxxxx
<none> <none> xxxxxx <=Like this
<none> <none> xxxxxx <=Like this
<none> <none> xxxxxx <=Like this
<none> <none> xxxxxx <=Like this
<none> <none> xxxxxx <=Like this
<none> <none> xxxxxx <=Like this
<none> <none> xxxxxx <=This too
ubuntu 20.04 xxxxxx
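To get a rough feel for how chaining commands with && and a trailing backslash reduces the number of instructions, the sketch below counts instructions in Dockerfile text, treating backslash-continued lines as one instruction. It is a simplified illustration, not Docker's real parser: it ignores parser directives, heredocs, and a custom escape character, and the instruction count is only a proxy for the number of intermediate images. The name `count_instructions` and both sample Dockerfiles are hypothetical.

```python
def count_instructions(dockerfile_text):
    """Count Dockerfile instructions, treating continued lines as one."""
    counts = {}
    continuing = False
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines never start an instruction,
        # and they do not end a continuation either.
        if not line or line.startswith("#"):
            continue
        if not continuing:
            keyword = line.split()[0].upper()
            counts[keyword] = counts.get(keyword, 0) + 1
        # A trailing backslash means the instruction continues on the next line.
        continuing = line.endswith("\\")
    return counts


# One RUN per command: every RUN is a separate instruction.
SEPARATE = r"""
FROM ubuntu:20.04
RUN apt-get update
RUN apt-get install -y git
RUN pip install numpy
"""

# The same commands chained into a single RUN.
MERGED = r"""
FROM ubuntu:20.04
RUN apt-get update && \
    apt-get install -y git && \
    # comments inside a continued RUN are skipped
    pip install numpy
"""

if __name__ == "__main__":
    print(count_instructions(SEPARATE))  # {'FROM': 1, 'RUN': 3}
    print(count_instructions(MERGED))    # {'FROM': 1, 'RUN': 1}
```

To inspect the layers of an image that has actually been built, `docker history <image name>:<version>` lists them together with their sizes.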
This time I simply built a Docker image with Python packages installed and wrote down how to set up an environment for playing with machine learning through Jupyter from the host. There are still libraries I want to try, so I'd like to add them later.
Also, I prioritized keeping the Dockerfile tidy this time, and because every library is installed with apt-get or pip, some of the versions are old. If I have time, I would like to build from source and install the latest versions.
