Build a Python machine learning environment in a container on your laptop.
This time, we will build a container that runs fastText and Python 3, mainly as an environment for text-based machine learning.
- For execution (training and validation), log in to the machine learning container with a shell and run fastText and Python 3.
- Mount the source code and training files from a folder on your PC.
- Use Visual Studio Code as the editor.
https://qiita.com/penpenta/items/3b7a0f1e27bbab56a95f
First, start the base container image. It is a good idea to use the same run command here that you will later use for the built container image. Because we will build fastText inside the container, we use the CentOS image.
docker run -it -v /c/temp/data:/data --rm centos:centos8 /bin/bash
You could write a Dockerfile from the start, but an installation step may fail partway through. Working interactively in the container like this lets you verify the environment step by step.
dnf -y install python36
yum install -y git make gcc gcc-c++
yum install -y python36-devel
cd /usr/local/src
git clone https://github.com/facebookresearch/fastText.git
cd fastText
pip3 install .
python3
import fasttext
exit()
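The interactive import check above can also be scripted, so that it can be rerun after each installation step. This is a minimal sketch that assumes only the Python standard library; the `REQUIRED` tuple and the `missing_modules` helper are hypothetical names introduced here for illustration.

```python
import importlib.util

# Modules expected at this point in the walkthrough (hypothetical list;
# extend it as you install more packages).
REQUIRED = ("fasttext",)


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", missing if missing else "none")
```

Because `find_spec` only looks the module up without importing it, the check stays fast even for large packages.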
docker export {Container ID} > {file name}.tar
Install the standard packages.
pip3 install numpy pandas matplotlib scikit-learn
Install the packages required for Japanese text processing. The mecab-ipadic-neologd dictionary is optional, so you can skip that part.
rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
yum -y makecache
yum -y install mecab mecab-ipadic
yum -y install --nogpgcheck mecab-devel
pip3 install mecab-python3 neologdn emoji
yum install -y diffutils patch which file openssl
cd /usr/local/src
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
cd mecab-ipadic-neologd
./bin/install-mecab-ipadic-neologd -n -y
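Once the MeCab packages above are installed, a tokenization helper for Japanese text might look like the following. This is only a sketch: the `tokenize` function is a hypothetical helper, and it assumes that `MeCab.Tagger("-Owakati")` picks up the default system dictionary (it does not select the NEologd dictionary). The imports are done lazily so that a different tagger or normalizer can be passed in.

```python
def tokenize(text, tagger=None, normalize=None):
    """Split Japanese text into a list of surface tokens.

    tagger: any object with a parse(str) -> str method; defaults to
            MeCab in wakati (space-separated) output mode.
    normalize: a str -> str function; defaults to neologdn.normalize.
    """
    if normalize is None:
        import neologdn  # installed with pip3 above
        normalize = neologdn.normalize
    if tagger is None:
        import MeCab  # provided by mecab-python3
        tagger = MeCab.Tagger("-Owakati")
    # Wakati output is one line of space-separated tokens.
    return tagger.parse(normalize(text)).split()
```

For repeated calls, creating the tagger once and passing it in avoids reloading the dictionary every time.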
Since this environment is for personal use and I always want the latest package versions, I will turn the work so far into a Dockerfile.
Points when creating a Dockerfile:
- Use WORKDIR instead of cd (a cd in one RUN instruction does not carry over to the next instruction).
- Run the commands with bash.
- The dictionary installation is commented out because it takes a long time to build.
Dockerfile
FROM centos:centos8
SHELL ["/bin/bash", "-c"]
#Python Install
RUN dnf -y install python36
#fastText Build
RUN yum install -y git make gcc gcc-c++
RUN yum install -y python36-devel
WORKDIR /usr/local/src
RUN git clone https://github.com/facebookresearch/fastText.git
WORKDIR /usr/local/src/fastText
RUN pip3 install .
#Install Python Package
RUN pip3 install numpy pandas matplotlib scikit-learn
RUN rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
RUN yum -y makecache
RUN yum -y install --nogpgcheck mecab mecab-ipadic mecab-devel
RUN pip3 install mecab-python3 neologdn emoji
#Install Mecab Dictionary
# RUN yum install -y diffutils patch which file openssl
#
# WORKDIR /usr/local/src
# RUN git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
# WORKDIR /usr/local/src/mecab-ipadic-neologd
# RUN ./bin/install-mecab-ipadic-neologd -n -y
docker build -t fasttext/centos8:v1.0 .
docker run -it -v /c/temp/data:/data --rm fasttext/centos8:v1.0 /bin/bash
By following the steps above, you have built a Python machine learning environment in a container on your laptop.
While working remotely, I could not stay connected to the in-house VPN all the time, which made development on the development server difficult. This environment solved that problem.
Training on a laptop takes time, which causes problems such as "training ties up the machine's resources" and "I cannot shut the laptop down". If the same container is deployed on the development server in advance, you can avoid this with the following procedure: check that everything works on the laptop with a small number of epochs, then increase the number of epochs and train on the development server. It is also possible to hook source code commits and start training on the development server automatically. I will cover that in a separate article.
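The "few epochs on the laptop, more epochs on the development server" procedure can be sketched by reading the epoch count from an environment variable, so the same script runs unchanged in both places. This is an illustrative sketch, not the author's script: the variable name `FT_EPOCH`, the helper names, and the file paths are assumptions. `fasttext.train_supervised` and `save_model` are part of the fastText Python module built earlier.

```python
import os


def pick_epoch(default=1):
    """Return the epoch count from FT_EPOCH (hypothetical variable name).

    Falls back to a small default suitable for a quick check on a laptop.
    """
    value = os.environ.get("FT_EPOCH")
    return int(value) if value else default


def train(train_path, model_path):
    """Train a supervised fastText model and save it to model_path."""
    import fasttext  # imported lazily so pick_epoch works without it

    model = fasttext.train_supervised(input=train_path, epoch=pick_epoch())
    model.save_model(model_path)
    return model
```

On the laptop you would call `train("/data/train.txt", "/data/model.bin")` (hypothetical paths under the mounted folder) with `FT_EPOCH` unset, and on the development server set, for example, `FT_EPOCH=50` before running the same script.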