When it comes to Docker containers for data science, the Jupyter project distributes an official scipy-notebook image, but if you look at its Dockerfile, it is built on conda. For religious reasons, I don't want to use conda. So, this time, I will write a Dockerfile that creates a data science environment based on pip3.
Reference article: A story about trying to create a machine learning environment using Docker
・ Based on the official Python Docker image
・ Use pip3
・ Install only the required modules from requirements.txt
・ I want to connect to BigQuery with google-cloud-bigquery, so install the Cloud SDK
・ I want to visualize with jupyterlab + plotly, so install Node.js
Dockerfile
# Based on Python 3.8
# reference: https://qiita.com/penpenta/items/3b7a0f1e27bbab56a95f
FROM python:3.8
USER root
# lsb-release is required when installing google-cloud-sdk
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y sudo lsb-release \
&& pip3 install --upgrade pip
# Change the working directory (optional)
# WORKDIR /home/{your user name}
# Install the modules listed in requirements.txt, which must be created beforehand in the same folder as the Dockerfile
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
#Install Cloud SDK
# https://cloud.google.com/sdk/docs/downloads-apt-get
RUN export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
apt-get update -y && apt-get install google-cloud-sdk -y
# Install Node.js, which is needed to use plotly
# https://github.com/nodesource/distributions/blob/master/README.md
RUN curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash - \
&& sudo apt-get install -y nodejs
# Enable plotly support in JupyterLab
ENV NODE_OPTIONS=--max-old-space-size=4096
RUN jupyter labextension install @jupyter-widgets/[email protected] --no-build \
&& jupyter labextension install [email protected] --no-build \
&& jupyter labextension install [email protected] --no-build \
&& jupyter lab build
ENV NODE_OPTIONS=
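Once the image is built and a container is running, it can be worth confirming that the command-line tools the Dockerfile above installs are actually on the PATH. The following is only a minimal sketch, not part of the original setup: the helper name `missing_tools` and the list of executable names are assumptions chosen for illustration, and it uses nothing but the Python standard library.

```python
import shutil

# Executables the Dockerfile is expected to provide (assumed names).
EXPECTED_TOOLS = ["pip3", "gcloud", "node", "jupyter"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools are available.")
```

Run inside the container, an empty result suggests that the Cloud SDK, Node.js, and Jupyter installation steps all succeeded.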
List the modules to install this time in requirements.txt. Whether to pin versions is up to you.
requirements.txt
numpy
pandas
matplotlib
seaborn
scikit-learn
scrapy
jupyter
plotly
google-cloud-bigquery
jupyterlab
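If you do decide to pin versions, one option is to record the versions that are currently installed. The sketch below is an illustration under assumptions, not the procedure used in this article: `pin_requirements` is a hypothetical helper, it expects plain package names like the ones listed above, and it relies on `importlib.metadata` from the standard library (available from Python 3.8). Packages that are not installed are left unpinned.

```python
from importlib import metadata


def pin_requirements(names, get_version=metadata.version):
    """Return requirements lines, written as name==version when the package is installed."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment: keep the name unpinned.
            lines.append(name)
    return lines


# Example with a few of the modules from requirements.txt.
print("\n".join(pin_requirements(["numpy", "pandas", "plotly"])))
```

The printed lines can be pasted back into requirements.txt so that later builds of the image install the same versions.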
Put the two files above in a folder of your choice and move to that directory. After that, run the following commands in order.
# Build the Docker image
docker build --rm -t {name of the Docker image} .
# Create the Docker container
# Use -p for port forwarding between the outside and the inside of the container
# Use -v to mount a folder from outside Docker so that files do not disappear even if you delete the container
docker run -it -p {port outside the container}:{port inside the container} -v {absolute path of the folder outside the container, without a trailing "/"}:{absolute path inside the container where the folder should be mounted, including the folder name} --name {name of the container} {name of the image to create it from} /bin/bash
# Start JupyterLab
jupyter lab --ip=0.0.0.0 --allow-root --port {port inside the container}
# Cloud SDK authentication
gcloud init
# google-cloud-bigquery API authentication
hogehoge
export GOOGLE_APPLICATION_CREDENTIALS={absolute path of the credentials file}
export GOOGLE_CLOUD_PROJECT={ID of the project to connect to}
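Before creating a BigQuery client in a notebook, it may help to confirm that the two environment variables above are visible to the Python process. What follows is a sketch under assumptions: `check_bigquery_env` and `run_test_query` are hypothetical helper names, and the `SELECT 1` query is only a placeholder. `bigquery.Client` and `Client.query(...).result()` come from google-cloud-bigquery, which requirements.txt installs.

```python
import os


def check_bigquery_env(env=os.environ):
    """Return a list of problems found in the BigQuery-related environment variables."""
    problems = []
    cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not cred_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(cred_path):
        problems.append(f"credentials file not found: {cred_path}")
    if not env.get("GOOGLE_CLOUD_PROJECT"):
        problems.append("GOOGLE_CLOUD_PROJECT is not set")
    return problems


def run_test_query():
    """Run a trivial query; call this only when check_bigquery_env() reports no problems."""
    # Imported here so that the environment check works even without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=os.environ["GOOGLE_CLOUD_PROJECT"])
    rows = client.query("SELECT 1 AS x").result()
    return [row["x"] for row in rows]
```

Note that variables exported in one shell are only inherited by processes started from that shell, so if the check reports a missing variable, export it in the same shell session before launching `jupyter lab`.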