pip install of C-dependent modules on Alpine is a heavy job
So let's COPY the built *.whl artifacts wholesale from a base image and try to install them safely.
First, get the required modules ready. This time I will install the following modules.
requirements.txt
cycler==0.10.0
Cython==0.29.17
h5py==2.10.0
joblib==0.14.1
kiwisolver==1.2.0
matplotlib==3.2.1
numpy==1.18.4
pandas==1.0.3
Pillow==7.1.2
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
scikit-learn==0.22.2.post1
scipy==1.3.3
six==1.14.0
On Alpine, pip fetches C-dependent modules as source archives such as tar and zip rather than as prebuilt wheels, so they have to be converted to whl format.
The conversion needs the libraries the modules build against, so install them via apk.
apk update \
&& apk add --virtual .build --no-cache openblas-dev lapack-dev freetype-dev
...
&& apk add --virtual .community_build --no-cache -X http://dl-cdn.alpinelinux.org/alpine/edge/community hdf5-dev
You can also fetch modules with pip download, but use the pip wheel command instead, because it downloads the tar / zip file, extracts it automatically, and builds it.
Since pip wheel also accepts the -r option, point it at a file of pinned versions created with pip freeze > requirements.txt or similar.
pip wheel --no-cache-dir --wheel-dir=./whl -r requirements.txt
- Option notes
  - --no-cache-dir: Do not use or create the cache. If not specified, files are cached under ~/.cache/pip. Build artifacts are cached as well.
  - --wheel-dir: The output directory for the wheel files.
Unfortunately, in this case **it fails partway through**.
I'm using a requirements.txt that was built up in another Alpine environment and exported with pip freeze.
Since numpy and scipy are not installed in this environment, the run fails during the scikit-learn build.
With pip install -r requirements.txt, pip handles this nicely on its own. [^1]
[^1]: pip processes everything in one go without considering dependencies or priorities, so the same failure occurs with pip install. However, it works around the problem: modules that fail midway because of a "circular dependency" are built again once all the other modules have been installed.
This time, though, there is no choice but to install the dependent modules first. [^2]
[^2]: With scipy ~= 1.4 in my environment, an error occurs and the build fails, so I pinned the 1.3 series, which installed without trouble.
pip install cython numpy==1.18.4 scipy==1.3.3
pip wheel --no-cache-dir --wheel-dir=./whl -r requirements.txt
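To keep the pre-installed versions in step with requirements.txt, the pins could be read from the file instead of being typed by hand. The following is only a sketch: pinned_specs is a hypothetical helper name, and it assumes every line of requirements.txt has the name==version form that pip freeze produces.

```shell
# Hypothetical helper: print the pinned "name==version" line for each
# package name given after the requirements file.
# Assumes pip-freeze style lines (name==version), matched case-insensitively.
pinned_specs() {
    req_file="$1"
    shift
    for name in "$@"; do
        grep -i -E "^${name}==" "$req_file" || {
            echo "no pin for $name in $req_file" >&2
            return 1
        }
    done
}

# Usage sketch (install the build prerequisites first, then build the wheels):
#   pip install $(pinned_specs requirements.txt cython numpy scipy)
#   pip wheel --no-cache-dir --wheel-dir=./whl -r requirements.txt
```

With the requirements.txt shown earlier, the helper would hand Cython, numpy and scipy to pip install at exactly the listed versions, so the two commands cannot drift apart.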
I set out to create a separate image precisely to avoid building numpy and scipy, so I feel like I'm doing something pointless...?
After the build is complete, tag the image properly and push it to Docker Hub.
docker tag 123456789a hoge/builder-image:latest
docker push hoge/builder-image:latest
From here, we will work on the Dockerfile for the execution environment.
To install multiple modules with pip install, either list them all explicitly or specify a text file with --requirement.
There is no option that lets you gather the wheels in some directory and install everything in it.
This time, in a multi-stage build, COPY the directory containing the wheels and run the following command to install from the local wheels.
pip install --no-index --no-deps --no-cache-dir -f ./whl -r requirements.txt
- Option notes
  - --no-index: Do not use package indexes such as PyPI. Use this when you don't want to go online.
  - --no-deps: Do not install dependent modules. However, this does not seem to apply when the module itself explicitly specifies them.
  - -f, --find-links: Specify where to look for modules. Use this when you want to point at a local path.
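In the multi-stage build, the install step could be wrapped so that it stops with a clear message when the COPY brought no wheels. A minimal sketch, assuming the wheels sit in ./whl; install_from_wheels is a hypothetical name, not something pip provides.

```shell
# Hypothetical wrapper: install only from local wheels, failing early
# when the wheel directory is missing or contains no .whl files.
install_from_wheels() {
    wheel_dir="${1:-./whl}"
    req_file="${2:-requirements.txt}"
    # The glob stays unexpanded when nothing matches, so ls fails.
    ls "$wheel_dir"/*.whl >/dev/null 2>&1 || {
        echo "no wheels found in $wheel_dir" >&2
        return 1
    }
    pip install --no-index --no-deps --no-cache-dir -f "$wheel_dir" -r "$req_file"
}

# Usage sketch:
#   install_from_wheels ./whl requirements.txt
```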
Modules such as pip and setuptools that you want to install with the --upgrade option are installed separately, from their own upgrade text file.
In a text file referenced by the -r option, modules can be listed without specifying a version.
upgrade.txt
pip
setuptools
wheel
Upgrade these modules from the wheels in a specific directory with the following command:
pip install -U --no-index --no-deps --no-cache-dir -f ./upgrade -r upgrade.txt
However, this increases the number of files to manage, so unless you are in an offline environment it is better to write the upgrade directly in the Dockerfile.
Check that the modules can be imported. Create a shell script and run it directly with a RUN instruction.
import_test.sh
#!/bin/sh
python -c "import numpy"
python -c "import scipy"
python -c "import h5py"
python -c "import pandas"
python -c "import matplotlib"
python -c "import sklearn"
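The same check can be written as a loop that reports every module that fails to import and returns a non-zero status, so that the RUN step stops the build. This is a sketch under assumptions: check_imports and PYTHON_BIN are names made up for illustration.

```shell
# Interpreter to use; "python" matches the script above (override if needed).
PYTHON_BIN="${PYTHON_BIN:-python}"

# Hypothetical helper: try to import each named module, report the result,
# and return non-zero if at least one import failed.
check_imports() {
    failed=0
    for module in "$@"; do
        if "$PYTHON_BIN" -c "import $module" 2>/dev/null; then
            echo "ok: $module"
        else
            echo "FAILED: $module" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage sketch:
#   check_imports numpy scipy h5py pandas matplotlib sklearn
```

Unlike a plain list of python -c lines, this keeps going after the first failure, so one build log shows every module that is broken.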
Delete extra files to slim down the Docker image.
The image used to build the wheels only needs to hold the build artifacts, so erase everything else.
builder-image
apk del --purge .build .testing_build
pip freeze | xargs pip uninstall -y
pip cache purge
Check how much lighter the built image became after deleting the extra files. It seems I succeeded in trimming **360MB**.
# docker images
REPOSITORY            TAG      IMAGE ID       CREATED         SIZE
naka345/wheel_build   latest   b6c9df898334   9 minutes ago   1.04GB
naka345/wheel_build   latest   3236cf2f87de   2 days ago      639MB
Next, tidy up the execution environment side. The official Python Docker image handles this very neatly, so I delete files the same way. [^3]
[^3]: In an execution environment where the modules were installed with --no-cache-dir, running pip cache purge finds no cache files and returns an error code. A small thing, but it makes the command awkward to use.
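One way around this, sketched below, is to treat a failed purge as harmless so that the build step still succeeds. purge_pip_cache is a hypothetical wrapper, and it assumes a pip recent enough to have the cache subcommand (20.1 or later).

```shell
# Hypothetical wrapper: purge pip's cache, but never fail the build step
# when there is nothing to purge.
# Note: this hides every purge error, not only the "no cache" case.
purge_pip_cache() {
    pip cache purge >/dev/null 2>&1 \
        || echo "pip cache purge skipped (nothing to purge?)" >&2
    return 0
}

# Usage sketch:
#   purge_pip_cache
```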
execution-image
# Register only the shared libraries the modules need as a new virtual package
find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec scanelf --needed --nobanner --format '%n#p' '{}' ';' \
| tr ',' '\n' \
| sort -u \
| awk 'system("[ -e /usr/local/lib/" $1 " ]") == 0 { next } { print "so:" $1 }' \
| xargs -rt apk add --no-cache --virtual .module-rundeps && \
# Remove all packages used at build time
apk del --purge .build .community_build
# Delete unneeded files and leftovers on the Python side
find /usr/local -depth \
\( \
\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
-o \
\( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
\) -exec rm -rf '{}' +
# Clean up the files left over from this run
rm -rf /tmp/whl
Let's compare the result with the image from before the cleanup on the execution environment side.
# docker images
REPOSITORY              TAG      IMAGE ID       CREATED       SIZE
naka345/wheel_install   latest   f0df8a9887de   3 hours ago   1.29GB
↓
naka345/wheel_install   latest   27b4805053f2   3 hours ago   968MB
I managed to keep it below 1GB.
Based on the above, I wrote everything down in a Dockerfile. Since it is long, I have linked it on GitHub instead.
Modules that take a long time to build can now be brought in via pip safely and relatively quickly. The Docker image has also become slightly lighter.
However, the fact that multiple images are required is left unresolved.
Since the requirements.txt files must stay consistent,
would it be easier if there were a mechanism to push both images to Docker Hub
whenever this file is updated?
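Short of automating the pushes, a small guard could at least catch the two requirements.txt copies drifting apart. A sketch only: same_requirements and the file paths in the usage comment are hypothetical.

```shell
# Hypothetical guard: succeed only when both images would be built
# from byte-identical requirements files.
same_requirements() {
    cmp -s "$1" "$2"
}

# Usage sketch (paths are placeholders):
#   same_requirements builder/requirements.txt runtime/requirements.txt \
#       || { echo "requirements.txt differs between images" >&2; exit 1; }
```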