pip install of C-dependent modules on Alpine is painfully heavy
I measured how long a plain, straightforward pip install actually takes. Running it against a machine-learning requirements.txt I had on hand produced the following log:
```
Building wheels for collected packages: backcall, h5py, kiwisolver, matplotlib, numpy, pandas, pathspec, Pillow, PyYAML, pyzmq, regex, retrying, scikit-learn, scipy, tornado, typed-ast
```
I decided to pick out and measure only the modules that are likely to be used frequently during development.
I had two environments available to run this on, so I tried both; they appear as case1 and case2 below.
Execution time is measured by prefixing each install with the time command: both cases use the same Dockerfile, which embeds `time pip install hoge` (hoge being a placeholder package name) in the RUN instruction.
```dockerfile
FROM python:3.8-alpine3.11

RUN apk update \
    && apk add --virtual .build --no-cache openblas-dev lapack-dev freetype-dev \
       gfortran libxml2 g++ gcc zip unzip cmake make \
       libpng-dev openssl-dev musl libffi-dev python3-dev libxslt-dev \
       libxml2-dev jpeg-dev \
    && apk add --no-cache -X http://dl-cdn.alpinelinux.org/alpine/edge/testing hdf5-dev

RUN pip install --upgrade --no-cache-dir pip setuptools wheel && \
    time pip install --no-cache-dir Cython && \
    ...
```
Like this.
The table below summarizes the timings taken from the docker build logs.
Cython is downloaded as a ready-made wheel, `Cython-0.29.16-py2.py3-none-any.whl`, so no compilation is needed; it is listed for comparison against the platform-dependent modules.[^1]

[^1]: As described here, a `none-any` wheel is a module with no OS or architecture dependency, mostly written in pure Python. See also the official PEP 425.
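Before the numbers, a quick aside: which wheel tags pip will accept on a given image can be listed directly. A check along these lines shows that, on this Alpine image, the manylinux tags used by most prebuilt wheels on PyPI are not among the compatible ones, which is exactly why everything below gets built from source:

```sh
# List the wheel tags this interpreter + pip combination accepts
# (look for the "Compatible tags" section in the output).
# `pip debug` is flagged by pip itself as an unstable interface,
# so treat the exact output format as indicative only.
pip debug --verbose
```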
| module | case1 real | case1 user | case1 sys | case2 real | case2 user | case2 sys |
|---|---|---|---|---|---|---|
| Cython | 3.07s | 2.33s | 0.42s | 1.52s | 0.99s | 0.07s |
| numpy | 5m 28.27s | 7m 6.90s | 22.11s | 2m 11.15s | 3m 23.69s | 6.05s |
| pandas | 31m 46.58s | 30m 36.25s | 53.83s | 14m 8.24s | 13m 53.10s | 15.88s |
| Pillow | 50.81s | 44.09s | 5.99s | 24.79s | 19.88s | 1.69s |
| scipy | 30m 45.99s | 36m 29.33s | 1m 45.89s | 12m 52.81s | 17m 54.87s | 42.58s |
| scikit-learn | 14m 38.63s | 14m 3.37s | 28.98s | 6m 33.10s | 6m 24.84s | 10.11s |
| h5py | 3m 45.58s | 3m 34.79s | 9.30s | 1m 45.87s | 1m 42.42s | 4.18s |
| matplotlib | 2m 51.50s | 2m 35.53s | 13.59s | 1m 21.77s | 1m 13.70s | 6.75s |
| regex | 30.52s | 28.84s | 1.24s | 13.75s | 13.06s | 0.34s |
Ouch. This is terrible.
In case1, numpy alone takes over 5 minutes, and pandas and scipy each take more than 30 minutes. Since scikit-learn depends on numpy and scipy, installing it realistically means budgeting about an hour, and everything above combined is well over an hour and a half. What does that mean in practice? Assuming an 8-hour working day (roughly 480 minutes divided by a 90-minute build),

**you can only deploy about 5 times a day.**

That makes it hard to casually roll the latest branch out to the development environment, or even to verify a minor bug fix. Subtract the time spent actually working and the number drops further.
To be fair, doubling the machine's memory roughly doubles the processing speed, so the time is not impossible to shorten; on AWS you can buy time by choosing an instance type with more memory. Even so, case2, which has more than twice the performance of case1 and finishes in less than half the time, still needs about 40 minutes in total. That caps you at roughly 12 deployments a day.
Note that pip install was run in the order shown in the table, so some of the measured times include installing a module's dependencies.
kiwisolver is also pulled down as a tar.gz sdist and built from source, so matplotlib on its own should be slightly faster.
```
Successfully built pandas
Installing collected packages: six, python-dateutil, pytz, pandas
Successfully installed pandas-0.25.3 python-dateutil-2.8.1 pytz-2019.3 six-1.14.0
------
Successfully built scikit-learn
Installing collected packages: joblib, scikit-learn
Successfully installed joblib-0.14.1 scikit-learn-0.22.2.post1
------
Successfully built matplotlib kiwisolver
Installing collected packages: cycler, kiwisolver, pyparsing, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.2.0 matplotlib-3.2.1
```
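As an aside, whether a package can be installed from a prebuilt wheel on the current platform, rather than from an sdist that has to be compiled, can be checked in advance. A sketch, using kiwisolver and an arbitrary download directory as examples:

```sh
# Ask pip for a binary wheel only, refusing sdists; on this Alpine image the
# command is expected to fail for packages that publish only manylinux wheels.
pip download --only-binary=:all: --no-deps --dest /tmp/wheels kiwisolver
```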
Even the lighter modules take one to two minutes each, so the workflow has to be designed so that C-dependent modules are built as rarely as possible. ~~Or just use conda.~~
At pip install time, as soon as the module has been built into a wheel, the whl is immediately unpacked again and the resulting files are placed directly under site-packages. Everything goes in: the code written in Python as well as the *.so shared libraries built from the non-Python (C/C++/Fortran) sources.
If the python3.8 binary sits directly under /usr/local/bin, as it does on alpine, the files end up in /usr/local/lib/python3.8/site-packages/.
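Where site-packages actually lives on a given image can be confirmed with a one-liner, for example:

```sh
# Print the directory pip installs modules into on this image
# (expected to be /usr/local/lib/python3.8/site-packages for python:3.8-alpine3.11).
python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])"
```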
Unpacking and redeploying the wheel may look like a detour, but it is the PEP-compliant, safe way to install a module; PEP 491 says as much.[^2]

[^2]: As an aside, anaconda uses its own module installation method and does not follow this PEP, which is one of the things that makes mixing conda and pip risky.
As a test, `unzip` the locally built whl file and search for the `.so` files.
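Concretely, something along these lines works (the wheel filename, paths and package are placeholders; `pip wheel` can be used to produce the wheel first), yielding a listing like the one below:

```sh
# Build the wheel (slow on Alpine, since it compiles from source), unpack it,
# and list the compiled extension modules inside it.
pip wheel --no-deps --wheel-dir /tmp/wheels numpy
cd /tmp/wheels
unzip -q numpy-*.whl -d numpy_whl
cd numpy_whl && find . -name '*.so'
```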
```
./numpy/linalg/lapack_lite.cpython-38-x86_64-linux-gnu.so
./numpy/linalg/_umath_linalg.cpython-38-x86_64-linux-gnu.so
./numpy/core/_operand_flag_tests.cpython-38-x86_64-linux-gnu.so
...
```
After installing each whl with pip install, searching for the .so files on the library path shows that the pre-built files are indeed there for every module.
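For instance, a search like this over site-packages produces the listing below:

```sh
# List every compiled extension module that ended up in site-packages.
find /usr/local/lib/python3.8/site-packages -name '*.so'
```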
```
/usr/local/lib/python3.8/site-packages/numpy/linalg/lapack_lite.cpython-38-x86_64-linux-gnu.so
/usr/local/lib/python3.8/site-packages/numpy/linalg/_umath_linalg.cpython-38-x86_64-linux-gnu.so
/usr/local/lib/python3.8/site-packages/numpy/core/_operand_flag_tests.cpython-38-x86_64-linux-gnu.so
...
/usr/local/lib/python3.8/site-packages/pandas/io/sas/_sas.cpython-38-x86_64-linux-gnu.so
/usr/local/lib/python3.8/site-packages/pandas/_libs/hashtable.cpython-38-x86_64-linux-gnu.so
...
```
Only at this point can the module finally be imported from python. Put the other way around, the module is usable as long as its files sit somewhere on the path python searches for modules.
That is why, in the Workaround of the original article, on the assumption that everything other than the python-related parts is added afterwards, the build artifacts are copied directly under /usr/local/ using docker's multi-stage build.
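A minimal sketch of that idea, assuming numpy is the only heavy package; the apk package lists (both the build dependencies and the runtime libraries such as openblas, libstdc++ and libgfortran) are plausible examples rather than a verified set:

```dockerfile
# --- build stage: compile the C-dependent packages once ---
FROM python:3.8-alpine3.11 AS builder
RUN apk update \
    && apk add --no-cache openblas-dev lapack-dev gfortran g++ gcc make musl-dev python3-dev
RUN pip install --no-cache-dir Cython numpy

# --- runtime stage: reuse what was installed, without rebuilding ---
FROM python:3.8-alpine3.11
# /usr/local carries site-packages (and any console scripts) from the builder stage.
COPY --from=builder /usr/local /usr/local
# Runtime shared libraries the extensions were linked against (adjust per project).
RUN apk add --no-cache openblas libstdc++ libgfortran
```

Because the second stage starts from the same base image, copying /usr/local wholesale brings over the installed modules without repeating the compile step.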
As an aside, there is also `conda skeleton`; with that route, though, conda apparently rebuilds the module and installs it in its own way. Personally, I am happy enough with that.