This is a sequel (or sister edition) to the previous article "Building a PyPy execution environment with Docker". Last time we set up only a PyPy environment; this time we will create an environment for both PyPy and plain Python. Please note that much of the content overlaps with the previous article.
PyPy is one of several Python implementations. It can run ordinary Python modules, and it is often much faster than plain Python (CPython), especially for pure-Python code.
The downside of PyPy is that the available libraries are limited. I don't know the details, but it seems that third-party libraries are often unavailable. It is therefore convenient to use plain Python for the parts of a workflow that depend on such libraries, and PyPy for the parts that need execution speed. This time, we will use Docker to build an environment in which both PyPy and Python can be used.
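As a rough illustration (a minimal sketch, not from the original article), a tight pure-Python loop like the one below is the kind of workload where PyPy's JIT typically runs many times faster than CPython. The same script runs unchanged under both interpreters, which is what makes the split workflow practical:

```python
import time

def pure_python_sum(n):
    """A tight pure-Python loop: the kind of workload where PyPy's JIT shines."""
    total = 0
    for i in range(n):
        total += i * i
    return total

start = time.perf_counter()
result = pure_python_sum(10_000_000)
elapsed = time.perf_counter() - start
print(f"sum of squares below 10_000_000: {result} ({elapsed:.2f}s)")
```

Running this with `pypy bench.py` versus `python bench.py` gives a quick feel for the speed difference on your own machine (the exact ratio depends on the workload).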
I wanted a virtual environment because I didn't want to pollute my local machine, but I couldn't find much information on how to create a virtual environment with PyPy. Since I had been studying Docker recently, and I don't really trust the Windows Python environment in the first place, I decided to build the environment with Docker.
- Windows 10 Home (version 20H2, build 19042.685)
- An environment where Docker can be used with WSL2
Recently, Docker has started to work on WSL2 even on Windows 10 Home, making it very easy to use. For details on how to install WSL2, refer to the official Microsoft page, for example. After that, you can enable WSL2 in the Docker Desktop settings.
You should be able to do it on your Mac in much the same way.
The basic policy is as follows.
- Build a PyPy and Python environment in a container based on PyPy's official Docker image
- Manage the PyPy environment with pip and requirements.txt
- Manage the Python environment with Pipenv
The reason for using PyPy's Docker image instead of Python's is simply that it is the easiest way to set up a PyPy environment. Conveniently, the PyPy image also includes a Python execution environment. To avoid cluttering the environment, the libraries used in Python are managed in a Pipenv virtual environment, and the PyPy environment is recorded in requirements.txt to make it easier to rebuild and share.
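Since both interpreters live in one container, it can be handy to check which one a script is actually running under. A small sketch (not from the original article) using only the standard library:

```python
import platform
import sys

# Returns "CPython" under plain Python and "PyPy" under PyPy
impl = platform.python_implementation()
print(f"implementation: {impl}")
print(f"version: {sys.version.split()[0]}")
```

Running this with `pypy` and with `pipenv run python` inside the container is a quick sanity check that each command really uses the interpreter you expect.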
Configure the directory as follows:
│ docker-compose.yml
│ Dockerfile
└─src
pypy_main.py
python_main.py
Pipfile
Pipfile.lock
requirements.txt
In requirements.txt, list the libraries you want to use in the PyPy environment. Here, numpy is specified as an example.
requirements.txt
numpy
Pipfile and Pipfile.lock specify the libraries you want to use in your Python environment. As an example, we prepare one that installs only pandas. Also note that 3.9 is specified as the Python version (this will matter later).
Pipfile
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
pandas = "*"
[dev-packages]
[requires]
python_version = "3.9"
pypy_main.py is the module you want to run in the PyPy environment. As an example, it executes numpy.
pypy_main.py
import numpy as np
print(np.array([1, 2, 3]))
python_main.py is the module you want to run in the Python environment. As an example, it runs pandas.
python_main.py
import pandas as pd
print(pd.DataFrame([[1, 2, 3], [4, 5, 6]]))
Create the Dockerfile as follows.
# 1. Get the PyPy image
FROM pypy:3.7
# 2. Install pyenv and Python 3.9
RUN git clone https://github.com/pyenv/pyenv.git ~/.pyenv && \
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc && \
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc && \
echo 'eval "$(pyenv init -)"' >> ~/.bashrc && \
. ~/.bashrc && \
pyenv install 3.9.1
# 3. Set the working directory to /src
WORKDIR /src
# 4. Build the Python virtual environment
COPY src/Pipfile* ./
RUN pip install pipenv && \
pipenv install
# 5. Build the PyPy environment
COPY src/requirements.txt .
RUN pip install -r requirements.txt
Create the docker-compose.yml file as follows.
docker-compose.yml
version: '3'
services:
pypy-python:
build: .
volumes:
- ./src/:/src
tty: true
The service name is pypy-python, but any name is fine. You will use this name to enter the container later, so choose something easy to recognize for your project.
build specifies that the Dockerfile in the same directory should be used.
volumes mounts the src directory of the host PC onto the /src directory of the container, so edits on the host PC are reflected in the container.
Setting tty to true keeps the container running after launch, so that you can enter it.
The preparation is complete.
Open a command prompt in the directory containing docker-compose.yml and start the container with the following command.
>docker-compose up -d
The first time will take some time.
The -d option runs the container in the background, so that subsequent command operations remain possible.
Once the container is up, enter it with the following command.
>docker-compose exec pypy-python bash
Here, the argument pypy-python is the service name defined in the docker-compose.yml file.
The bash argument is needed to get a shell for command-line operations inside the container; without it, the command fails with an error.
We are now in the /src directory inside the container, with pypy_main.py and python_main.py directly underneath.
Use the pypy command to run PyPy. Try running pypy_main.py:
# pypy pypy_main.py
[1 2 3]
It works. You can also confirm that the numpy listed in requirements.txt was installed in the PyPy environment.
Next, try running python_main.py in the Pipenv virtual environment.
# pipenv run python python_main.py
0 1 2
0 1 2 3
1 4 5 6
As you can see, Python ran in the Pipenv environment, and pandas works.
Since the src directory is mounted, editing a file on the host PC is reflected in the container. Development can proceed by editing with your host editor and executing in the container.
The libraries in the pypy environment can be installed with pip. For example, execute the following command in the container to install tqdm.
# pip install tqdm
After installing the library, it is a good idea to record it in requirements.txt.
# pip freeze > requirements.txt
Since the src directory is mounted, a file rewritten in the container is also updated on the host machine. If you keep this file, you can reproduce the same environment when you recreate the container.
The Python environment's libraries are, of course, managed by Pipenv. To install a new library, run pipenv install <library> inside the container.
The rewritten Pipfile and Pipfile.lock are also reflected on the host machine, so if you save this, you can reproduce the same environment.
When you are done using it, use the exit command to exit the container.
# exit
You can leave the container running, but it is a good idea to bring it down when you are done:
>docker-compose down
To use it again, run docker-compose up -d. From the second time onward, the image remains on the host PC, so it starts up quickly.
However, if you run docker-compose down after installing a new library in the container, the change is not reflected in the image on the host PC. For the library installation to take effect, rebuild with the --build option as shown below. This reloads the Pipfile and requirements.txt and recreates the container from a fresh image.
>docker-compose up -d --build
I am creating an Unexplored Village Search Tool, and I use PyPy and Python together, as introduced here, in its data preprocessing. I needed to do a lot of calculation after reading GIS data, but the library that reads the GIS data (geopandas) cannot be used with PyPy, and the calculation never finishes with plain Python. So I took the approach of reading the GIS data with Python, outputting it to a text file, and performing the calculation with PyPy. I find PyPy fast and convenient, but there seems to be little information about it; perhaps not many people are using it?
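The handoff described above can be sketched roughly as follows (the file name, fields, and distance calculation are hypothetical illustrations, not taken from the actual tool): the CPython side dumps the loaded data to a plain text file, and the PyPy side reads it back and does the heavy pure-Python computation.

```python
import csv
import math
import os
import tempfile

# --- Step 1 (run with plain Python): dump loaded data to a text file ---
# In the real workflow these coordinates would come from geopandas; here we fake them.
points = [(35.6895, 139.6917), (34.6937, 135.5023), (43.0618, 141.3545)]

path = os.path.join(tempfile.gettempdir(), "points.csv")  # hypothetical file name
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(points)

# --- Step 2 (run with PyPy): read the text file back and do the heavy math ---
def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

with open(path, newline="") as f:
    loaded = [(float(lat), float(lon)) for lat, lon in csv.reader(f)]

# All-pairs distances: pure-Python loops like this are where PyPy pays off.
total = sum(
    haversine_km(*loaded[i], *loaded[j])
    for i in range(len(loaded))
    for j in range(i + 1, len(loaded))
)
print(f"total pairwise distance: {total:.1f} km")
```

In practice the two steps would be separate scripts, one run with `pipenv run python` and one with `pypy`, sharing only the text file.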
Also, Docker is convenient. If you use it well, you will not have to worry about your environment, and it is very comfortable. (Honestly, why does Python not work so well in a Windows environment these days...?)