This article summarizes what I have learned about Python multiprocessing.
When is multiprocessing used? ⇒ When you want to achieve parallel processing, splitting the work across multiple processes is one way to do it.
At present, applications that run CPU-intensive tasks need to use multiple processes to take full advantage of a multi-core CPU, because the Global Interpreter Lock keeps threads from doing so.
https://docs.python.org/ja/3/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock
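As a rough illustration of that point, here is a minimal sketch (my own example, not from the original article; the function name cpu_bound and the numbers are arbitrary) that spreads a CPU-bound calculation over four processes so each can run on its own core, something threads cannot do under the GIL:

import math
import time
from multiprocessing import Process


def cpu_bound(n):
    # Busy arithmetic that keeps one CPU core fully occupied.
    return sum(math.sqrt(i) for i in range(n))


def main():
    start = time.time()
    # Four worker processes, each doing the same amount of CPU-bound work.
    workers = [Process(target=cpu_bound, args=(5_000_000,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("4 processes finished in {:.2f}s".format(time.time() - start))


if __name__ == "__main__":
    main()

On a machine with four or more cores these processes run genuinely in parallel, whereas four threads doing the same work would be serialized by the GIL.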
Before getting to source code that uses multiple processes, let me cover how a new process is started in the first place. In many programming languages, the way to start a new process is to fork the program. In Python, calling os.fork() creates a child process that receives a copy of the parent's memory context; after the fork, each process runs in its own address space. The source code is below.
fork.py
import os

pid_list = []


def main():
    pid_list.append(os.getpid())
    child_pid = os.fork()

    if child_pid == 0:
        # This branch runs only in the child process.
        pid_list.append(os.getpid())
        print()
        print("Child: Hello, I am the child process")
        print("Child: The PID numbers I know are %s" % pid_list)
    else:
        # This branch runs only in the parent process.
        pid_list.append(os.getpid())
        print()
        print("parent: Hello, I am the parent process")
        print("parent: The PID number of the child process is %d" % child_pid)
        print("parent: The PID numbers I know are %s" % pid_list)


if __name__ == "__main__":
    main()
$ python fork.py
parent: Hello, I am the parent process
parent: The PID number of the child process is 321
parent: The PID numbers I know are [320, 320]
Child: Hello, I am the child process
Child: The PID numbers I know are [320, 321]
Both processes start with the same PID, 320, in their list, but only the child appends 321; the parent's list stays [320, 320]. This shows that the two processes do not share a memory context.
Process memory is not shared by default, so if you want processes to communicate with each other, some extra work is needed. To make this easier, the multiprocessing module provides several ways to communicate between processes.
The following two are introduced here.
multiprocessing.Pipe
multiprocessing.sharedctypes
multiprocessing.Pipe
The Pipe class is similar in concept to Unix/Linux pipes. multiprocessing.Pipe() returns a pair of Connection objects representing the two ends of the pipe; in the example below (pipesample.py), this is the line parent_conn, child_conn = Pipe(). The default, Pipe(True), makes the pipe bidirectional. With Pipe(False) the pipe is unidirectional: for conn1, conn2 = Pipe(False), conn1 can only be used to receive messages and conn2 can only be used to send them.
The Connection objects can send and receive any pickleable object.
Reference URL: https://docs.python.org/ja/2.7/library/multiprocessing.html#pipes-and-queues
pipesample.py
from multiprocessing import Process, Pipe


class CustomClass:
    pass


def work(connection):
    while True:
        instance = connection.recv()
        if instance:
            print("Child:Receive:{}".format(instance))
        else:
            return


def main():
    parent_conn, child_conn = Pipe()
    child = Process(target=work, args=(child_conn,))

    for item in (
        42,
        'some string',
        {'one': 1},
        CustomClass(),
        None,
    ):
        print("parent:Send:{}".format(item))
        parent_conn.send(item)

    child.start()
    child.join()


if __name__ == "__main__":
    main()
$python pipesample.py
parent:Send:42
parent:Send:some string
parent:Send:{'one': 1}
parent:Send:<__main__.CustomClass object at 0x7fc785a34ac8>
parent:Send:None
Child:Receive:42
Child:Receive:some string
Child:Receive:{'one': 1}
Child:Receive:<__main__.CustomClass object at 0x7fc785268978>
Each object produced by for item in (42, ..., None,): is passed to parent_conn.send(), and the paired process receives it with connection.recv() inside work(); what is transferred is the state of the data, serialized and copied. You can also see from the CustomClass lines that the object addresses differ between the two processes.
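For completeness, here is a minimal sketch of the unidirectional case described above (my own example, reusing the structure of pipesample.py): with Pipe(False), conn1 only receives and conn2 only sends.

from multiprocessing import Process, Pipe


def work(conn1):
    # conn1 is the receiving end of the unidirectional pipe.
    while True:
        item = conn1.recv()
        if item is None:
            return
        print("Child:Receive:{}".format(item))


def main():
    conn1, conn2 = Pipe(False)   # duplex=False: conn1 is recv-only, conn2 is send-only
    child = Process(target=work, args=(conn1,))
    child.start()
    for item in (1, 2, 3, None):
        conn2.send(item)
    child.join()


if __name__ == "__main__":
    main()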
multiprocessing.sharedctypes
The multiprocessing.sharedctypes module creates a block of shared memory and provides a way to place C-style data types (int, double, and so on) into it. The most basic helpers are Value(typecode_or_type, *args, lock=True) and Array(typecode_or_type, size_or_initializer, *, lock=True).
typecode_or_type determines the type of the returned object: it is either a ctypes type or a one-character typecode of the kind used in the array module. Objects such as lists, dictionaries, Namespace, and Lock are hard to express this way, so use multiprocessing.Manager in those cases.
Reference: https://docs.python.org/ja/3/library/multiprocessing.html#sharing-state-between-processes
valuearray.py
from multiprocessing import Process, Value, Array


def f(n, a):
    n.value = 3.141592
    for i in range(len(a)):
        a[i] = -a[i]


if __name__ == "__main__":
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])
$python valuearray.py
3.141592
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
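The section above mentions multiprocessing.Manager for sharing higher-level objects. Here is a minimal sketch of that (my own example, not from the original article), sharing a dict and a list through a manager process:

from multiprocessing import Process, Manager


def f(d, l):
    # The proxy objects behave like an ordinary dict and list.
    d['count'] = 1
    l.reverse()


if __name__ == "__main__":
    with Manager() as manager:
        d = manager.dict()
        l = manager.list(range(5))

        p = Process(target=f, args=(d, l))
        p.start()
        p.join()

        print(dict(d))   # {'count': 1}
        print(list(l))   # [4, 3, 2, 1, 0]

A Manager is more flexible than sharedctypes but slower, because every access goes through a separate server process.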
Using multiple processes instead of threads adds considerable overhead. In particular, memory usage increases because each process has an independent memory context. As a result, spawning a large number of child processes can do more harm than thread-based processing would. In multiprocess applications, building a process pool is a good way to keep resource usage under control. The basic idea of a process pool is to start a predetermined number of processes in advance and have them take items from a queue and process them. Instead of launching a process after a task arrives, the processes are already running, so work starts as soon as a task is assigned.
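Before looking at the Pool class, that idea can be sketched by hand (my own illustration, not the article's code; the squaring task and the sizes are arbitrary): a fixed number of worker processes are started up front and pull tasks from a shared queue until they see a sentinel value.

from multiprocessing import Process, Queue

POOL_SIZE = 4
NUM_TASKS = 20


def worker(task_queue, result_queue):
    # Each worker pulls items until it receives the None sentinel.
    while True:
        item = task_queue.get()
        if item is None:
            return
        result_queue.put(item * item)


def main():
    task_queue = Queue()
    result_queue = Queue()
    workers = [Process(target=worker, args=(task_queue, result_queue))
               for _ in range(POOL_SIZE)]
    for w in workers:
        w.start()              # workers are running before any task arrives

    for item in range(NUM_TASKS):
        task_queue.put(item)
    for _ in workers:
        task_queue.put(None)   # one sentinel per worker

    results = [result_queue.get() for _ in range(NUM_TASKS)]
    for w in workers:
        w.join()
    print(sorted(results))


if __name__ == "__main__":
    main()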
Pool class
The Pool class takes care of all this complicated process management for you.
The following source code uses the Google Maps API on GCP (Google Cloud Platform) to look up the latitude and longitude for each city name.
Setting POOL_SIZE = 4 specifies four worker processes that run in parallel. The Pool class can also be used as a context manager.
geocoding_by_multiprocessing.py
from multiprocessing import Pool
from gmaps import Geocoding

api = Geocoding(api_key='secret')

PLACES = (
    'Reykjavik', 'Vien', 'Zadar',
    'Venice', 'Wrocow', 'Bolognia',
    'Berlin', 'Dehil', 'New York',
    'Osaka'
)

POOL_SIZE = 4


def fetch_place(place):
    return api.geocode(place)[0]


def present_result(geocoded):
    print("{:s}, {:6.2f}, {:6.2f}".format(
        geocoded['formatted_address'],
        geocoded['geometry']['location']['lat'],
        geocoded['geometry']['location']['lng'],
    ).encode('utf-8'))


def main():
    with Pool(POOL_SIZE) as pool:
        results = pool.map(fetch_place, PLACES)

    for result in results:
        present_result(result)


if __name__ == "__main__":
    main()
$ python geocoding_by_multiprocessing.py
b'Reykjav\xc3\xadk, Iceland, 64.15, -21.94'
b'3110 Glendale Blvd, Los Angeles, CA 90039, USA, 34.12, -118.26'
b'Zadar, Croatia, 44.12, 15.23'
b'Venice, Metropolitan City of Venice, Italy, 45.44, 12.32'
b'Wroc\xc5\x82aw, Poland, 51.11, 17.04'
b'Bologna, Metropolitan City of Bologna, Italy, 44.49, 11.34'
b'Berlin, Germany, 52.52, 13.40'
b'Delhi, India, 28.70, 77.10'
b'New York, NY, USA, 40.71, -74.01'
b'Osaka, Japan, 34.69, 135.50'
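As a small variant (my own sketch, not from the original article; slow_square is just a stand-in for a slow task such as an API call), Pool.imap_unordered() yields each result as soon as its worker finishes, which is handy when individual tasks have very different latencies:

from multiprocessing import Pool


def slow_square(n):
    # Stand-in for a slow, independent task such as a network request.
    return n * n


if __name__ == "__main__":
    with Pool(4) as pool:
        # Results arrive in completion order, not submission order.
        for result in pool.imap_unordered(slow_square, range(10)):
            print(result)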
Studying parallel processing is hard. (Lol)