This article summarizes what I have learned about Python multiprocessing.
When is multiprocessing used? ⇒ When you want to achieve parallel processing, splitting the work across multiple processes is one way to do it.
At present, applications that run CPU-intensive tasks need to use multiple processes to take full advantage of a multi-core CPU, because the Global Interpreter Lock keeps threads from doing so.
https://docs.python.org/ja/3/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock
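As a rough illustration of that point, here is a minimal sketch (my own example, not from the original article; the function name cpu_bound and the numbers are arbitrary) that spreads a CPU-bound calculation over four processes so each can run on its own core, something threads cannot do under the GIL:

import math
import time
from multiprocessing import Process


def cpu_bound(n):
    # Busy arithmetic that keeps one CPU core fully occupied.
    return sum(math.sqrt(i) for i in range(n))


def main():
    start = time.time()
    # Four worker processes, each doing the same amount of CPU-bound work.
    workers = [Process(target=cpu_bound, args=(5_000_000,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("4 processes finished in {:.2f}s".format(time.time() - start))


if __name__ == "__main__":
    main()

On a machine with four or more cores these processes run genuinely in parallel, whereas four threads doing the same work would be serialized by the GIL.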
Before getting to source code that uses multiple processes, let me cover how a new process is started in the first place. In many programming languages, the way to start a new process is to fork the program. In Python, calling os.fork() creates a child process that receives a copy of the parent's memory context; after the fork, each process runs in its own address space. The source code is below.
fork.py
import os

pid_list = []


def main():
    pid_list.append(os.getpid())
    child_pid = os.fork()

    if child_pid == 0:
        # This branch runs only in the child process.
        pid_list.append(os.getpid())
        print()
        print("Child: Hello, I am the child process")
        print("Child: The PID numbers I know are %s" % pid_list)
    else:
        # This branch runs only in the parent process.
        pid_list.append(os.getpid())
        print()
        print("parent: Hello, I am the parent process")
        print("parent: The PID number of the child process is %d" % child_pid)
        print("parent: The PID numbers I know are %s" % pid_list)


if __name__ == "__main__":
    main()
$ python fork.py
parent: Hello, I am the parent process
parent: The PID number of the child process is 321
parent: The PID numbers I know are [320, 320]
Child: Hello, I am the child process
Child: The PID numbers I know are [320, 321]
Both processes start with the same PID, 320, in their list, but only the child appends 321; the parent's list stays [320, 320]. This shows that the two processes do not share a memory context.
Process memory is not shared by default, so if you want processes to communicate with each other, some extra work is needed. To make this easier, the multiprocessing module provides several ways to communicate between processes.
The following two are introduced here.
multiprocessing.Pipe
multiprocessing.sharedctypes
multiprocessing.Pipe
The Pipe class is similar in concept to Unix/Linux pipes. multiprocessing.Pipe() returns a pair of Connection objects representing the two ends of the pipe; in the example below (pipesample.py), this is the line parent_conn, child_conn = Pipe(). The default, Pipe(True), makes the pipe bidirectional. With Pipe(False) the pipe is unidirectional: for conn1, conn2 = Pipe(False), conn1 can only be used to receive messages and conn2 can only be used to send them.
The Connection objects can send and receive any pickleable object.
Reference URL: https://docs.python.org/ja/2.7/library/multiprocessing.html#pipes-and-queues
pipesample.py
from multiprocessing import Process, Pipe


class CustomClass:
    pass


def work(connection):
    while True:
        instance = connection.recv()
        if instance:
            print("Child:Receive:{}".format(instance))
        else:
            return


def main():
    parent_conn, child_conn = Pipe()
    child = Process(target=work, args=(child_conn,))

    for item in (
        42,
        'some string',
        {'one': 1},
        CustomClass(),
        None,
    ):
        print("parent:Send:{}".format(item))
        parent_conn.send(item)

    child.start()
    child.join()


if __name__ == "__main__":
    main()
$python pipesample.py
parent:Send:42
parent:Send:some string
parent:Send:{'one': 1}
parent:Send:<__main__.CustomClass object at 0x7fc785a34ac8>
parent:Send:None
Child:Receive:42
Child:Receive:some string
Child:Receive:{'one': 1}
Child:Receive:<__main__.CustomClass object at 0x7fc785268978>
Each object produced by for item in (42, ..., None,): is passed to parent_conn.send(), and the paired process receives it with connection.recv() inside work(); what is transferred is the state of the data, serialized and copied. You can also see from the CustomClass lines that the object addresses differ between the two processes.
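For completeness, here is a minimal sketch of the unidirectional case described above (my own example, reusing the structure of pipesample.py): with Pipe(False), conn1 only receives and conn2 only sends.

from multiprocessing import Process, Pipe


def work(conn1):
    # conn1 is the receiving end of the unidirectional pipe.
    while True:
        item = conn1.recv()
        if item is None:
            return
        print("Child:Receive:{}".format(item))


def main():
    conn1, conn2 = Pipe(False)   # duplex=False: conn1 is recv-only, conn2 is send-only
    child = Process(target=work, args=(conn1,))
    child.start()
    for item in (1, 2, 3, None):
        conn2.send(item)
    child.join()


if __name__ == "__main__":
    main()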
multiprocessing.sharedctypes
The multiprocessing.sharedctypes module creates a block of shared memory and provides a way to place C-style data types (int, double, and so on) into it. The most basic helpers are Value(typecode_or_type, *args, lock=True) and Array(typecode_or_type, size_or_initializer, *, lock=True).
typecode_or_type determines the type of the returned object: it is either a ctypes type or a one-character typecode of the kind used in the array module. Objects such as lists, dictionaries, Namespace, and Lock are hard to express this way, so use multiprocessing.Manager in those cases.
Reference: https://docs.python.org/ja/3/library/multiprocessing.html#sharing-state-between-processes
valuearray.py
from multiprocessing import Process, Value, Array


def f(n, a):
    n.value = 3.141592
    for i in range(len(a)):
        a[i] = -a[i]


if __name__ == "__main__":
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])
$python valuearray.py
3.141592
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
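The section above mentions multiprocessing.Manager for sharing higher-level objects. Here is a minimal sketch of that (my own example, not from the original article), sharing a dict and a list through a manager process:

from multiprocessing import Process, Manager


def f(d, l):
    # The proxy objects behave like an ordinary dict and list.
    d['count'] = 1
    l.reverse()


if __name__ == "__main__":
    with Manager() as manager:
        d = manager.dict()
        l = manager.list(range(5))

        p = Process(target=f, args=(d, l))
        p.start()
        p.join()

        print(dict(d))   # {'count': 1}
        print(list(l))   # [4, 3, 2, 1, 0]

A Manager is more flexible than sharedctypes but slower, because every access goes through a separate server process.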
Using multiple processes instead of threads adds considerable overhead. In particular, memory usage increases because each process has an independent memory context. As a result, spawning a large number of child processes can do more harm than thread-based processing would. In multiprocess applications, building a process pool is a good way to keep resource usage under control. The basic idea of a process pool is to start a predetermined number of processes in advance and have them take items from a queue and process them. Instead of launching a process after a task arrives, the processes are already running, so work starts as soon as a task is assigned.
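Before looking at the Pool class, that idea can be sketched by hand (my own illustration, not the article's code; the squaring task and the sizes are arbitrary): a fixed number of worker processes are started up front and pull tasks from a shared queue until they see a sentinel value.

from multiprocessing import Process, Queue

POOL_SIZE = 4
NUM_TASKS = 20


def worker(task_queue, result_queue):
    # Each worker pulls items until it receives the None sentinel.
    while True:
        item = task_queue.get()
        if item is None:
            return
        result_queue.put(item * item)


def main():
    task_queue = Queue()
    result_queue = Queue()
    workers = [Process(target=worker, args=(task_queue, result_queue))
               for _ in range(POOL_SIZE)]
    for w in workers:
        w.start()              # workers are running before any task arrives

    for item in range(NUM_TASKS):
        task_queue.put(item)
    for _ in workers:
        task_queue.put(None)   # one sentinel per worker

    results = [result_queue.get() for _ in range(NUM_TASKS)]
    for w in workers:
        w.join()
    print(sorted(results))


if __name__ == "__main__":
    main()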
Pool class
The Pool class takes care of all this complicated process management for you.
The following source code uses the Google Maps API on GCP (Google Cloud Platform) to look up the latitude and longitude for each city name.
Setting POOL_SIZE = 4 specifies four worker processes that run in parallel. The Pool class can also be used as a context manager.
geocoding_by_multiprocessing.py
from multiprocessing import Pool
from gmaps import Geocoding

api = Geocoding(api_key='secret')

PLACES = (
    'Reykjavik', 'Vien', 'Zadar',
    'Venice', 'Wrocow', 'Bolognia',
    'Berlin', 'Dehil', 'New York',
    'Osaka'
)

POOL_SIZE = 4


def fetch_place(place):
    return api.geocode(place)[0]


def present_result(geocoded):
    print("{:s}, {:6.2f}, {:6.2f}".format(
        geocoded['formatted_address'],
        geocoded['geometry']['location']['lat'],
        geocoded['geometry']['location']['lng'],
    ).encode('utf-8'))


def main():
    with Pool(POOL_SIZE) as pool:
        results = pool.map(fetch_place, PLACES)

    for result in results:
        present_result(result)


if __name__ == "__main__":
    main()
$ python geocoding_by_multiprocessing.py
b'Reykjav\xc3\xadk, Iceland, 64.15, -21.94'
b'3110 Glendale Blvd, Los Angeles, CA 90039, USA, 34.12, -118.26'
b'Zadar, Croatia, 44.12, 15.23'
b'Venice, Metropolitan City of Venice, Italy, 45.44, 12.32'
b'Wroc\xc5\x82aw, Poland, 51.11, 17.04'
b'Bologna, Metropolitan City of Bologna, Italy, 44.49, 11.34'
b'Berlin, Germany, 52.52, 13.40'
b'Delhi, India, 28.70, 77.10'
b'New York, NY, USA, 40.71, -74.01'
b'Osaka, Japan, 34.69, 135.50'
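As a small variant (my own sketch, not from the original article; slow_square is just a stand-in for a slow task such as an API call), Pool.imap_unordered() yields each result as soon as its worker finishes, which is handy when individual tasks have very different latencies:

from multiprocessing import Pool


def slow_square(n):
    # Stand-in for a slow, independent task such as a network request.
    return n * n


if __name__ == "__main__":
    with Pool(4) as pool:
        # Results arrive in completion order, not submission order.
        for result in pool.imap_unordered(slow_square, range(10)):
            print(result)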
Studying parallel processing is hard. (Lol)