Python has the [GIL (Global Interpreter Lock)](https://ja.wikipedia.org/wiki/%E3%82%B0%E3%83%AD%E3%83%BC%E3%83%90%E3%83%AB%E3%82%A4%E3%83%B3%E3%82%BF%E3%83%97%E3%83%AA%E3%82%BF%E3%83%AD%E3%83%83%E3%82%AF), so even on a machine with multiple CPU cores, ordinary code cannot use all of the available resources.
However, when processing a large amount of data, there are times when you want to use all of the machine's CPU resources and finish the calculation as fast as possible.
If you search for keywords like "Python parallel processing", most of the explanations you find cover the standard library's multiprocessing module, so I imagine many people are using it.
multiprocessing is the module I would reach for when building a system that incorporates parallel processing, but honestly it is a bit of a hassle when you just want to write throwaway code, as the comparison sketch below illustrates...
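For comparison, here is a minimal sketch of the same kind of loop written with the standard library's multiprocessing.Pool (the function and variable names are just for illustration):

```python
from multiprocessing import Pool


def process(n):
    return sum([i * n for i in range(100000)])


# The __main__ guard is required on platforms that spawn rather than fork
if __name__ == '__main__':
    with Pool() as pool:  # defaults to one worker per CPU core
        results = pool.map(process, range(10000))
    print(sum(results))
```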
So in this article I would like to introduce a module called Joblib that lets you write parallel processing more simply and quickly.
Besides the reduced amount of code, other advantages include:

- Errors raised in a child process are displayed.
- Child processes are terminated when you kill the parent with Ctrl+C.
- There is an option to automatically display overall progress.

All of these would have to be implemented by hand with multiprocessing, so they are very useful to me.
First, the installation. It is a one-liner with pip. The version at the time of writing this article is 0.9.3.
```
pip install joblib
Successfully installed joblib-0.9.3
```
The environment used for the tests is a MacBook Pro with 2 physical cores (4 logical cores).
To keep the comparison of calculation times simple, I test with code like the following. The calculation itself has no particular meaning.
First, run the computation as a plain loop, measure the elapsed time with the time module, and print it.
```python
# -*- coding: utf-8 -*-
from time import time


def process(n):
    return sum([i * n for i in range(100000)])


start = time()

# Iterative calculation
total = 0
for i in range(10000):
    total += process(i)

print(total)
print('It took {} seconds'.format(time() - start))
```
```
249972500250000000
It took 78.2647480965 seconds
```
The result of the calculation is 249972500250000000, and it takes about 78 seconds.
Now let's parallelize just the iterative part of the code above. Multi-process parallelization can be achieved by combining `Parallel` and `delayed`.
```python
# -*- coding: utf-8 -*-
from joblib import Parallel, delayed
from time import time


def process(n):
    return sum([i * n for i in range(100000)])


start = time()

# Iterative calculation (parallelized)
r = Parallel(n_jobs=-1)([delayed(process)(i) for i in range(10000)])

print(sum(r))
print('It took {} seconds'.format(time() - start))
```
```
249972500250000000
It took 37.5521140099 seconds
```
The time has been cut to about 37 seconds! The result of the calculation is also correct. Just by rewriting a small part of the code, I was able to cut the calculation time roughly in half.
The `n_jobs` argument of `Parallel` is the number of cores to use; if you set it to -1, it always runs with the maximum number of cores available on the machine.
If you set it to 1, the behavior is the same as running without parallelization, so it is easy to switch back for debugging. In principle there is no problem writing your iterative processing with `Parallel` from the start.
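As a minimal sketch (the `n_jobs` variable here is just for illustration), switching between serial and parallel execution is a one-line change:

```python
from joblib import Parallel, delayed


def process(n):
    return sum([i * n for i in range(100000)])


# 1 = run serially (handy for debugging), -1 = use all available cores
n_jobs = -1
r = Parallel(n_jobs=n_jobs)([delayed(process)(i) for i in range(1000)])
print(sum(r))
```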
Also, if you pass a number from 0 to 10 as the `verbose` argument, progress is printed at the corresponding frequency (0 means no output, 10 is the most frequent).
```python
r = Parallel(n_jobs=-1, verbose=10)([delayed(process)(i) for i in range(10000)])
```
In the code examples above, the `process` function executed in parallel can read variables from the enclosing scope, but it cannot assign new values to them. This is because each process has its own separate memory space.
Code like the following runs normally as-is, but once parallelized the assignment does not work as intended.
```python
# Reference and manipulate the external variable number from within process
number = 0


def process(n):
    global number  # needed to assign to the enclosing-scope variable
    number = 3.14
    return sum([i * n for i in range(100000)])
```
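A minimal sketch demonstrating the problem (the function name `set_number` is my own illustration): the assignment only changes the worker process's copy, so the parent never sees it.

```python
from joblib import Parallel, delayed

number = 0


def set_number(n):
    global number
    number = n  # modifies only this worker process's copy
    return number


Parallel(n_jobs=2)([delayed(set_number)(i) for i in range(4)])
print(number)  # still 0 in the parent process
```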
To solve this, you need variables that share a memory area across processes. The multiprocessing module provides them, so let's use those.
The code above can be realized as follows.
```python
# -*- coding: utf-8 -*-
from joblib import Parallel, delayed
from multiprocessing import Value

# Type 'd' (double) is required here because we store a float below
shared_num = Value('d', 1)


def process(n):
    shared_num.value = 3.14
    return sum([i * n for i in range(100000)])


# Iterative calculation (parallelized)
Parallel(n_jobs=-1)([delayed(process)(i) for i in range(10000)])

print(shared_num.value)
```
```
3.14
```
By using the `Value` class, you can prepare numbers such as int and double as shared variables. Note that you need to specify the type in the first argument up front ('i' for int, 'd' for double).
It is also possible to prepare a list of a given type by using `Array('d', [0.0, 0.0, 0.0])`.
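Here is a minimal sketch of the `Array` variant (the indexing pattern is just for illustration):

```python
from joblib import Parallel, delayed
from multiprocessing import Array

shared_arr = Array('d', [0.0, 0.0, 0.0])


def process(n):
    shared_arr[n % 3] = float(n)  # each worker writes into the shared list
    return n


Parallel(n_jobs=-1)([delayed(process)(i) for i in range(10)])
print(list(shared_arr))  # final contents depend on worker scheduling
```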
This seems useful when you want to display your own custom progress indicator!
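For example, here is a minimal sketch of such a custom progress display built on a shared counter (my own illustration, not a joblib feature; it assumes a Unix-like fork start method so the `Value` is inherited by the workers):

```python
from joblib import Parallel, delayed
from multiprocessing import Value

counter = Value('i', 0)
total_tasks = 100


def process(n):
    result = sum([i * n for i in range(100000)])
    with counter.get_lock():  # serialize updates to avoid lost increments
        counter.value += 1
        print('progress: {}/{}'.format(counter.value, total_tasks))
    return result


r = Parallel(n_jobs=-1)([delayed(process)(i) for i in range(total_tasks)])
```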
When I actually tried parallel processing in Python, I was surprised that I could implement it more smoothly than expected.
I post various technical information on Twitter every day. I would be grateful if you followed me.
https://twitter.com/YuhsakInoue