concurrent.futures
http://docs.python.jp/3/library/concurrent.futures.html
concurrent.futures is a standard-library package, new in Python 3.2, that makes it easy to run parallel task processing with multiple threads or multiple processes.
ThreadPoolExecutor and ProcessPoolExecutor both inherit from a common base class, Executor, so you can write code against almost the same interface with either one.
Because it is part of the standard library from Python 3.2 onward, no installation is needed there. For Python 2.6 and later, a backport is available:
```shell
pip install futures
```
```python
import concurrent.futures
import hashlib

def digest(t):  # A function that deliberately consumes CPU time
    hash = hashlib.sha256()
    for i in range(t * 1000000):
        hash.update(b'hogehoge')  # must be bytes, not str, in Python 3
    return hash.hexdigest()

if __name__ == '__main__':
    task_list = [1, 1, 1, 2, 2, 3]
    # Create an Executor object.
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=4)
    # Submit the tasks to the Executor and get the same number of Future objects.
    # Execution of a task starts the moment submit() is called.
    futures = [executor.submit(digest, t) for t in task_list]
    # Wait for each future to complete and fetch its result.
    # as_completed() returns an iterator that yields the given futures
    # in order of completion. If no task has completed yet, it blocks
    # until one does.
    for future in concurrent.futures.as_completed(futures):
        print(future.result())  # Displays the return value of digest().
    # Wait for all tasks to complete and clean up.
    # Blocks on any tasks that have not yet completed.
    # (Since as_completed() iterated over all of them, no task should
    #  still be incomplete at this point.)
    executor.shutdown()
```
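The same pattern can be written more compactly: Executor supports the with statement (which calls shutdown() automatically), and Executor.map() submits a batch of tasks in one call. A minimal sketch, using a lighter workload than the sample above:

```python
import concurrent.futures
import hashlib

def digest(t):
    # Lighter variant of the sample's CPU-bound task.
    h = hashlib.sha256()
    for _ in range(t * 100000):
        h.update(b'hogehoge')
    return h.hexdigest()

if __name__ == '__main__':
    # Leaving the with block calls executor.shutdown() automatically.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        # map() yields results in submission order, unlike as_completed(),
        # which yields futures in completion order.
        for result in executor.map(digest, [1, 1, 2]):
            print(result)
```

Whether you prefer map() or submit()/as_completed() depends on whether you need results in submission order or as soon as each one finishes.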
If you replace ProcessPoolExecutor with ThreadPoolExecutor, the same code runs with multithreading instead of multiprocessing.
Because ProcessPoolExecutor is implemented on top of interprocess communication, it has some restrictions:

- Function arguments and return values must be objects that can be serialized with pickle.
- The function itself must also be passed between processes, so it must be picklable too: top-level functions work, but instance methods and lambda expressions cannot be pickled and so cannot be used.
- Side effects inside the function, such as rewriting a global variable, are not reflected back in the calling process.
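The pickle restriction can be checked directly, without spinning up a process pool. A small sketch (the function names here are illustrative, not from the original article):

```python
import pickle

def top_level(x):
    return x * 2

# A top-level function pickles fine: only its qualified name is serialized,
# and the worker process re-imports it by that name.
data = pickle.dumps(top_level)

# A lambda has no importable name, so pickling it fails. This is why a
# lambda cannot be submitted to a ProcessPoolExecutor.
try:
    pickle.dumps(lambda x: x * 2)
except Exception as e:
    print('cannot pickle a lambda:', e)
```

ThreadPoolExecutor has none of these restrictions, since threads share the same address space and nothing needs to be serialized.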
CPython implements the Global Interpreter Lock (GIL), so multiple threads cannot execute Python code simultaneously within a single process. For a CPU-bound task that runs Python code, like the sample above, parallelizing with ThreadPoolExecutor results in almost sequential execution, so there is little benefit in execution time. (Threads are effective for work dominated by network or I/O waits.)
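The flip side is easy to demonstrate: during a blocking wait the GIL is released, so I/O-bound tasks do overlap under ThreadPoolExecutor. A minimal sketch, with time.sleep() standing in for I/O waiting:

```python
import concurrent.futures
import time

def fake_io(t):
    # Simulates waiting on I/O; sleep releases the GIL while blocked.
    time.sleep(t)
    return t

start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fake_io, [0.2, 0.2, 0.2, 0.2]))
elapsed = time.time() - start
# The four 0.2-second waits overlap across the four worker threads,
# so the total wall time is roughly 0.2 s rather than 0.8 s.
print(elapsed)
```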