concurrent.futures
http://docs.python.jp/3/library/concurrent.futures.html
concurrent.futures is a standard-library package, new in Python 3.2, that makes it easy to run parallel task processing with multiple threads or multiple processes.
ThreadPoolExecutor and ProcessPoolExecutor both inherit from a common base class, Executor, so you can write code against almost the same interface with either one.
Because it is part of the standard library from Python 3.2 onward, no installation is needed there. For Python 2.6 and later, a backport is available:
```shell
pip install futures
```
```python
import concurrent.futures
import hashlib

def digest(t):  # A function that deliberately consumes CPU time
    hash = hashlib.sha256()
    for i in range(t * 1000000):
        hash.update(b'hogehoge')  # must be bytes, not str, in Python 3
    return hash.hexdigest()

if __name__ == '__main__':
    task_list = [1, 1, 1, 2, 2, 3]
    # Create an Executor object.
    executor = concurrent.futures.ProcessPoolExecutor(max_workers=4)
    # Submit the tasks to the Executor and get the same number of Future objects.
    # Execution of a task starts the moment submit() is called.
    futures = [executor.submit(digest, t) for t in task_list]
    # Wait for each future to complete and fetch its result.
    # as_completed() returns an iterator that yields the given futures
    # in order of completion. If no task has completed yet, it blocks
    # until one does.
    for future in concurrent.futures.as_completed(futures):
        print(future.result())  # Displays the return value of digest().
    # Wait for all tasks to complete and clean up.
    # Blocks on any tasks that have not yet completed.
    # (Since as_completed() iterated over all of them, no task should
    #  still be incomplete at this point.)
    executor.shutdown()
```
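The same pattern can be written more compactly: Executor supports the with statement (which calls shutdown() automatically), and Executor.map() submits a batch of tasks in one call. A minimal sketch, using a lighter workload than the sample above:

```python
import concurrent.futures
import hashlib

def digest(t):
    # Lighter variant of the sample's CPU-bound task.
    h = hashlib.sha256()
    for _ in range(t * 100000):
        h.update(b'hogehoge')
    return h.hexdigest()

if __name__ == '__main__':
    # Leaving the with block calls executor.shutdown() automatically.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        # map() yields results in submission order, unlike as_completed(),
        # which yields futures in completion order.
        for result in executor.map(digest, [1, 1, 2]):
            print(result)
```

Whether you prefer map() or submit()/as_completed() depends on whether you need results in submission order or as soon as each one finishes.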
If you replace ProcessPoolExecutor with ThreadPoolExecutor, the same code runs with multithreading instead of multiprocessing.
Because ProcessPoolExecutor is implemented on top of interprocess communication, it has some restrictions:

- Function arguments and return values must be objects that can be serialized with pickle.
- The function itself must also be passed between processes, so it must be picklable too: top-level functions work, but instance methods and lambda expressions cannot be pickled and so cannot be used.
- Side effects inside the function, such as rewriting a global variable, are not reflected back in the calling process.
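The pickle restriction can be checked directly, without spinning up a process pool. A small sketch (the function names here are illustrative, not from the original article):

```python
import pickle

def top_level(x):
    return x * 2

# A top-level function pickles fine: only its qualified name is serialized,
# and the worker process re-imports it by that name.
data = pickle.dumps(top_level)

# A lambda has no importable name, so pickling it fails. This is why a
# lambda cannot be submitted to a ProcessPoolExecutor.
try:
    pickle.dumps(lambda x: x * 2)
except Exception as e:
    print('cannot pickle a lambda:', e)
```

ThreadPoolExecutor has none of these restrictions, since threads share the same address space and nothing needs to be serialized.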
CPython implements the Global Interpreter Lock (GIL), so multiple threads cannot execute Python code simultaneously within a single process. For a CPU-bound task that runs Python code, like the sample above, parallelizing with ThreadPoolExecutor results in almost sequential execution, so there is little benefit in execution time. (Threads are effective for work dominated by network or I/O waits.)
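The flip side is easy to demonstrate: during a blocking wait the GIL is released, so I/O-bound tasks do overlap under ThreadPoolExecutor. A minimal sketch, with time.sleep() standing in for I/O waiting:

```python
import concurrent.futures
import time

def fake_io(t):
    # Simulates waiting on I/O; sleep releases the GIL while blocked.
    time.sleep(t)
    return t

start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fake_io, [0.2, 0.2, 0.2, 0.2]))
elapsed = time.time() - start
# The four 0.2-second waits overlap across the four worker threads,
# so the total wall time is roughly 0.2 s rather than 0.8 s.
print(elapsed)
```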