In this article, I summarize what I have learned about multithreading, writing it down to deepen my own understanding.
[Multithreading is the parallel execution of multiple flows of processing within a single computer program. The term also refers to those flows of processing themselves.](http://e-words.jp/w/%E3%83%9E%E3%83%AB%E3%83%81%E3%82%B9%E3%83%AC%E3%83%83%E3%83%89.html)
If you divide a program into threads, the threads can run concurrently while sharing the same memory context. On a single-core CPU, multithreading does not make the program faster unless it waits on external resources. On a multi-core CPU, each thread can be assigned to a separate core and the threads can run in parallel at the same time, which can improve the speed of the program.
The features of processes and threads are summarized in terms of their basic definition, memory space, and context switching.
[A context switch is when a computer's processor (CPU) suspends the flow of processing (process or thread) it is currently executing, switches to another one, and resumes execution.](http://e-words.jp/w/%E3%82%B3%E3%83%B3%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%82%B9%E3%82%A4%E3%83%83%E3%83%81.html)
A context switch between processes requires switching the memory address space, which is a relatively expensive operation. The following materials were helpful: https://code-examples.net/ja/q/530280 https://www.slideshare.net/ssuserc2d4c1/ss-124497965
As a result, each approach has the following characteristics in terms of efficiency and reliability.
Compared to parallel processing with multiple processes, multithreading is generally more efficient because the threads share memory space.
Because threads share memory space, any data that is used by several threads at once has to be protected from concurrent access. If multiple threads try to update the same unprotected data at the same time, a race condition occurs and the program can fail in unexpected ways. To protect the data you need to lock it, and using locks correctly is difficult.
On the other hand, because processes do not share memory space, multiprocessing reduces the risk of the data corruption and deadlocks that can occur with multithreading.
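As a minimal sketch of the locking described above, the following uses `threading.Lock` from the standard library to protect a shared counter. The names `increment_many` and `run_counter` are hypothetical and chosen only for this illustration.

```python
import threading


def increment_many(counter, lock, times):
    """Increment the shared counter `times` times, holding the lock each time."""
    for _ in range(times):
        # Only one thread at a time may run the read-modify-write below.
        with lock:
            counter["value"] += 1


def run_counter(num_threads=4, times=10_000):
    """Start several threads that all update the same counter (hypothetical helper)."""
    counter = {"value": 0}  # data shared by every thread
    lock = threading.Lock()
    threads = [
        threading.Thread(target=increment_many, args=(counter, lock, times))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

With the lock in place, `run_counter(4, 10_000)` should always return 40000. Without it, the read-modify-write in `+=` is not guaranteed to be atomic, so updates could be lost.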
[The Global Interpreter Lock (GIL) is an exclusive lock held by the interpreter thread of a programming language to prevent code that is not thread-safe from being shared with other threads.](https://ja.wikipedia.org/wiki/グローバルインタプリタロック)
Ruby and Python both adopt a global interpreter lock (hereinafter abbreviated as "GIL"). In Python, only one thread at a time is ever allowed to access Python objects. Why is this? First, the implementation of Python written in C (CPython) is not thread-safe. "Not thread-safe" means that data can be corrupted when multiple threads run at the same time or handle the same data. The data here is, for example, the contents of a shared memory area. One way to avoid the data corruption caused by code that is not thread-safe is to prevent that code from being shared with other threads, and doing so requires an exclusive lock mechanism. This exclusive lock is the GIL. Therefore, the GIL limits the number of threads executing Python code at any one time to one.
The following materials were very helpful: http://blog.bonprosoft.com/1632 https://methane.hatenablog.jp/entry/20111203/1322900647
There are two ways to make full use of a multi-CPU machine with Python:
Consider a system that copies files from one directory to another through a GUI. Suppose multithreading is a requirement: the copy runs in the background while the main thread keeps updating the GUI window. As a result, the progress of the operation is fed back to the user in real time, and the work can be interrupted. Building a responsive interface means running time-consuming tasks in the background and giving the user feedback within a bounded time, and multithreading is one way to achieve this. (The purpose is not to improve performance but to let the user keep operating the interface even when data processing takes a long time.)
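The idea can be sketched without a real GUI toolkit. In the sketch below, which makes several assumptions, a worker thread copies files and reports progress through a `queue.Queue`, while the calling thread stands in for the GUI main thread. The names `copy_worker`, `copy_with_feedback`, and `demo` are hypothetical. A real GUI would poll the queue from its event loop instead of blocking on `get()`.

```python
import queue
import shutil
import tempfile
import threading
from pathlib import Path


def copy_worker(src_dir, dst_dir, progress, cancel):
    """Copy files one by one in the background, reporting (done, total)."""
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    for done, path in enumerate(files, start=1):
        if cancel.is_set():  # the user asked to interrupt the work
            break
        shutil.copy2(path, Path(dst_dir) / path.name)
        progress.put((done, len(files)))
    progress.put(None)  # sentinel: the worker has finished or was cancelled


def copy_with_feedback(src_dir, dst_dir, cancel=None):
    """Stand-in for the GUI main thread: receives progress updates."""
    progress = queue.Queue()
    cancel = cancel or threading.Event()
    worker = threading.Thread(
        target=copy_worker, args=(src_dir, dst_dir, progress, cancel), daemon=True
    )
    worker.start()
    updates = []
    while True:
        item = progress.get()  # a real GUI would poll without blocking
        if item is None:
            break
        updates.append(item)  # a real GUI would redraw a progress bar here
    worker.join()
    return updates


def demo():
    """Copy three small temporary files and return the updates and copied names."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for name in ("a.txt", "b.txt", "c.txt"):
            (Path(src) / name).write_text(name)
        updates = copy_with_feedback(src, dst)
        names = sorted(p.name for p in Path(dst).iterdir())
    return updates, names
```

Setting the `cancel` event from the main thread makes the worker stop before the next file, which is one simple way to let the user interrupt the copy.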
If the processing depends on external resources, multithreading may be able to speed it up.
When sending a large number of HTTP requests to an external service, multithreading is often used.
Receiving each response takes time, so if you want to get multiple results from a Web API, executing the requests synchronously one after another is slow.
When communicating with a Web API, there are cases where concurrent requests (multiple requests that can be executed completely or partially out of order) can be processed in parallel with almost no effect on the response time of each request. One way to achieve this parallelism is to execute each request in its own thread.
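A rough sketch of this pattern uses `concurrent.futures.ThreadPoolExecutor` from the standard library. To keep it runnable without network access, the hypothetical `fake_fetch` function simulates the wait for a response with `time.sleep`. The URLs and the `fetch_all` helper are also assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.2):
    """Stand-in for an HTTP request: sleeps instead of waiting on the network."""
    time.sleep(delay)
    return f"response from {url}"


def fetch_all(urls, max_workers=5):
    """Run one simulated request per thread and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `urls`,
        # even if the individual requests finish out of order.
        return list(pool.map(fake_fetch, urls))


# Hypothetical endpoints, used only as labels here.
example_urls = [f"https://api.example.com/items/{i}" for i in range(5)]
```

Calling `fetch_all(example_urls)` waits for the five simulated requests concurrently, so the total time should be close to one delay rather than five.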
When executing an HTTP request, much of the time is often spent reading from the TCP socket (`recv()`). In CPython, calling the C function `recv()` releases the GIL. (This seems to be because it is a blocking I/O operation, but I still don't fully understand it.)
Because the GIL is released during the call, other threads can run in the meantime, so multithreading is effective here.
So I suppose threads in Python are useful when waiting on I/O. CPython is still difficult for me.
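The following is only a small sketch of that behavior, not an explanation of CPython internals. It uses `socket.socketpair()` as a stand-in for a real TCP connection. One thread blocks in `recv()` while the main thread carries on running Python code and then sends the data. The function name `demo_blocking_recv` is hypothetical.

```python
import socket
import threading


def demo_blocking_recv():
    """Block one thread in recv() while the main thread keeps working."""
    # A connected pair of sockets stands in for a real TCP connection.
    reader_sock, writer_sock = socket.socketpair()
    received = []
    started = threading.Event()

    def reader():
        started.set()
        # recv() blocks until data arrives; other threads can run meanwhile.
        received.append(reader_sock.recv(1024))

    t = threading.Thread(target=reader)
    t.start()
    started.wait()

    # The main thread runs ordinary Python code while the reader
    # is blocked (or about to block) in recv().
    total = sum(range(100_000))

    writer_sock.sendall(b"hello")  # wakes up the reader
    t.join()
    reader_sock.close()
    writer_sock.close()
    return total, received[0]
```

If the blocked `recv()` call kept hold of the GIL, the main thread could never reach `sendall()` and the program would hang. That it completes is consistent with the GIL being released during the blocking call.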
- http://ossforum.jp/node/579
- https://ja.wikipedia.org/wiki/グローバルインタプリタロック
- http://blog.bonprosoft.com/1632
- https://methane.hatenablog.jp/entry/20111203/1322900647
- http://e-words.jp/w/%E3%82%B3%E3%83%B3%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%82%B9%E3%82%A4%E3%83%83%E3%83%81.html
- http://e-words.jp/w/%E3%83%9E%E3%83%AB%E3%83%81%E3%82%B9%E3%83%AC%E3%83%83%E3%83%89.html
- Mastering TCP/IP Primer, 5th Edition
- Expert Python Programming, Revised 2nd Edition