In the old days, Unix-like systems handled I/O asynchronously with select(), doing work only when a file descriptor changed state. This was (apparently) an effective technique in an era when memory and server resources were extremely scarce. From the late 90s through the early 2000s, multithreading became mainstream, and on the server side connections were multiplexed by handing each socket returned by accept() to a thread that did the sending and receiving. (Of course, asynchronous I/O was still used where it fit.)
Since the latter half of the 2000s, the C10K problem has drawn attention, and node.js and nginx have appeared. Rather than creating a large number of threads and burning memory and resources, asynchronous processing that makes sockets non-blocking and monitors descriptor state came back into the spotlight. More recently, libraries such as JavaScript's Promise and Rx (Reactive Extensions), which let you describe asynchronous processing concisely close to the client-side UI, have appeared.
However, for server-side asynchronous processing, handling asynchronous I/O in a single thread cannot make full use of multiple cores and CPUs. One approach, therefore, is to accept() a connection, wait asynchronously for the socket to become ready, and have a thread do the actual sending and receiving once it is. The asynchronous monitoring itself comes in several flavors: select(), poll(), and epoll. For multiplexing the work there are two approaches: threads and forked processes. You can also create the threads or processes in advance and pass data to them. The table below compares these methods (a code sketch of the plain epoll case follows the table).
Method | Merits | Demerits
---|---|---
select | Low memory and resource consumption | Limited number of descriptors that can be handled; cannot service other sockets while one socket is sending/receiving
poll | Low memory and resource consumption; no descriptor limit | Cannot service other sockets while one socket is sending/receiving
epoll | A faster version of poll; no descriptor limit | Cannot service other sockets while one socket is sending/receiving
fork | Sockets rarely wait to be serviced | High process-creation cost
thread | Sockets rarely wait to be serviced | Thread-creation cost and stack space consumed per connection
pre-fork | Sockets rarely wait to be serviced; process-creation cost is paid only at startup | Effort of data passing, locking, and process management
pre-thread | Sockets rarely wait to be serviced; thread-creation cost is paid only at startup | Effort of data passing, locking, and thread management
epoll + thread | Minimal latency while keeping memory and resource consumption low | Thread-creation cost; effort of data passing and thread management
epoll + pre-thread | Minimal latency while keeping memory and resource consumption low | Data passing and thread management are more complicated, but implemented well it should give the best performance
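To make the plain "epoll" row concrete, here is a minimal sketch in C of the single-threaded epoll loop (my own illustration, not taken from the book). One thread both accepts connections and does all the send/receive work, which is exactly why one slow socket stalls the rest; `event_loop()` is a hypothetical name, error handling is omitted, and the listening socket is assumed to be created elsewhere.

```c
/* Minimal single-threaded epoll(7) echo loop. Error handling is
 * abbreviated; listen_fd is assumed to be already bound and listening. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_EVENTS 64

void event_loop(int listen_fd)
{
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        struct epoll_event events[MAX_EVENTS];
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                /* New connection: register the client socket too. */
                int client = accept(listen_fd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN,
                                           .data.fd = client };
                epoll_ctl(epfd, EPOLL_CTL_ADD, client, &cev);
            } else {
                /* Socket is readable: the send/recv work happens right
                 * here, so a slow client blocks every other socket. */
                char buf[4096];
                ssize_t len = recv(fd, buf, sizeof(buf), 0);
                if (len <= 0) { close(fd); continue; }
                send(fd, buf, (size_t)len, 0);
            }
        }
    }
}
```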
This area is covered in detail in "[Linux Network Programming Bible](http://www.amazon.co.jp/dp/B00O8GIL62)" (Mitsuyuki Omata).
Personally I think "epoll + thread" is the way to go, but according to the benchmarks in the book above, "epoll + thread" has the fastest connection time while plain "fork" is fastest at sending and receiving. In the overall ranking including connection time, "pre-thread" was fastest. Looking only at sending and receiving, using "poll" and "epoll" plainly came out faster than the fancier approaches like fork and thread, including the "pre-" variants (really??). Incidentally, the book says "epoll + thread" comes 4th overall, slower than "select", "poll", and "epoll", which I also find hard to believe.
The benchmark appears to start 1,000 threads, each repeating a cycle of connecting and then sending and receiving 50 times; the server is dual-core, and only the send() part is threaded or forked.
Even when threads and processes are pre-created, does performance really degrade that much from data passing and locking of shared data? I think the dual-core server is quite possibly a factor (the book itself notes it is a simple comparison without monitoring server-side resources).
Since "EPOLL + pre-thread" is not introduced, this may be the fastest if the number of threads is properly suppressed in a multi-core environment. When writing by myself, I'm thinking of reducing the number of threads with "EPOLL + pre-thread".
Incidentally, the book also introduces a method in which multiple threads each call accept() on the same listening socket, serialized by a lock so that only one of them becomes the receiver of each new connection; I can't yet judge the pros and cons of this approach.
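As I understand it, that pattern looks roughly like the sketch below (my reading of the idea, not the book's code): all workers share one listening socket, and a mutex ensures only one thread sits in accept() at a time. `handle_client()` is a hypothetical per-connection handler.

```c
/* Accept-serialization sketch: N of these workers run concurrently,
 * but the mutex means only one is ever inside accept(). */
#include <pthread.h>
#include <sys/socket.h>

void handle_client(int fd);   /* hypothetical: recv/send, then close */

static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

void *accept_worker(void *arg)
{
    int listen_fd = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&accept_lock);
        int client = accept(listen_fd, NULL, NULL);  /* serialized */
        pthread_mutex_unlock(&accept_lock);
        if (client >= 0)
            handle_client(client);
    }
    return NULL;
}
```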