This is the story of prefork: starting from the basics, I'll build up to what prefork is.
To communicate between a TCP client and server on Linux, the server side sets up to listen for connections by following the steps below.

socket() -> bind() -> listen() -> accept()
https://qiita.com/Michinosuke/items/0778a5344bdf81488114
For a TCP server that runs in a single process, the above procedure is sufficient. But when multiple clients connect at the same time, client 1 can connect while client 2 cannot: the server is still busy talking to client 1.
To support simultaneous connections from multiple clients, use the fork(2) system call. The simplest approach is for the parent process to handle everything up to accept(), then create a child process that takes over the communication. Below is a simple sample. Forking a child per connection avoids the problem the single-process TCP server has.
```c
/* The parent handles everything up to listen(); in the loop,
 * a child process is forked to handle each accepted connection. */
for (;;) {
    len = sizeof(client);
    sock = accept(sock0, (struct sockaddr *) &client, &len);

    pid = fork();
    if (pid == 0) {
        /* child: serve this client, then exit */
        n = read(sock, buf, sizeof(buf));
        if (n < 0) {
            perror("read");
            return 1;
        }
        write(sock, buf, n);
        close(sock);
        return 0;
    }
    close(sock);   /* parent: close its copy of the connected socket */
}
```
(I just wanted to show the gist here, so the details are omitted.)
```c
sock = accept(sock0, (struct sockaddr *) &client, &len);

pid = fork();
if (pid == 0) {
    n = read(sock, buf, sizeof(buf));
    if (n < 0) {
        perror("read");
        return 1;
    }
```
Look at this part: after fork(), the child uses a socket that the parent opened. Why does this work? The article below explains the answer very clearly. Roughly speaking, the open file description (file offset, etc.) is shared across fork(). In other words, the accepted socket can be used as-is by the child process for communication. (Note that this is sharing: each child process does not create its own socket.) Incidentally, while searching around I also found that in Ruby's case, the file descriptors are closed when exec is called.
http://kotaroito.hatenablog.com/entry/20120108/1326030769
You can also confirm this by checking man fork(2):

> The child inherits copies of the parent's set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two file descriptors share open file status flags, file offset, and signal-driven I/O attributes (see the description of F_SETOWN and F_SETSIG in fcntl(2)).
The catch here is the high cost of fork(2). Copy-on-write softens it, but process creation itself is still heavy work. For details, see the article below.
https://naoya-2.hatenadiary.org/entry/20071010/1192040413
So what can we do? Creating threads and event-driven programming are popular answers, but here I'll cover prefork, the subject of this article. If fork(2) is expensive, the idea is to fork in advance and let the already-created processes handle clients as they arrive. The rough flow is:

1. The parent does socket() -> bind() -> listen()
2. The parent forks a fixed number of child processes
3. Each child calls accept() on the shared listening socket

This creates a state where the child processes are blocking in accept() at step 3. When a client arrives, only one of the waiting processes is woken up, and that process handles the subsequent communication. Note that the maximum number of simultaneous client connections equals the number of forked processes: if you want 1000 concurrent users, you need 1000 processes. This is the territory of the so-called C10K problem.
https://ja.wikipedia.org/wiki/C10K%E5%95%8F%E9%A1%8C
Nginx handles a large amount of traffic with a small number of worker processes (each internally single-threaded), but its model is event-driven rather than prefork or threads. Roughly speaking, each worker monitors events and processes them as they occur, which mitigates the context-switching problem. Since this has nothing to do with the main subject, I'll just leave it at "epoll(2) is amazing" and move on.
https://linuxjm.osdn.jp/html/LDP_man-pages/man7/epoll.7.html
So with many processes blocking in accept(2), doesn't the thundering herd problem occur? In conclusion, it seems to be fine these days. I haven't read the source, but the kernel apparently does the right thing in this area. The thundering herd did happen with old kernels. (There are also cases where an accept mutex is implemented on the application side; old Apache apparently did its exclusion that way.)
https://naoya-2.hatenadiary.org/entry/20070311/1173629378
Let's define graceful shutdown as the following flow:

- At the TCP layer, close the listening socket -> no new connections are accepted
- Implement the child processes so that, on receiving a signal from the parent, they finish their in-flight work and then exit -> safe stop

Let's look at Nginx as a sample (heavily abbreviated):
```c
static void
ngx_worker_process_cycle(ngx_cycle_t *cycle, void *data)
{
    for ( ;; ) {
        /* a worker process that received SIGQUIT enters here */
        if (ngx_quit) {
            ngx_quit = 0;
            ngx_log_error(NGX_LOG_NOTICE, cycle->log, 0,
                          "gracefully shutting down");
            ngx_setproctitle("worker process is shutting down");

            if (!ngx_exiting) {
                ngx_exiting = 1;                      /* raise the exiting flag */
                ngx_set_shutdown_timer(cycle);        /* set the shutdown timer */
                ngx_close_listening_sockets(cycle);   /* close the listening sockets */
                ngx_close_idle_connections(cycle);    /* close idle connections */
            }
        }

        /* the exiting flag is set, so we enter here */
        if (ngx_exiting) {
            /* ngx_event_no_timers_left() does not return NGX_OK
               while active connections remain */
            if (ngx_event_no_timers_left() == NGX_OK) {
                ngx_worker_process_exit(cycle);   /* call the exit function */
            }
        }
    }
}
```
If ngx_event_no_timers_left did not watch for active connections, the process would terminate immediately. Reading this, I realized that if a request gets stuck somewhere in a backend application, this loop would never finish unless a timeout is set on the Nginx side itself (worker_shutdown_timeout). That is really something the application side should handle, but I suspect it will bite someone someday. `ngx_event_no_timers_left` is shown below. It checks the event tree and returns NGX_OK to the caller when no outstanding events remain; once NGX_OK is returned, the Nginx worker process exits normally, and that completes the graceful shutdown.
```c
ngx_int_t
ngx_event_no_timers_left(void)
{
    ngx_event_t        *ev;
    ngx_rbtree_node_t  *node, *root, *sentinel;

    sentinel = ngx_event_timer_rbtree.sentinel;
    root = ngx_event_timer_rbtree.root;

    /* the tree that manages events: if root == sentinel,
       the tree is empty, so return NGX_OK */
    if (root == sentinel) {
        return NGX_OK;
    }

    for (node = ngx_rbtree_min(root, sentinel);
         node;
         node = ngx_rbtree_next(&ngx_event_timer_rbtree, node))
    {
        ev = (ngx_event_t *) ((char *) node - offsetof(ngx_event_t, timer));

        /* do not exit while an event that cannot be cancelled remains */
        if (!ev->cancelable) {
            return NGX_AGAIN;
        }
    }

    /* only cancelable timers are left */
    return NGX_OK;
}
```
http://nginx.org/en/docs/ngx_core_module.html#worker_shutdown_timeout
As you can guess from the ngx_rbtree_* names, the event state is held in a red-black tree. I know it's a kind of self-balancing binary search tree, but I don't know the details, so that's something to look into...