[RUBY] Reading "Working with Unix Processes"

--Ruby reflects Unix system calls, culture, and ideas throughout.
--With Ruby, you can leave the low-level details to the language and learn the ideas of Unix itself.

Chapter 1 Introduction

--All code runs in a process.
--When traffic and resources get tight, you have to look beyond the application code.

Chapter 2 Guide to this book

--The ideas and techniques of Unix programming will stay useful for the next 40 years.

--System calls
  --A program cannot operate the kernel directly; everything must go through system calls.
  --The system call interface acts as the intermediary between the kernel and userland.

  1. Userland
     --The place where the programs you write are executed.
     --Handles control flow based on arithmetic, string operations, and logical operations.
  2. Kernel
     --A middle layer that sits directly on top of the computer hardware and controls it.
     --Handles things such as reading and writing the file system, exchanging data over the network, allocating memory, and playing music through the speakers.

--man pages
  --Documentation well suited to learning Unix programming.
  --They are helpful in the following situations:

  1. When you are writing a program in C and want to find out how to use a system call.
  2. When you want to understand the purpose of a system call.

--man page sections

  1. User commands (shell commands) that anyone can execute
  2. System calls (functions provided by the kernel)
  3. Subroutines (C library functions)
  4. Devices (special files in the /dev directory)

--Process: the atom of Unix
  --All code is executed in a process.
  --Launching ruby from the command line spawns a new process to execute the code. When the code finishes, the process ends.
  --A MySQL server is always available because its dedicated process keeps running.

Chapter 3 Processes Have IDs

--Cross-referencing
  --A pid is how the kernel sees a process, so it can be cross-referenced with OS tools such as the ps(1) command.
  --Pids often show up in log files.
  --If multiple processes log to one file, you must be able to tell which process each line came from. Including the pid in each line of the log solves this.

--Commands that can be cross-referenced with the information provided by the OS
  --top(1): displays running processes in real time.
  --lsof(8): lists open files.
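
The logging idea above can be sketched as follows. This is only an illustration: log_line is a made-up helper name, not something from the book, and the message texts are placeholders.

```ruby
# Hypothetical helper (not from the book): prefix a message with the pid
# of the process that is writing it.
def log_line(message)
  "[#{Process.pid}] #{message}"
end

puts log_line("parent started")

# The forked child has its own pid, so its line carries a different prefix.
child_pid = fork { puts log_line("child started") }
Process.wait(child_pid)
```

The number in brackets can then be looked up with ps(1) or top(1) while the process is still running.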

Chapter 4 Processes Have Parents

--Every process has a parent process.

--Parent process
  --The process that started a given process.

--Example
  --When I launch "Terminal.app" on Mac OS X, I get a bash prompt.
  --Since everything is a process, this means the "Terminal.app" process was started first and it then started a bash process.
  --The parent of the bash process is therefore the "Terminal.app" process.
  --If you run the ls(1) command from the bash prompt, the parent of the ls process is the bash process.

--Practical example
  --There are not many cases where the ppid is actually used, but it can matter when you want to detect a daemon process.
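
A small sketch of reading the ppid from Ruby, assuming nothing beyond Process.pid, Process.ppid, and fork. The method name child_sees_parent_pid? is invented for this example.

```ruby
# Returns true if a forked child reports the forking process as its parent.
def child_sees_parent_pid?
  parent_pid = Process.pid
  child_pid = fork do
    # In the child, Process.ppid is the pid of the process that forked it.
    exit(Process.ppid == parent_pid ? 0 : 1)
  end
  _, status = Process.wait2(child_pid)
  status.success?
end

puts "my pid: #{Process.pid}, my parent's pid: #{Process.ppid}"
puts "child saw this process as its parent: #{child_sees_parent_pid?}"
```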

Chapter 5 Processes Have File Descriptors

--Just as a running process is identified by its pid, an open file is identified by a number called a file descriptor.

--Everything is a file
  --One of the tenets of the Unix philosophy.
  --Devices, sockets, pipes, files, and so on are all treated as files.

--A file description represents a resource
  --When a resource is opened in a running process, a file descriptor is assigned to it.
  --File descriptions are not shared between unrelated processes.
    --They are usually shared through a parent-child relationship, but there is also a way to share them between completely unrelated processes: ancillary data over UNIX domain sockets (see man7/unix.7.html#lbAD). (Thanks to @angel_p_57!)
  --A file descriptor is destroyed when the process that opened the resource terminates.
  --A file description is released when all file descriptors that refer to it have been destroyed.

--File descriptors live with their process and die with it.

--In Ruby, open resources are represented by the IO class. Every IO object knows its own file descriptor.
  --You can get the file descriptor with IO#fileno.

--File descriptors are assigned in order, starting from the smallest unused integer.
  --When a resource is closed, the file descriptor assigned to it becomes available again.
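
The allocation rule above can be observed with IO#fileno. The following is a minimal sketch; descriptor_reuse_demo is an invented name, and File::NULL is used only because it exists on every platform.

```ruby
# Opens and closes the null device a few times and records the descriptors.
# File::NULL is used only because it exists everywhere; any file would do.
def descriptor_reuse_demo
  first = File.open(File::NULL)
  second = File.open(File::NULL)
  first_fd = first.fileno
  second_fd = second.fileno

  first.close                      # first_fd is now free again
  third = File.open(File::NULL)    # receives the smallest unused number
  [first_fd, second_fd, third.fileno]
ensure
  [first, second, third].each { |io| io.close if io && !io.closed? }
end

first_fd, second_fd, third_fd = descriptor_reuse_demo
puts "first: #{first_fd}, second: #{second_fd}, third: #{third_fd}"
```

On a typical run the third descriptor equals the first one, because that number was released by close.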

--Standard streams
  --Every Unix process comes with three open resources.

  0. Standard input (STDIN)
  1. Standard output (STDOUT)
  2. Standard error (STDERR)
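
The numbering of the three standard streams can be checked directly from Ruby, which is also why the first file a program opens itself usually receives descriptor 3.

```ruby
# The three standard streams are already open when the program starts.
puts STDIN.fileno    # => 0
puts STDOUT.fileno   # => 1
puts STDERR.fileno   # => 2
```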

--Practical example
  --File descriptors are the heart of network programming using sockets and pipes.

Chapter 6 Processes Have Resource Limits

--How many file descriptors can one process have?
  --It depends on the system settings.

--Resource limits are set for each process by the kernel.
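
To see the limit on the current system, Process.getrlimit can be queried for the :NOFILE resource. A minimal sketch (nofile_limits is just a wrapper name chosen for this example):

```ruby
# Returns [soft_limit, hard_limit] for the number of open file descriptors.
def nofile_limits
  Process.getrlimit(:NOFILE)
end

soft_limit, hard_limit = nofile_limits
puts "soft limit: #{soft_limit}"
puts "hard limit: #{hard_limit}"
```

The soft limit is the one that is actually enforced; a process can raise it up to the hard limit.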

Chapter 7 Processes Have an Environment

--Every process inherits environment variables from its parent process.
  --Environment variables are set by the parent process and passed on to its child processes.
  --Environment variables exist per process and are globally accessible within each process.

--ENV implements part of the Enumerable and Hash APIs, but it does not offer exactly the same functionality as Hash.
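
A short sketch of both points, inheritance and the Hash-like but not-quite-Hash nature of ENV. DEMO_MESSAGE is a made-up variable name, and read_in_child is a helper invented for this example; it starts a second Ruby interpreter via RbConfig.ruby.

```ruby
require 'rbconfig'

# DEMO_MESSAGE is a made-up variable name used only for this example.
ENV['DEMO_MESSAGE'] = 'wing it'

# Hash-like access works...
puts ENV.key?('DEMO_MESSAGE')    # => true
puts ENV.fetch('DEMO_MESSAGE')   # => wing it
# ...but ENV is not a Hash; convert it when real Hash methods are needed.
puts ENV.is_a?(Hash)             # => false
puts ENV.to_h.class              # => Hash

# Starts a separate Ruby process and returns what it sees for the variable.
def read_in_child(name)
  IO.popen([RbConfig.ruby, '-e', "print ENV[#{name.inspect}]"], &:read)
end

puts read_in_child('DEMO_MESSAGE')   # the child inherited the value
```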

--Practical example

$ RAILS_ENV=production rails server
$ EDITOR=mate bundle open actionpack
$ QUEUE=default rake resque:work

--Environment variables are often used as a way to pass input to command line tools.

Chapter 8 Processes Have Arguments

$ cat argv.rb
p ARGV
$ ruby argv.rb foo bar -va
["foo", "bar", "-va"]

--Practical example
  --Passing file names to a program, for example when writing a program that receives one or more file names on the command line and processes those files.
  --Parsing command-line arguments.
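
For the argument-parsing case, one option is the optparse library from Ruby's standard library. The sketch below is an assumption about how the "foo bar -va" example could be handled; the meanings given to -v and -a are invented for illustration.

```ruby
require 'optparse'

# Hypothetical option set: -v and -a are made-up flags for this example.
# Returns [options, remaining_arguments] without modifying argv.
def parse_arguments(argv)
  options = { verbose: false, all: false }
  parser = OptionParser.new do |opts|
    opts.on('-v', 'verbose output') { options[:verbose] = true }
    opts.on('-a', 'process everything') { options[:all] = true }
  end
  files = parser.permute(argv)   # non-option arguments, in order
  [options, files]
end

options, files = parse_arguments(%w[foo bar -va])
p options   # => {:verbose=>true, :all=>true} (newer Rubies print {verbose: true, all: true})
p files     # => ["foo", "bar"]
```

In a real script the method would be called with ARGV instead of a literal array.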

Chapter 9 Processes Have Names

--Unix processes have few built-in means of communicating their state.
  --Programmers invented log files.
    --Log files let a process share whatever information it wants by writing to the file system.
    --But this works at the file system level rather than at the level of the process itself.
  --Opening a socket and using the network.
    --A process can communicate with other processes this way, but because it depends on the network, this too is not at the level of the process itself.

--Two mechanisms convey information at the level of the process itself:

  1. Process name
  2. Exit code
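
The first mechanism, the process name, is available in Ruby through the global variable $PROGRAM_NAME (also written $0). A minimal sketch, where the "demo-worker: ..." naming scheme is made up for illustration:

```ruby
# Hypothetical naming scheme: "demo-worker: <status>" is invented for this example.
def rename_process(status)
  $PROGRAM_NAME = "demo-worker: #{status}"
end

puts $PROGRAM_NAME        # the name the process started with
rename_process("idle")
puts $PROGRAM_NAME        # => demo-worker: idle
```

On most platforms the new name is also what tools such as ps(1) display, so a long-running process can use it to advertise what it is currently doing.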

Chapter 10 Processes Have Exit Codes

--Exit code value
  --Every process ends with an exit code (0-255) that indicates whether it ended normally or abnormally.

--Exit code 0
  --Indicates a normal exit.
  --Any other exit code indicates an error.

--Ways to end a process

  1. exit
  2. exit!
  3. abort
  4. raise
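
The four ways of ending a process can be compared by running each one in a child process and reading the resulting exit code. This is a sketch only; exit_code_of is a helper invented here, and it relies on fork and Process.wait2, which the later chapters cover.

```ruby
# Runs the given block in a child process and returns the child's exit code.
def exit_code_of(&block)
  pid = fork do
    $stderr.reopen(File::NULL, 'w')   # keep abort/raise messages out of the demo output
    block.call
  end
  _, status = Process.wait2(pid)
  status.exitstatus
end

puts exit_code_of { exit }             # => 0
puts exit_code_of { exit 22 }          # => 22
puts exit_code_of { exit! 33 }         # => 33 (exit! skips at_exit handlers)
puts exit_code_of { abort "failed" }   # => 1
puts exit_code_of { raise "boom" }     # => 1 (unhandled exception)
```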

Chapter 11 Processes Can Create Child Processes

--Creating processes
  --The process that calls fork(2) is called the "parent process", and the newly created process is called the "child process".

--Child process
  --The child process inherits a copy of all the memory used by the parent process.
    --Suppose a process loads a huge amount of software and consumes 500MB of memory (e.g. a Rails app). If you spawn two child processes from it, each child effectively holds its own copy of that software in memory.
    --The fork call returns immediately, and you then have three processes that each consume 500MB of memory.
    --This is really convenient when you want to launch multiple instances of an application at the same time.
  --File descriptors opened by the parent process are inherited in the same way.
    --The child process is given the same file descriptors as the parent process.
    --Therefore, two processes can share open files, sockets, and so on.
  --Because it is a completely new process, the child is assigned its own unique pid.
  --The child's ppid is the pid of the process that called fork(2).
  --The child can freely modify its copy of the memory without affecting the parent process.
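
The last point, that the child's changes do not leak back into the parent, can be checked with a small sketch. The method name is invented for this example.

```ruby
# Returns true if a change made by the child shows up in the parent's data.
def child_mutation_visible_in_parent?
  data = [1, 2, 3]
  pid = fork do
    data << 4   # modifies only the child's copy of the array
  end
  Process.wait(pid)
  data.length != 3
end

puts child_mutation_visible_in_parent?   # => false
```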

--fork method
  --The fork method is called once but returns twice.
  --fork spawns a new process!
  --It returns once in the calling parent process and once in the newly spawned child process.

# Both the if and the else clause are executed:
# in the parent process, fork returns the pid of the new child (truthy);
# in the child process, fork returns nil.
if fork
  puts "entered the if block"
else
  puts "entered the else block"
end

Output (the order of the two lines may vary):
entered the if block
entered the else block

--Is fork multi-core programming?
  --It can be, if the newly created processes are distributed (in parallel) across multiple CPU cores, but there is no guarantee that they will be.
  --For example, on a machine with four CPUs, all four processes could end up being handled by a single CPU.

--Using blocks
  --A common Ruby idiom is to pass a block to fork.
  --When fork is called with a block, the block is executed only in the child process and ignored by the parent process.
  --The child process exits when the block finishes; it does not go on to run the parent's code.

fork do
  # Code to run in the child process goes here
end

# Code to run in the parent process goes here

Chapter 12 Orphaned Processes

--A child process stays alive even if its parent process dies.
  --Once child processes exist, it can become unclear which process terminal input such as Ctrl-C should terminate, the parent or the child.

--Managing orphaned processes
  --Daemon processes
    --Processes that are intentionally orphaned so that they can keep running indefinitely.
  --Unix signals
    --A way to communicate with a process that has no terminal.

Chapter 13 Processes Are Friendly

--Having the child process copy all the data the parent process holds in memory is a considerable overhead.

--Copy-on-write (CoW)
  --A mechanism that delays the actual copying of memory until a write is needed.
  --Until then, the parent and child processes physically share the same data in memory.
  --Memory is copied only when either the parent or the child needs to modify it, which keeps the two processes independent.

--CoW is very convenient: it makes spawning child processes with fork(2) fast and saves resources.
  --Only the data the child process needs to modify has to be copied; the rest is shared.

--For CoW to work well, the Ruby implementation must be written so as not to break this feature provided by the kernel.

Chapter 14 Processes Can Wait

--Fire and forget
  --Used when you want a child process to handle something asynchronously while the parent process carries on independently.

message = 'Good Morning'
recipient = '[email protected]'

fork do
  # Spawn a child process to send the data to the stats collector
  # while the parent process continues with the actual message delivery.
  #
  # The parent should not be slowed down by this work, and it does not
  # matter if sending to the stats collector fails for some reason.
  StatsCollector.record message, recipient
end

# Send the message to the actual recipient

--Babysitter
  --Apart from the case above, most uses of fork(2) need some mechanism for keeping an eye on the child processes.

Before the change:

fork do
  5.times do
    sleep 1
    puts "I'm an orphan!"
  end
end

abort "Parent process died..."

After the change:

fork do
  5.times do
    sleep 1
    puts "I am an orphan!"
  end
end

Process.wait
abort "Parent process died..."

Output:
I am an orphan!
I am an orphan!
I am an orphan!
I am an orphan!
I am an orphan!
Parent process died...

--The exit status serves as a means of communication between processes.
  --The exit code conveys information to other processes, and Process.wait2 lets you read that information directly.

Example of interprocess communication without file system or network:

# Spawn 5 child processes
5.times do
  fork do
    # Generate a random value in each child process.
    # Exit with code 111 if it is even and 112 if it is odd.
    if rand(5).even?
      exit 111
    else
      exit 112
    end
  end
end

5.times do
  # Wait for one of the spawned child processes to finish.
  pid, status = Process.wait2

  # An exit code of 111 means the value generated
  # by the child process was even.
  if status.exitstatus == 111
    puts "#{pid} encountered an even number!"
  else
    puts "#{pid} encountered an odd number!"
  end
end

Example of waiting for a specific child process:

favourite = fork do
  exit 77
end

middle_child = fork do
  abort "I want to be waited on!"
end

pid, status = Process.waitpid2 favourite
puts status.exitstatus

--Process.wait and Process.waitpid are actually aliases for the same function.
  --You can pass a pid to Process.wait to wait for a particular child process, and you can pass -1 to Process.waitpid to wait for any child process.
  --As a programmer, it is important to use the tool that expresses your intent most clearly, so even though the two methods are identical, it is better to use them as follows:
    --Process.wait to wait for any child process
    --Process.waitpid to wait for a specific child process

--The kernel queues information about terminated processes, so the parent process can always receive a child's exit information.
  --Therefore, it is fine if the parent is slow to handle the termination of a child process.
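
The queuing behaviour can be sketched as follows: the children exit while nobody is waiting, and the parent still collects every exit code afterwards. collect_after_delay is a name invented for this example, and the half-second sleep is an arbitrary choice.

```ruby
# Forks `count` children that exit immediately with codes 0, 1, 2, ...
# The parent only starts waiting after a delay, yet no exit status is lost.
def collect_after_delay(count)
  count.times { |i| fork { exit i } }
  sleep 0.5   # the children have most likely exited by now; nobody waited yet
  count.times.map { Process.wait2.last.exitstatus }.sort
end

p collect_after_delay(3)   # => [0, 1, 2]
```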

--Practical example
  --Managing child processes is the most common pattern in Unix programming.
    --It goes by names such as babysitter process, master/worker, and prefork.
    --One prepared process creates multiple child processes for parallel work and then looks after them.
  --The Unicorn web server
    --Unicorn is told how many worker processes to use when the server starts.
    --If you specify 5 instances, the unicorn process spawns 5 child processes after launch to handle web requests. The parent (or master) process monitors whether each child is alive so that the children keep responding properly.
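
A heavily simplified sketch of the master/worker pattern described above. It is not how Unicorn is implemented; worker_loop and run_master are invented names, and the "work" is just a short sleep standing in for request handling.

```ruby
# Hypothetical worker body: a real server would accept and handle requests here.
def worker_loop(_id)
  sleep 0.1
  exit 0
end

# Forks the workers, then babysits them until every one has exited.
# Returns a hash of pid => exit code.
def run_master(worker_count)
  pids = worker_count.times.map { |id| fork { worker_loop(id) } }
  statuses = {}
  until pids.empty?
    pid, status = Process.wait2        # blocks until any child exits
    next unless pids.delete(pid)       # ignore children we did not start here
    statuses[pid] = status.exitstatus
  end
  statuses
end

p run_master(3)
```

A real master would typically fork a replacement whenever a worker exits unexpectedly, instead of just recording the exit code.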

Chapter 15 Zombie Processes

--Detaching child processes
  --If you are not going to use Process.wait to wait for a child process to finish, you have to detach it.

--The kernel keeps information about a terminated child process until the parent process requests it with Process.wait.
  --If the parent never requests the child's exit status, that information is never removed from the kernel.
  --Creating child processes in a "fire and forget" manner and leaving their exit statuses uncollected wastes kernel resources.

Example:

message = 'Good Morning'
recipient = '[email protected]'

pid = fork do
  # Spawn a child process to send the data to the stats collector
  # while the parent process continues with the actual message delivery.
  #
  # The parent should not be slowed down by this work, and it does not
  # matter if sending to the stats collector fails for some reason.
  StatsCollector.record message, recipient
end

# Make sure the child process that collects statistics does not become a zombie.
Process.detach(pid)

--A child process that dies without its parent waiting on it becomes a zombie process, without exception.
  --If the child process terminates while the parent is busy (not waiting on the child), it is guaranteed to become a zombie.
  --Once the parent collects the zombie's exit status, that information is cleaned up and no longer wastes kernel resources.
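
Process.detach does its job by starting a background thread that waits on the given pid. A small sketch showing that the thread's value is the child's exit status (detach_and_collect is a name made up for this example):

```ruby
# Detaches a child and returns the exit code gathered by the watcher thread.
def detach_and_collect
  pid = fork { exit 7 }
  watcher = Process.detach(pid)   # a Thread that waits on pid in the background
  watcher.value.exitstatus        # Thread#value blocks until the child is reaped
end

puts detach_and_collect   # => 7
```

In the fire-and-forget case the parent would simply ignore the returned thread and carry on.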

Chapter 16 Processes Can Receive Signals

--Process.wait is a blocking call
  --Process.wait lets the parent process manage its child processes, but the parent cannot continue processing until a child terminates.

--Example of trapping SIGCHLD

child_processes = 3
dead_processes = 0
# Spawn 3 child processes
child_processes.times do
  fork do
    # Each child sleeps for 3 seconds
    sleep 3
  end
end

# From here on the parent process is busy with a heavy calculation,
# but it still wants to detect when a child process terminates.

# So it traps the :CHLD signal. This way the kernel notifies
# the parent whenever a child process terminates.
trap(:CHLD) do
  # Process.wait returns the pid of the terminated child, so we can tell
  # which of the spawned child processes has exited.
  puts Process.wait
  dead_processes += 1
  # Explicitly exit the parent once all child processes have terminated.
  exit if dead_processes == child_processes
end

# Heavy calculation
loop do
  (Math.sqrt(rand(44)) ** 8).floor
  sleep 1
end

--SIGCHLD and concurrency
  --Signal delivery is unreliable.
  --If another child process terminates while a CHLD signal is being handled, there is no guarantee that the next CHLD signal will be caught.

--Handling CHLD properly
  --You need to call Process.wait in a loop until every pending notification of a child's death has been processed.

--Second argument of Process.wait
  --It addresses the situation where multiple CHLD signals may arrive while one is being handled.
  --The first argument takes a pid; the second argument takes a flag that tells the kernel not to block when no child process has terminated yet.

Process.wait(-1, Process::WNOHANG)
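
Putting the loop and the WNOHANG flag together, a CHLD handler could look like the sketch below. reap_all_children is an invented wrapper, and the sleep durations are arbitrary.

```ruby
# Forks `child_count` children and reaps all of them from a CHLD handler.
# Returns the pids that were collected.
def reap_all_children(child_count)
  reaped = []
  previous = trap(:CHLD) do
    begin
      # One CHLD signal may stand for several dead children, so keep
      # collecting until nothing more is queued (wait then returns nil).
      while (pid = Process.wait(-1, Process::WNOHANG))
        reaped << pid
      end
    rescue Errno::ECHILD
      # Raised when there are no child processes left at all.
    end
  end

  child_count.times { fork { sleep 0.2 } }
  sleep 0.05 until reaped.size == child_count
  reaped
ensure
  trap(:CHLD, previous || 'DEFAULT')   # restore the earlier handler
end

puts reap_all_children(3).size   # => 3
```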

--Signal primer
  --Unix signals are a form of asynchronous communication.
  --When a process receives a signal from the kernel, it does one of the following:

  1. Ignore the signal
  2. Perform specific processing
  3. Perform the default action
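
In Ruby the three choices map onto the arguments of trap. The sketch below sends SIGUSR1 to the current process to show the first two; deliver_usr1_to_self is a name invented for this example, and USR1 was picked only because its use is left to the application.

```ruby
# Sends SIGUSR1 to the current process twice and returns what the handler saw.
def deliver_usr1_to_self
  received = []

  # 2. Perform specific processing: run this block when the signal arrives.
  trap(:USR1) { received << :usr1 }
  Process.kill(:USR1, Process.pid)
  sleep 0.01 while received.empty?

  # 1. Ignore the signal: further deliveries are silently dropped.
  trap(:USR1, 'IGNORE')
  Process.kill(:USR1, Process.pid)
  sleep 0.1

  received
ensure
  # 3. Restore the default action (for USR1 that means terminating the process).
  trap(:USR1, 'DEFAULT')
end

p deliver_usr1_to_self   # => [:usr1]
```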

--Signals are delivered by the kernel.
  --Every signal has a sender.
  --Signals are sent from one process to another, with the kernel acting as the intermediary.

--Signals were originally used to specify different ways of killing a process.

--Signals are a great tool and work very well in certain situations.
  --But keep in mind that trapping signals is like using global variables.

--A process can receive a signal at any time.
  --Signal reception is asynchronous.
  --Whenever a process receives a signal, control jumps to the signal handler.
    --It does not matter whether the process is in a busy loop or a long sleep.
  --When the handler finishes, execution returns to the interrupted code and continues.

--If you know its pid, you can communicate with any process on the system by signal.
  --Signals are a very powerful means of communication between processes.
  --Sending signals with kill(1) from the terminal is a common sight.
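
The Ruby counterpart of kill(1) is Process.kill. A minimal sketch that sends SIGTERM to a sleeping child and inspects how it ended (terminate_child is a made-up name):

```ruby
# Starts a child that sleeps forever, terminates it with SIGTERM,
# and reports whether it was killed by a signal and by which one.
def terminate_child
  pid = fork { sleep }
  Process.kill(:TERM, pid)   # same effect as `kill -TERM <pid>` in a shell
  _, status = Process.wait2(pid)
  [status.signaled?, Signal.signame(status.termsig)]
end

p terminate_child   # => [true, "TERM"]
```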

--In the real world, signals are mostly used with long-running processes such as servers and daemons.
  --In that case, the sender of the signal is more likely to be a human than an automated program.

Chapter 17 Processes Can Communicate

Chapter 18 Daemon Process

Chapter 19 Terminal Process

Chapter 20 Conclusion
