This article provides an introduction to parallel processing. It also touches on Ractor, the new parallel execution unit being developed for Ruby.
First, I will summarize the terms that are often confused when discussing this topic.
In **parallel processing**, multiple tasks are running at the same moment in time. **Concurrent processing** runs multiple tasks in turn by time-slicing; unlike parallel processing, only one task is actually running at any given moment.
If the timing at which each task runs is plotted in chronological order, it looks like the image below. (A task is running only where the blue line is drawn.)
This article deals with the behavior of parallel processing, but be aware that even if you write code for parallel processing, it may end up behaving as concurrent processing. (For example, a single-core CPU cannot run two or more tasks in parallel.) The OS and the VM handle this scheduling for you.
In general, there are two main ways to achieve parallel processing: **multi-process** and **multi-thread**. Multi-process creates multiple processes and has each process execute one task at a time. Multi-thread creates multiple threads within a single process and has each thread execute one task at a time.
In the multi-process case, each process has its own separate memory space, so it is basically impossible to pass variables between processes. This is also safer, since it prevents unintended interactions between processes via shared memory. The disadvantage is that, because every process has its own memory space, total memory usage tends to grow. (On Linux, however, memory is shared between processes as much as possible through a mechanism called copy-on-write.)
In the multi-thread case, one process holds multiple threads, so memory is shared between the threads. Memory usage can therefore be kept down, and depending on the implementation, creating and switching threads is cheaper than creating and switching processes. However, because threads can affect each other through shared memory, bugs such as data races tend to occur. In general, multithreaded programming has many pitfalls and is hard to implement correctly.
The unit in which work is executed in parallel is called the **parallel execution unit**. For multi-process it is the process; for multi-thread it is the thread.
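To make the difference concrete, here is a minimal sketch using only Ruby's standard library (no gems); it assumes a Unix-like OS where `Process.fork` is available:

```ruby
# Multi-process: the child gets its own copy of the memory space.
counter = 0
pid = Process.fork do
  counter += 1          # modifies the child's copy only
end
Process.wait(pid)
puts counter            # => 0 (the parent's copy is unchanged)

# Multi-thread: threads share the process's memory space.
threads = 4.times.map do
  Thread.new { counter += 1 }  # all threads mutate the same variable
end
threads.each(&:join)
puts counter            # => 4 (the shared variable was updated)
```

The same variable is invisible to the child process but fully shared between threads, which is exactly the trade-off described above.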
There are two main ways to implement threads: **native threads** and **green threads**. Native threads realize multithreading by using the OS's thread implementation directly. Since thread scheduling (deciding which thread runs now) is left to the OS, the language implementation stays simple. On the other hand, thread creation and switching (so-called context switching) is relatively heavy. (Strictly speaking, "native thread" is a concept that combines kernel threads and lightweight processes, but I will omit the details; native threads and kernel threads are often conflated.)
Green threads are threads implemented inside the language's virtual machine (for example, CRuby's YARV or Java's JVM) rather than by the OS. Go's goroutines are a kind of green thread, famous for how lightweight they are. CRuby used green threads before 1.9 but has since switched to native threads. Green threads are also called user threads.
As an example, here is how to implement parallel processing in Ruby. With the parallel gem, parallel processing is easy to write.
The multi-process code looks like this:
multi_process.rb

```ruby
require 'parallel'

Parallel.each(1..10, in_processes: 10) do |i|
  sleep 10
  puts i
end
```
If you run this code and look at the process list, it looks like the following: one main process and 10 child processes.
```
$ ps aux | grep ruby
PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND PRI STIME UTIME
79050 9.7 0.1 4355568 14056 s005 S+ 2:39PM 0:00.28 ruby mp.rb
79072 0.0 0.0 4334968 1228 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79071 0.0 0.0 4334968 1220 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79070 0.0 0.0 4334968 1244 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79069 0.0 0.0 4334968 1244 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79068 0.0 0.0 4334968 1172 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79067 0.0 0.0 4334968 1180 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79066 0.0 0.0 4334968 1208 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79065 0.0 0.0 4334968 1252 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79064 0.0 0.0 4334968 1168 s005 S+ 2:39PM 0:00.00 ruby mp.rb
79063 0.0 0.0 4334968 1168 s005 S+ 2:39PM 0:00.00 ruby mp.rb
```
The multithreaded code looks like this:
multi_threads.rb

```ruby
require 'parallel'

Parallel.each(1..10, in_threads: 10) do |i|
  sleep 10
  puts i
end
```
Let's look at the thread list here as well. Adding `-L` to the `ps` command makes each thread appear like a process: without `-L` only one process is shown, but with `-L` 11 lines are displayed. In addition, the `NLWP` column shows the number of threads in the process; since it is 11 (1 main thread + 10 worker threads), we can confirm that multithreading is being used.
```
$ ps aux | grep mt.rb
4419 1.0 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
$ ps aux -L | grep mt.rb
PID LWP %CPU NLWP %MEM VSZ RSS TTY STAT START TIME COMMAND
4419 4419 6.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4453 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4454 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4455 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4456 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4457 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4458 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4460 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4461 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4462 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
4419 4463 0.0 11 0.6 850176 12384 pts/1 Sl+ 15:41 0:00 ruby mt.rb
```
In multithreaded processing, various problems can occur because multiple threads execute in parallel while sharing memory. One of the main problems is the **data race**.
Data races can occur with code like the one below. This code tries to compute the sum of the integers from 1 to 10, but because of a data race it may not compute the sum correctly.
```ruby
require 'parallel'

sum = 0
Parallel.each(1..10, in_threads: 10) do |i|
  add = sum + i
  sum = add
end
puts sum
```
In this code every thread shares the variable `sum`, and the threads read and write `sum` concurrently. As a result, a value written by one thread may be overwritten by another thread, so the code above may fail to compute the sum correctly.
A common way to solve data races is to take an exclusive lock between threads.
```ruby
require 'parallel'

sum = 0
m = Mutex.new
Parallel.each(1..10, in_threads: 10) do |i|
  m.lock
  add = sum + i
  sum = add
  m.unlock
end
puts sum
```
As a result, only the thread holding the lock can run the critical section at any one time, which eliminates the data race.
Code that properly handles these issues and works correctly under multithreading is called **thread-safe**.
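Incidentally, in practice `Mutex#synchronize` is usually preferred over explicit `lock`/`unlock`, because it releases the lock even if the block raises an exception. A sketch of the same summation using only the standard library (plain `Thread` instead of the parallel gem):

```ruby
sum = 0
m = Mutex.new

threads = (1..10).map do |i|
  Thread.new do
    # synchronize acquires the lock, runs the block, and always
    # releases the lock afterwards, even on exceptions.
    m.synchronize { sum += i }
  end
end
threads.each(&:join)

puts sum  # => 55
```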
The **GIL** (Global Interpreter Lock) often comes up when discussing multithreading in scripting languages such as Ruby and Python. In Ruby, by the way, it is called the GVL (Giant VM Lock).
The GIL performs exclusive control between threads so that multiple threads never execute at the same time: only one thread can run at any moment within a single interpreter/VM. The reasons for and benefits of this design include the following.
Thanks to the GIL, the multithreaded Ruby code shown earlier tends to work even without a Mutex (strictly speaking, the GVL does not make every compound operation atomic, but it greatly reduces the chance of races). This behavior fits Ruby's basic philosophy of making programming easier.
However, the fact that only one thread can run at a time means that true parallel processing is impossible. This is why Ruby and Python are often said to be unsuited to parallel computation.
As an exception, a thread releases the GIL while waiting for I/O, so multiple threads can make progress at the same time. For this reason, multithreading is practically useful even under a GIL for I/O-heavy workloads such as web servers.
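This effect is easy to observe: `sleep`, like other blocking I/O, releases the GVL, so several threads can wait simultaneously. A small sketch (the 0.5-second duration is just an illustrative choice):

```ruby
require 'benchmark'

# Four threads each "wait on I/O" (simulated with sleep) for 0.5s.
# Because the GVL is released while sleeping, the waits overlap and
# the total wall-clock time is roughly 0.5s rather than 2s.
elapsed = Benchmark.realtime do
  4.times.map { Thread.new { sleep 0.5 } }.each(&:join)
end
puts format('4 x 0.5s sleeps took %.2fs', elapsed)
```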
Since an HTTP server usually needs to handle many requests at the same time, it is often implemented with parallel processing. Typical HTTP servers in Ruby are **unicorn** and **puma**; the former is a multi-process implementation and the latter is multi-threaded.
Performance of unicorn and puma is compared in this blog.
The conclusions of that blog are convincing given the mechanisms described above.
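For reference, puma can even combine both models in its "clustered mode". A sketch of a typical `config/puma.rb` (the worker and thread counts are arbitrary example values):

```ruby
# config/puma.rb
workers 2          # 2 worker processes (multi-process)
threads 1, 5       # 1 to 5 threads per worker (multi-thread)
preload_app!       # load the app before forking to benefit from copy-on-write
```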
So far we have covered how parallel processing is realized, shown implementations in Ruby, and looked at their performance. Multithreading in Ruby cannot reach its full performance because of the GVL. Ractor (formerly called Guild) is a new parallel processing mechanism for Ruby, created to solve this problem.
Ractor achieves true parallel performance while keeping the ease of thread programming that the GVL traditionally provided.
Let me explain the mechanism.
Data races occur because threads share memory, which lets multiple threads read and write the same variable. Ractor solves this by introducing a new parallel execution unit, the Ractor. A Ruby process has one or more Ractors, and each Ractor has one or more threads. Since each Ractor operates on a separate memory space, the memory-sharing problems of conventional threads do not arise.
Source: https://www.slideshare.net/KoichiSasada/guild-prototype
In addition, Ruby code written before the introduction of Ractor remains backward compatible, because it simply runs inside a single Ractor.
Since Ractors do not share memory, passing information between them might seem cumbersome. To solve this, there is a feature called a `channel` that enables communication between Ractors. Objects you want to share can only be passed via the `channel`.
Objects are classified into **shareable objects** and **non-shareable objects**.
A shareable object is one, such as a frozen (read-only) constant, that cannot cause data races even when shared between Ractors. Shareable objects can be passed freely through the channel.
Non-shareable objects are ordinary mutable objects. Passing such an object through the channel triggers either a deep copy or move semantics. With a deep copy, copying cost and memory usage increase, but it is as safe and easy to reason about as multi-process. With move semantics, ownership of the object is transferred to the other Ractor: the original Ractor can no longer reference the object, but unlike a deep copy, processing cost and memory usage stay low.
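As released in Ruby 3.x, the communication API is `Ractor#send` / `Ractor.receive` / `Ractor#take` rather than an explicit channel object (the "channel" naming comes from the earlier Guild prototype). A minimal sketch of copy vs. move, noting that Ractor is experimental and prints a warning on first use:

```ruby
# Deep copy (the default for non-shareable objects):
r = Ractor.new do
  numbers = Ractor.receive  # the sent array arrives as a deep copy
  numbers.sum               # the block's result is retrieved with #take
end
data = [1, 2, 3]
r.send(data)                # `data` remains usable in this Ractor
sum = r.take
puts sum                    # => 6

# Move semantics (ownership is transferred, nothing is copied):
r2 = Ractor.new { Ractor.receive.upcase }
s = "hello"
r2.send(s, move: true)      # `s` can no longer be used in this Ractor
upcased = r2.take
puts upcased                # => "HELLO"
```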
In summary: shareable (immutable) objects can be referenced freely from any Ractor, while non-shareable (mutable) objects must be deep-copied or moved when passed between Ractors. In this way, Ractor enables easy multithreaded-style programming while maintaining thread safety.
Ractor is a parallel execution unit positioned between processes and threads. By having the developer choose exactly which information to share between Ractors, parallel processing can be achieved without the RAM overhead of multi-process and without the GIL-induced performance loss of multi-threading.
Ractor is attracting a lot of attention as a new feature of Ruby 3. Ractor itself is still under development, and it will be a while before it is within reach of ordinary Ruby users. In the future, Ruby's multithreaded libraries may well be reimplemented on top of Ractor, and the day when Ractor-based HTTP servers replace puma as the mainstream may not be far off.
I wrote this article as a study summary. I would appreciate it if you could point out any mistakes!