I saw this tweet from Robota-san about MPI asking how the processes work together, mainly with respect to how their standard output gets aggregated. I was curious about the buffer control involved, so I did some rough digging.
I have not written up the environments in detail this time, so **please verify things yourself rather than taking them at face value**. Also note that the terminology I use is fairly loose and my own.
I tested on Linux, with three MPI implementations: Intel MPI, SGI MPT, and OpenMPI. Fortran is not covered at all; this is about C/C++ only.
First, about **buffering**.
When writing to standard output (stdout in C, std::cout in C++), calling printf or std::ostream::operator<< does not necessarily make the output appear immediately. Issuing the OS APIs that actually perform the output (write, send) in many small pieces is generally bad for performance, so the standard library accumulates data in a buffer to some extent and writes it out in larger chunks. This behavior is called **buffering**.
There are three buffering modes: unbuffered, line buffered, and fully buffered.
For C stdio, buffering is controlled with setbuf / setvbuf. The default depends on the file or device that standard output is connected to: line buffered for a TTY/PTY, fully buffered otherwise. An explicit flush is done with the fflush function.
For C++ iostreams, control comes down to whether the std::ios_base::unitbuf flag is set on std::cout (via std::cout.setf or unsetf). With the flag set the stream is unbuffered; with it unset, fully buffered; there is no line buffered mode. The default is fully buffered. An explicit flush is done with the I/O manipulators std::flush or std::endl.
So the control differs between C and C++, but in mixed C/C++ code a scrambled output order is usually a problem, so the two are synchronized by default. In other words, even if the C++ side is fully buffered, if the C side is not, it gets pulled along with the C side. This behavior can be overridden with std::ios_base::sync_with_stdio(false).
Roughly speaking, MPI is a library plus a set of tools that, given the nodes available and the number of processes to run, launches the same (or even different) programs across multiple nodes and lets them perform cooperative computation.
The launched programs cooperate over various high-speed interconnects such as InfiniBand, but here I only care about the flow of standard output. For that, it is enough to consider the three actors shown in the figure below.
**These are my own terms**, but I will classify the three actors as the front, the manager, and the worker. In the figure above the front and the other actors are drawn as if running on different nodes, but they may be the same node.
**Intel MPI**

First is Intel MPI. It looks like the following figure.
The roles and how they correspond are as follows.
After startup, the manager connects to the front over TCP/IP, aggregates the output piped from the workers, and forwards it to the front.
**SGI MPT**

Next is SGI MPT.
The roles and how they correspond are as follows.
The division of roles is similar to Intel MPI, but launching the manager from the front goes through arrayd, a daemon that ships with SGI MPT (even when the target is the local node).
**OpenMPI**

Finally, OpenMPI.
The roles and how they correspond are as follows.
The big differences from the two MPIs above are the handling of the local node and the fact that the channel between manager and workers is a PTY.
Now let's look at what actually happens when an MPI program produces output and it is finally aggregated and printed at the front.
As organized above, three kinds of programs cooperate when MPI runs: the front, the manager, and the worker. And **worker output is aggregated at the front via the manager**. So buffering needs to be sorted out for each leg of the route: worker to manager, manager to front, and front to the final output destination. Three places in all.
**Intel MPI**

With Intel MPI, the output of the workers gets interleaved even mid-line, so buffering looks disabled.
This is because **the MPI library internally calls setbuf / setvbuf during MPI_Init to put the worker into the unbuffered state**. That is, buffering is disabled on the worker-to-manager leg, and on the manager-to-front and front-to-final-destination legs the data just passes through without any particular control, so the whole pipeline appears unbuffered.
So after MPI_Init you can re-enable buffering by calling setbuf / setvbuf yourself. Also, neither MPI_Init nor MPI::Init seems to touch std::cout's flags, so in a pure C++ application you can get buffering back just by disabling C/C++ synchronization.
Alternatively, with Intel MPI you can pass -ordered-output to mpiexec; then setbuf / setvbuf should not be needed. However, the manual has a note about remembering the final line break in the output, and standard error is affected as well. The following is an excerpt from section 2.3.1 of the Intel MPI 2019 Developer Reference:

> -ordered-output
> Use this option to avoid intermingling of data output from the MPI processes. This option affects both the standard output and the standard error streams.
> NOTE: When using this option, end the last output line of each process with the end-of-line '\n' character. Otherwise the application may stop responding.
**SGI MPT**

With SGI MPT, the output comes out organized line by line, which is equivalent to line buffered behavior.
The mechanism behind this is a bit complicated.
In other words, the buffering happens through the front's own efforts. Conversely, the intent may be that the MPI application (the worker) should not be allowed to control the buffer on its own.
**OpenMPI**

Like SGI MPT, OpenMPI behaves as if line buffered.
The mechanism here is very simple: the channel between worker and manager is a PTY, and stdout connected to a PTY is line buffered by default. For the rest, manager to front and front to final destination, nothing special seems to be done. In other words, OpenMPI itself does not do any buffering control at all; it is left to the standard library.
So we have seen how the control differs across the MPIs. Despite the differences, if you want to make sure buffering actually happens, I think the safest approach is to call setbuf / setvbuf right after MPI_Init.
If line-by-line output is enough, SGI MPT and OpenMPI already behave that way, and Intel MPI has -ordered-output, so for the three MPIs discussed this time there seems to be no need to touch the program side. Below, for reference, are the sources and an operation log from trying out the behavior with Intel MPI.
Operation log
$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2018 Update 1 Build 20171011 (id: 17941)
Copyright (C) 2003-2017, Intel Corporation. All rights reserved.
$ icpc --version
icpc (ICC) 18.0.1 20171018
Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
$ mpiicpc -std=gnu++11 -o test test.cpp
$ mpirun -np 2 ./test
abababababababababbababababababababababa
abababababababababababababababababababab
a
bbabaababababababababababababababababab
a
bababbaababababbaababababababababababab
babaabababababababababababababababababab
$ mpirun -np 2 ./test --nosync
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
$ mpirun -np 2 ./test --setvbuf
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
$ mpirun -np 2 ./test --nosync --unitbuf
abababbaabababababababababababababababab
a
bababababababababababababababababababab
babababababababababababababababababababa
ababababbabababababababababababababababa
abababababababababababababababababababab
$ mpiicpc -std=gnu++11 -o test2 test2.cpp
$ mpirun -np 2 ./test2
abababababbaababababababbaababababababab
babaabababbaabababababababababababababab
babababababababababababababababababababa
ababababbaabababababbabaabababababababab
a
bababababababababababababababababababab
$ mpirun -np 2 ./test2 -f
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbb
$ mpirun -np 2 ./test2 -l
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
aaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbb
$
test.cpp
#include <mpi.h>
#include <iostream>
#include <thread>
#include <chrono>
#include <string>
#include <cstdio>

static char stdoutbuf[8192];

int main(int argc, char **argv) {
    MPI::Init(argc, argv);
    MPI::COMM_WORLD.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);
    int rank = MPI::COMM_WORLD.Get_rank();
    for ( int i = 1; i < argc; i++ ) {
        std::string opt(argv[i]);
        if ( opt == "--nosync" ) {
            // detach C++ iostreams from C stdio
            std::ios_base::sync_with_stdio(false);
        }
        else if ( opt == "--setvbuf" ) {
            // re-enable full buffering for C stdio
            std::setvbuf(stdout, stdoutbuf, _IOFBF, sizeof(stdoutbuf));
        }
        else if ( opt == "--unitbuf" ) {
            // disable buffering on C++ iostreams
            std::cout.setf(std::ios_base::unitbuf);
        }
        else if ( rank == 0 ) {
            std::cerr << "invalid option: " << opt << std::endl;
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }
    char c = 'a' + rank;
    for ( int i = 0; i < 5; i++ ) {
        MPI::COMM_WORLD.Barrier();
        for ( int j = 0; j < 20; j++ ) {
            std::cout << c;
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
        std::cout << std::endl;
    }
    MPI::Finalize();
}
test2.cpp
#include <mpi.h>
#include <iostream>
#include <thread>
#include <chrono>
#include <string>
#include <cstdio>

static char stdoutbuf[8192];

int main(int argc, char **argv) {
    MPI::Init(argc, argv);
    MPI::COMM_WORLD.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);
    int rank = MPI::COMM_WORLD.Get_rank();
    if ( argc > 1 ) {
        std::string opt(argv[1]);
        if ( opt == "-f" ) {
            // fully buffered
            std::setvbuf(stdout, stdoutbuf, _IOFBF, sizeof(stdoutbuf));
        }
        else if ( opt == "-l" ) {
            // line buffered
            std::setvbuf(stdout, stdoutbuf, _IOLBF, sizeof(stdoutbuf));
        }
    }
    char c = 'a' + rank;
    for ( int i = 0; i < 5; i++ ) {
        MPI::COMM_WORLD.Barrier();
        for ( int j = 0; j < 20; j++ ) {
            std::cout << c;
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
        std::cout << '\n';
    }
    std::cout << std::flush;
    MPI::Finalize();
}