A note to self about something that is written in the documentation but that I either never read or forgot.
It is recommended to use the subprocess module when running child processes in Python.
If you collect the child process's standard output or standard error with p.stdout.read() or p.stderr.read(), the program can get **stuck** when the child writes a large amount of data. It is better to use p.communicate() instead.
The documentation says as much, partway down the following page: http://docs.python.jp/2.7/library/subprocess.html
Warning: Using stdin.write(), stdout.read(), or stderr.read() can deadlock when the OS pipe buffer of another pipe fills up. Use communicate() to avoid this.
It is written right there, but I used read() carelessly and got bitten by it, so I am leaving this memo.
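The warning also covers stdin.write(), which the verification code in this memo does not exercise. As a rough illustration only, here is a Python 3 sketch (the memo's own code is Python 2) that feeds a large input to a child and collects its output through communicate(input=...). The echoing child command and the name echo_through_child are made up for this example.

```python
import subprocess
import sys

# Hypothetical child: echoes everything it reads on stdin back to stdout.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]


def echo_through_child(data):
    """Send `data` (bytes) to the child and return (returncode, stdout, stderr)."""
    p = subprocess.Popen(
        CHILD,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() feeds stdin and drains stdout/stderr together,
    # then waits for the child to exit, so no single pipe can stall the rest.
    stdout_data, stderr_data = p.communicate(input=data)
    return p.returncode, stdout_data, stderr_data


if __name__ == "__main__":
    payload = b"1" * 1000000  # well beyond a typical pipe buffer
    code, out, err = echo_through_child(payload)
    print(code, len(out), len(err))
```

Writing the same payload with p.stdin.write() and only then reading p.stdout is the pattern the warning is about, since each side can end up waiting on a full pipe.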
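Below is some simple verification code.
With read(), everything was fine when the child process produced 10 KB of output, but it hung when the child produced 100 KB. In bad_impl the parent blocks in p.wait() while the child blocks writing to a full pipe (the pipe buffer is commonly 64 KB on Linux), so neither side can make progress.
With communicate(), both cases worked.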
spike.py
import subprocess


def bad_impl(cmd):
    print "start bad_impl %s" % cmd
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print "waiting"
    # wait() before read(): hangs once the child fills the pipe buffer
    p.wait()
    stdout_data = p.stdout.read()
    stderr_data = p.stderr.read()
    print "finish: %d %d" % (len(stdout_data), len(stderr_data))
    return p.returncode, stdout_data, stderr_data


def better_impl(cmd):
    print "start better_impl %s" % cmd
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print "waiting"
    # communicate() drains both pipes and then waits for the child
    stdout_data, stderr_data = p.communicate()
    print "finish: %d %d" % (len(stdout_data), len(stderr_data))
    return p.returncode, stdout_data, stderr_data


command1 = "python -c 'print str(1) * 10000'"   # 10KB output
command2 = "python -c 'print str(1) * 100000'"  # 100KB output

better_impl(command1)
print "=" * 50
bad_impl(command1)
print "=" * 50
better_impl(command2)
print "=" * 50
bad_impl(command2)  # This hangs
% python spike.py
start better_impl python -c 'print str(1) * 10000'
waiting
finish: 10001 0
==================================================
start bad_impl python -c 'print str(1) * 10000'
waiting
finish: 10001 0
==================================================
start better_impl python -c 'print str(1) * 100000'
waiting
finish: 100001 0
==================================================
start bad_impl python -c 'print str(1) * 100000'
waiting
↑ Control never returns here
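To reproduce the hang without having to kill the script by hand, one option is to put a timeout on the wait. The following is a Python 3 sketch (the listing above is Python 2). The function name wait_then_read_hangs and the 3-second timeout are arbitrary choices for this example. The output size at which the stall starts depends on the OS pipe buffer size.

```python
import subprocess
import sys


def wait_then_read_hangs(n_chars, timeout=3.0):
    """Return True if wait()-before-read() stalls for a child printing n_chars."""
    cmd = [sys.executable, "-c", "print('1' * %d)" % n_chars]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # Same order as bad_impl: wait first, read afterwards.
        p.wait(timeout=timeout)
        hung = False
    except subprocess.TimeoutExpired:
        # The child is blocked writing to a full pipe that nobody is reading.
        hung = True
        p.kill()
    # Drain and close the pipes and reap the child in either case.
    p.communicate()
    return hung


if __name__ == "__main__":
    for size in (10000, 100000):
        print(size, wait_then_read_hangs(size))
```

The timeout only detects the problem. The actual fix is still to let communicate() do the reading.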
The section on communicate() also contains the following note.
Note: The data read is buffered in memory, so you should not use this method if the data size is large or unlimited.
So if the output exceeds the available memory, communicate() is no good either. In that case you have to read the output a piece at a time and save it to a file. If I ever have to handle several GB of data like that, I will think it over again then ... (^^;
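For that large-output case, reading a piece at a time and saving to a file might look roughly like the Python 3 sketch below. The name run_to_file, the 64 KB chunk size, and the choice to discard stderr are assumptions made for this example, not something taken from the documentation.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(cmd, out_path, chunk_size=64 * 1024):
    """Run cmd, copying its stdout to out_path chunk by chunk.

    Returns (returncode, bytes_written).
    """
    total = 0
    # stderr is discarded so that an unread stderr pipe cannot fill up.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    with open(out_path, "wb") as out:
        while True:
            chunk = p.stdout.read(chunk_size)
            if not chunk:  # EOF: the child closed its stdout
                break
            out.write(chunk)
            total += len(chunk)
    p.stdout.close()
    # All output has been read, so waiting here cannot block on a full pipe.
    return p.wait(), total


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('1' * 100000)"]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stdout.bin")
        code, size = run_to_file(cmd, path)
        print(code, size, os.path.getsize(path))
```

Only one chunk is held in memory at a time. If the output needs no processing on the way through, passing the open file object directly as stdout= to Popen is simpler, because the child then writes to the file itself.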