Use communicate() when capturing output from a Python subprocess

A note to self about something I either never read or forgot, even though it is written in the documentation.

The subprocess module is the recommended way to run child processes in Python. If you capture the child's standard output or standard error with p.stdout.read() or p.stderr.read(), the program can get **stuck** when the child produces a large amount of output. Use p.communicate() instead.

This is covered partway down the following document: http://docs.python.jp/2.7/library/subprocess.html

Warning: Using stdin.write(), stdout.read(), or stderr.read() can deadlock when one of the other OS pipe buffers fills up and blocks the child process. Use communicate() to avoid this.

It is written right there, but I used read() carelessly and got bitten by this problem, so I am leaving a memo.
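As a rough sketch of what following that advice can look like on Python 3 (the listing in this memo is Python 2), the function below captures both streams with communicate() and adds a timeout so a stuck child does not block forever. The name run_with_timeout, the 15-second default, and the child command are my own assumptions for illustration; the timeout argument needs Python 3.3 or later.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_sec=15):
    """Run args, capture both streams via communicate(), kill the child on timeout.

    run_with_timeout and timeout_sec are hypothetical names for this sketch.
    """
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() drains stdout and stderr together, so neither pipe fills up.
        stdout_data, stderr_data = p.communicate(timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        p.kill()
        # A second communicate() collects whatever was written before the kill.
        stdout_data, stderr_data = p.communicate()
    return p.returncode, stdout_data, stderr_data


if __name__ == "__main__":
    # Assumed child command: writes 100,000 bytes to stdout.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('1' * 100000)"]
    code, out, err = run_with_timeout(child)
    print("finish: %d %d %d" % (code, len(out), len(err)))
```

Passing the command as a list instead of using shell=True is a choice made for this sketch only; it avoids shell quoting differences between platforms.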

Below is some simple verification code.

With read(), everything was fine when the child process produced 10 KB of output, but the script hung when the child produced 100 KB. With communicate(), both cases worked.

spike.py


import subprocess


def bad_impl(cmd):
    print "start bad_impl %s" % cmd
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print "waiting"
    p.wait()                       # hangs here once the child has filled the pipe buffer
    stdout_data = p.stdout.read()  # never reached in that case
    stderr_data = p.stderr.read()
    print "finish: %d %d" % (len(stdout_data), len(stderr_data))
    return p.returncode, stdout_data, stderr_data


def better_impl(cmd):
    print "start better_impl %s" % cmd
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print "waiting"
    stdout_data, stderr_data = p.communicate()
    print "finish: %d %d" % (len(stdout_data), len(stderr_data))
    return p.returncode, stdout_data, stderr_data

command1 = "python -c 'print str(1) * 10000'"           # 10 KB of output
command2 = "python -c 'print str(1) * 100000'"          # 100 KB of output

better_impl(command1)
print "=" * 50
bad_impl(command1)
print "=" * 50
better_impl(command2)
print "=" * 50
bad_impl(command2)            # This one hangs and never returns

% python spike.py
start better_impl python -c 'print str(1) * 10000'
waiting
finish: 10001 0
==================================================
start bad_impl python -c 'print str(1) * 10000'
waiting
finish: 10001 0
==================================================
start better_impl python -c 'print str(1) * 100000'
waiting
finish: 100001 0
==================================================
start bad_impl python -c 'print str(1) * 100000'
waiting
↑ Control never returns from here
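The difference between the 10 KB and 100 KB runs is consistent with the OS pipe buffer having a fixed capacity (commonly 64 KiB on Linux, though this varies by OS, so treat that figure as an assumption). Once the buffer is full, the child blocks on its write, and a parent that is sitting in wait() never reads, so neither side makes progress. The sketch below, written for Python 3, probes this by waiting with a timeout before reading anything. The function name finishes_without_reading and the byte counts are made up for illustration.

```python
import subprocess
import sys


def finishes_without_reading(n_bytes, timeout_sec=5.0):
    """Return True if a child that writes n_bytes to a pipe exits
    while the parent is not reading that pipe (hypothetical helper)."""
    child_code = "import sys; sys.stdout.write('1' * %d)" % n_bytes
    p = subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE)
    try:
        p.wait(timeout=timeout_sec)   # same order as bad_impl: wait first, read later
        return True
    except subprocess.TimeoutExpired:
        return False                  # child is still blocked writing to a full pipe
    finally:
        p.stdout.read()               # drain the pipe so the child can finish
        p.wait()
        p.stdout.close()


if __name__ == "__main__":
    for size in (100, 1000000):
        print("%d bytes: finished=%s" % (size, finishes_without_reading(size)))
```

The small write should fit in the buffer and let the child exit, while the large one should time out on typical systems. The exact threshold depends on the platform, so no expected output is shown here.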

In addition, the section on communicate() carries the following note:

Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

If the output exceeds the available memory, communicate() will not work either. In that case you have to read the output incrementally and write it to a file. If I ever need to handle several GB of data like that, I will think about it again then ... (^^;
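One way to keep large output out of memory is sketched below for Python 3. It is not the chunk-by-chunk reading described above but a simpler substitute: the child's stdout and stderr are pointed directly at files, so no pipe is involved and wait() has nothing to deadlock on. The function name run_to_files and the file names are assumptions made for this example.

```python
import os
import subprocess
import sys
import tempfile


def run_to_files(args, stdout_path, stderr_path):
    """Run args with stdout/stderr written straight to files; return the exit code.

    run_to_files is a hypothetical helper, not part of the subprocess module.
    """
    with open(stdout_path, "wb") as out, open(stderr_path, "wb") as err:
        # The child writes to the files itself, so the parent never
        # holds the output in memory and no pipe buffer can fill up.
        p = subprocess.Popen(args, stdout=out, stderr=err)
        return p.wait()


if __name__ == "__main__":
    # Assumed child command: writes 1,000,000 bytes to stdout.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('1' * 1000000)"]
    tmp_dir = tempfile.mkdtemp()
    out_path = os.path.join(tmp_dir, "stdout.log")
    err_path = os.path.join(tmp_dir, "stderr.log")
    code = run_to_files(child, out_path, err_path)
    print("exit %d, stdout bytes: %d" % (code, os.path.getsize(out_path)))
```

The trade-off is that the output is only available on disk afterwards. If the parent needs to process the data while the child is still running, reading the pipe in fixed-size chunks would be the alternative to consider.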
