This post shows how to feed standard output directly into numpy.loadtxt, for the case where you want to analyze data preprocessed by awk with NumPy in Python.
For example, suppose you have name, height, and weight data.
input.dat
Yamada 160 50
Tanaka 170 60
Sakana 180 70
Suppose you want to extract only the numeric columns and look at the correlation between height and weight. You could do all of this in Python, but that gets awkward when the file is large, so extract the columns with awk and read the numeric data with numpy.loadtxt. In other words, it looks like this.
$ cat input.dat | awk '{print $2, $3}' > tmp.dat
$ python analysis.py tmp.dat
analysis.py
import sys
import numpy as np
data = np.loadtxt(sys.argv[1])
# From here on, analyze the data however you like
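As one possible version of the analysis step left open in the comment, here is a minimal sketch that computes the correlation coefficient between height and weight. The inline sample text is an assumption that stands in for tmp.dat (the two numeric columns of input.dat), so the snippet runs without any file.

```python
import io
import numpy as np

# Stand-in for tmp.dat: the height and weight columns of input.dat
sample = io.StringIO("160 50\n170 60\n180 70\n")
data = np.loadtxt(sample)

# Column 0 is height, column 1 is weight
height, weight = data[:, 0], data[:, 1]

# Pearson correlation coefficient (off-diagonal entry of the 2x2 matrix)
r = np.corrcoef(height, weight)[0, 1]
print(r)  # close to 1, because the sample data is perfectly linear
```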
However, creating an intermediate file is a nuisance. I would rather run it like this.
$ python analysis.py input.dat
To do this, use subprocess to run the shell commands from within Python. Set the stdout of the last command to subprocess.PIPE and pass that pipe to numpy.loadtxt.
analysis.py
import sys
import subprocess
import numpy as np
p1 = subprocess.Popen(["cat", sys.argv[1]], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["awk", "{print $2, $3}"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let cat receive SIGPIPE if awk exits first
data = np.loadtxt(p2.stdout)
# From here on, analyze the data however you like
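The same idea can probably be written more compactly. The sketch below is one way to do it, not the only one: it assumes Python 3.7 or later (for text=True), and loadtxt_from_command is a hypothetical helper name introduced here for illustration. Since awk can read the file itself, the cat process is not needed, and check=True raises an exception if the command fails instead of silently handing empty output to numpy.loadtxt.

```python
import io
import subprocess
import numpy as np

def loadtxt_from_command(cmd):
    """Run cmd (a list of arguments) and parse its stdout with numpy.loadtxt.

    Hypothetical helper for illustration; the whole output is held in memory.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True, text=True)
    return np.loadtxt(io.StringIO(result.stdout))

# Example usage (assumes awk is installed and input.dat exists):
# data = loadtxt_from_command(["awk", "{print $2, $3}", "input.dat"])
```

Because the output is collected into a string first, this variant suits moderately sized data. For very large output, streaming from the pipe as in the script above avoids holding the text in memory.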
In the script above, the shell command is written inside Python. However, the awk script tends to be different every time, so it is often more convenient to keep it on the command line.
$ cat input.dat | awk '{print $2, $3}' | python analysis.py
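If you want to connect the commands with a pipe like this, use fileinput. When no file names are given on the command line, fileinput.input() reads from standard input.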
analysis.py
import numpy as np
import fileinput
data = np.loadtxt(fileinput.input())
# From here on, analyze the data however you like
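If the script only ever reads from a pipe, passing sys.stdin to numpy.loadtxt directly is another option. The following is a minimal sketch under that assumption; load_stream is a hypothetical helper name, and ndmin=2 is added so that the result stays two-dimensional even when the input has only one row.

```python
import sys
import numpy as np

def load_stream(stream=None):
    """Parse whitespace-separated numbers from a text stream.

    Hypothetical helper for illustration; reads sys.stdin when no stream is given.
    """
    if stream is None:
        stream = sys.stdin
    # ndmin=2 keeps a (rows, columns) shape even for a single input line
    return np.loadtxt(stream, ndmin=2)

# Example usage in analysis.py, fed by a pipe:
# data = load_stream()
```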