I learned how subprocess.Popen behaves, so I am writing it down before I forget.
When dealing with huge files in Python, iterators are often used to process the data sequentially so that the whole file is never held in memory at once.
However, the sorted() function cannot be used for large files. This is because it converts the iterator to a list before sorting, so the entire contents end up in memory.
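As a small illustration of that point, the sketch below passes a generator to sorted() and checks what comes back. The numbers() generator is a made-up stand-in for a real file iterator.

```python
def numbers():
    # Hypothetical generator standing in for a lazy file iterator
    for n in [3, 1, 2]:
        yield n

# sorted() consumes the whole generator and builds a list in memory
result = sorted(numbers())
print(type(result).__name__)  # list
print(result)                 # [1, 2, 3]
```

With three items this is harmless, but with a file of many gigabytes the same list would have to fit in memory.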
If you want to sort a huge file, let the Unix sort command do the work and read its output from Python.
import subprocess

def sorted_file_generator(filename):
    # Run the Unix sort command and read its output through a pipe
    proc = subprocess.Popen(['sort', filename], stdout=subprocess.PIPE)
    while True:  # Receive one line at a time
        line = proc.stdout.readline()
        if line:
            # readline() returns bytes, so decode it to str
            yield line.decode('utf-8').strip()
        else:
            # An empty bytes object means sort has finished writing
            break
This example is for Python 3. In Python 2, bytes are handled differently, so the decode step would need to be adjusted.
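For reference, here is a rough sketch of the same idea written a little more compactly, together with a usage example. It assumes a Unix sort command is on the PATH and Python 3.6 or later (for the encoding argument of Popen). The name sorted_lines, the temporary file, and the LC_ALL=C setting are my own additions for illustration: sort orders lines according to the current locale, so fixing the locale keeps the order predictable.

```python
import os
import subprocess
import tempfile

def sorted_lines(filename):
    # Hypothetical variant of the generator above; assumes `sort` is on PATH.
    # LC_ALL=C is an assumption made here so lines are compared in byte order.
    env = dict(os.environ, LC_ALL='C')
    # encoding='utf-8' makes stdout yield str instead of bytes.
    # The with block closes the pipe and waits for sort to exit.
    with subprocess.Popen(['sort', filename], stdout=subprocess.PIPE,
                          env=env, encoding='utf-8') as proc:
        for line in proc.stdout:
            # Remove only the trailing newline, keeping other whitespace
            yield line.rstrip('\n')

# Usage: write a small unsorted file, then read it back in sorted order
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write('banana\napple\ncherry\n')
    path = f.name

result = list(sorted_lines(path))
os.remove(path)
print(result)  # ['apple', 'banana', 'cherry']
```

Calling list() here is only for the demonstration. With a really large file you would loop over sorted_lines(path) directly so that only one line is in memory at a time.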