Isono ~! I have a csv file that contains millions of records, so let's split it up and process it in parallel!
――I hadn't seen the splitting and the parallel processing done together anywhere, so I wrote this up as a memo. Please tell me if there's a good article on it.
- Python 3.7
- pandas: not used
- multiprocessing.Pool: used
- concurrent.futures: not used
File to read
sample.csv
1,Ah
2,I
3,U
4,e
5,O
I borrowed gen_chunks() from someone who had already written one: https://stackoverflow.com/a/4957046
Most of the work was already done for me. Thank you internet, thank you ancestors.
Here is the completed code.
pool.py
import csv
import time
from multiprocessing import Pool


def read():
    f = open("sample.csv", "r")
    reader = csv.reader(f)
    pool = Pool()
    results = []
    for data_list in gen_chunks(reader):
        results.append(pool.apply_async(do_something, [data_list]))
    pool.close()
    pool.join()
    _ = [r.get() for r in results]
    f.close()


def do_something(data_list):
    print(f"start {data_list}")
    time.sleep(len(data_list))
    # actual processing would go here
    print(f"finish {data_list}")


def gen_chunks(reader, chunksize=2):
    """
    Chunk generator. Take a CSV `reader` and yield
    `chunksize` sized slices.
    """
    chunk = []
    for i, line in enumerate(reader):
        if i % chunksize == 0 and i > 0:
            yield chunk
            chunk = []
        chunk.append(line)
    yield chunk


if __name__ == "__main__":
    # guard so worker processes don't re-run this module when it is imported
    read()
result
start [['1', 'Ah'], ['2', 'I']]
start [['3', 'U'], ['4', 'e']]
start [['5', 'O']]
finish [['5', 'O']]
finish [['1', 'Ah'], ['2', 'I']]
finish [['3', 'U'], ['4', 'e']]
Since chunksize=2, two records at a time are passed to do_something(). For a big CSV, tune this value to whatever feels right, as sketched below.
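For example, a variant of read() where the chunk size is exposed as a parameter (just a sketch reusing gen_chunks() and do_something() from above; the name read_tuned and the 10000 default are my own placeholders, not a recommendation):

def read_tuned(chunksize=10000):
    # Same flow as read() above, but the chunk size can be tuned
    # to match the size of the CSV and the cost of do_something().
    with open("sample.csv", "r") as f:
        reader = csv.reader(f)
        pool = Pool()
        results = []
        for data_list in gen_chunks(reader, chunksize=chunksize):
            results.append(pool.apply_async(do_something, [data_list]))
        pool.close()
        pool.join()
        _ = [r.get() for r in results]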
The _ = [r.get() for r in results] line is there so that errors raised inside the workers actually surface. Properly I should handle those errors, but I skipped it because it's a hassle. There's probably a nicer way to write this, so please tell me if you know one.
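One possible shape (just a sketch of a fragment that would slot into read(); catching Exception and printing is my assumption, not something from this post):

# apply_async re-raises a worker's exception when get() is called,
# so this loop is where per-chunk failures can be caught and reported.
for r in results:
    try:
        r.get()
    except Exception as e:
        print(f"a chunk failed: {e}")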
Also, as pointed out in the stackoverflow comments, if the part of gen_chunks() that resets the list is changed to del chunk[:], the output becomes the following.
result
start [['5', 'O']]
start [['5', 'O']]
start [['5', 'O']]
finish [['5', 'O']]
finish [['5', 'O']]
finish [['5', 'O']]
I hadn't read the comments properly, so I got to witness this sad result: with del chunk[:] the same list object is reused and mutated in place, so by the time the pool serializes the arguments for the workers, every chunk holds only the final contents.
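If you really did want to reuse one list with del chunk[:], yielding a copy would avoid the problem. A sketch (not what the code above does; the code above simply rebinds chunk = []):

def gen_chunks_copy(reader, chunksize=2):
    """Variant that reuses one list but yields copies, so the in-place
    del chunk[:] can't clobber chunks already handed to the pool."""
    chunk = []
    for i, line in enumerate(reader):
        if i % chunksize == 0 and i > 0:
            yield list(chunk)  # hand out a copy
            del chunk[:]       # only our private list is cleared
        chunk.append(line)
    yield list(chunk)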