I suppose this is the equivalent of find_in_batches in Rails? yield_per is a nice method that works through a large result set sequentially without holding all the rows in memory at once.
You might assume it handles LIMIT and OFFSET for you behind the scenes, but it doesn't use them at all: it processes the result of a single SELECT as a stream.
Looking at the SQL log, I figured I wouldn't be the only one worried that the SQL didn't come out as expected, so I'm making a note of it.
Usage notes, just in case
yield_per.sample.py
from sqlalchemy.orm import Session

sess = Session(engine)
# A single SELECT is issued; its rows are fetched 10 at a time
for obj in sess.query(Customer).filter_by(ownd_uid=n).yield_per(10):
    hogehoge(obj)
Written this way, even if the filter condition matches a huge number of records, they are fetched and processed 10 at a time, so the whole result is worked through sequentially without being loaded into memory all at once.
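To make the "one SELECT, no LIMIT/OFFSET" point concrete, here is a minimal sketch of the same fetch pattern using only the standard library's sqlite3 module. It is an analogy, not SQLAlchemy's implementation: the customer table, the stream_in_batches helper, and the statements list standing in for the SQL log are all hypothetical, and an in-memory SQLite database says nothing about how MySQL or its driver buffers rows.

```python
import sqlite3


def stream_in_batches(cursor, batch_size):
    """Yield rows one at a time, pulling batch_size rows per fetch (hypothetical helper)."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield from rows


conn = sqlite3.connect(":memory:")

# Collect every statement SQLite runs, as a stand-in for the SQL log.
statements = []
conn.set_trace_callback(statements.append)

# Hypothetical table mirroring the Customer model in the sample above.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, ownd_uid INTEGER)")
conn.executemany("INSERT INTO customer (ownd_uid) VALUES (?)", [(1,)] * 25)

# One SELECT; the rows are then consumed 10 at a time from its cursor.
cur = conn.execute(
    "SELECT id FROM customer WHERE ownd_uid = ? ORDER BY id", (1,)
)
processed = [row[0] for row in stream_in_batches(cur, 10)]

selects = [s for s in statements if s.startswith("SELECT")]
print(len(processed), "rows processed with", len(selects), "SELECT statement(s)")
```

All the rows are handled even though the log contains a single SELECT with no LIMIT or OFFSET, which is why the SQL log looks different from what you would expect from a paging approach.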
It's the best.
If you run SHOW PROCESSLIST on MySQL, it looks like each query issued through yield_per holds one connection of its own. Which is only natural, if you think about it.
The one at the top, at 945 seconds, is the first loop, and I wonder whether it will get cut off by connect_timeout .... (I have a feeling it will)
You probably won't run into this unless you handle this much data, so if it errors out I'll note down how to deal with it.
It took 1-2 hours, but it finished without any problems.
Maybe once a connection has been held for a long time, it no longer shows up in SHOW PROCESSLIST?!
By all means, check for yourself what kind of processing happens on the MySQL side!