About the behavior of yield_per of SqlAlchemy

SqlAlchemy yield_per method

Is it find_in_batches in Rails? It's a nice method that processes all the results sequentially without storing all the results in memory when selecting a large amount of data.

If you think that it is a method that handles limits and offsets nicely, it is a method that processes the selected result as a stream without such a thing.

Looking at the SQL log, I thought that there would be other people besides me who were worried that SQL would not come out as expected, so I'll make a note of it.

How to use memo just in case

`yield_per.sample.py`


sess = Session(engine)
for obj in sess.query(Customer).filter_by(ownd_uid = n).yield_per(10):
   hogehoge(obj)

If you write it like this, even if there are a lot of records selected under the filter condition, it will process while selecting 10 records at a time, so it will process all the select results sequentially without putting them in memory.

It's the best.

Oh? Are you using N connections?

If you try show processlist on MySQL, it seems that each connection called by yield_per has one connection. Is that so if you ask?

screen 2016-11-06 12.16.41.png

The first loop that takes 945 seconds is the top loop, but I wonder if it can be broken by connect_timeout .... (I feel like it's going to run out)

If you don't use such a large amount of data, you may not be able to see it, so if you get an error, keep a note of how to use it.

Result: No connection error.

It took 1-2 hours, but it ended without any problems.

Maybe if you keep a long connection, you won't be able to query the show process list? !!

By all means, please check the behavior of what kind of processing is done on the MySQL side!

Recommended Posts

About the behavior of yield_per of SqlAlchemy

About the behavior of enable_backprop of Chainer v2

About the behavior of copy, deepcopy and numpy.copy

About the behavior of Model.get_or_create () of peewee in Python

About the behavior of Queue during parallel processing

About the ease of Python

About the components of Luigi

About the features of Python

Tank game made with python About the behavior of tanks

About the return value of pthread_mutex_init ()

A memo about the behavior of bowtie2 during multiple hits

About the return value of the histogram.

About the upper limit of threads-max

About the size of matplotlib points

About the basics list of Python basics

Check the behavior of destructor in Python

About the virtual environment of python version 3.7

About the arguments of the setup function of PyCaret

About the test

Behavior of multiprocessing.pool.Pool.map

I wanted to be careful about the behavior of Python's default arguments

About the queue

About the accuracy of Archimedean circle calculation method

About the X-axis notation of Matplotlib bar graphs

See the behavior of drunkenness with reinforcement learning

About the processing speed of SVM (SVC) of scikit-learn

The behavior of signal () depends on the compile options

A note about the python version of python virtualenv

About the development contents of machine learning (Example)

[Note] About the role of underscore "_" in Python

Visualize the behavior of the sorting algorithm with matplotlib

About the * (asterisk) argument of python (and itertools.starmap)

The beginning of cif2cell

A memorandum about the warning of the pylint output result

About all of numpy

The meaning of self

About assignment of numpy.ndarray

[python] behavior of argmax

the zen of Python

Think about the next generation of Rack and WSGI

About testing in the implementation of machine learning models

About the inefficiency of data transfer in luigi on-memory

Basic usage of SQLAlchemy

The story of sys.path.append ()

Examine the close processing of Python dataset (SQLAlchemy wrapper)

About the Unfold function

About the service command

About the uncluttered arrangement in the import order of flake8

A story about changing the master name of BlueZ

About variable of chainer

About the confusion matrix

About the Visitor pattern

Personal notes about the integration of vscode and anaconda

A reminder about the implementation of recommendations in Python

Revenge of the Types: Revenge of types

A note on the default behavior of collate_fn in PyTorch

Reuse the behavior of the @property method by using a descriptor [16/100]

Think about the analysis environment (Part 1: Overview) * As of January 2017

About the camera change event of Google Maps Android API

About the garbled Japanese part of pandas-profiling in Jupyter notebook

I tried a little bit of the behavior of the zip function