I saw it in "Differences between top-level loops and loops in functions"
loop_toplevel.py
for i in range(10**7):
pass
As a sample, let's profile the processing of the Python VM and see that the processing of STORE_NAME is effective for the execution time.
Check in Ubuntu 18.04 environment.
Obtained from https://github.com/python/cpython
~$ git clone https://github.com/python/cpython.git
install gcc
~$ sudo apt install -y gcc
Build with profiler enabled, optimization disabled
~$ cd cpython
~/cpython$ ./configure --enable-profiling --with-pydebug
~/cpython$ make
Prepare the target script
~/cpython$ vi loop_toplevel.py
~/cpython$ cat loop_toplevel.py
for i in range(10**7):
pass
Simply run the built python and it will profile it for you.
~/cpython$ ./python loop_toplevel.py
The profile result is saved in gmon.out. You can see the summary by gprof ./python
.
~/cpython$ ls -l gmon.out
-rw-rw-r-- 1 vagrant vagrant 1284656 Nov 4 03:16 gmon.out
~/cpython$ gprof ./python|head -n 20
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
17.46 0.22 0.22 20043118 0.00 0.00 read_size_t
9.52 0.34 0.12 2229 0.00 0.00 _PyEval_EvalFrameDefault
7.94 0.44 0.10 10007889 0.00 0.00 insertdict
6.35 0.52 0.08 10026827 0.00 0.00 _PyMem_DebugRawAlloc
6.35 0.60 0.08 10021559 0.00 0.00 _PyMem_DebugCheckAddress
4.76 0.66 0.06 20045319 0.00 0.00 PyGILState_Check
3.97 0.71 0.05 10020541 0.00 0.00 address_in_range
3.97 0.76 0.05 10001148 0.00 0.00 PyLong_FromLong
3.97 0.81 0.05 10046396 0.00 0.00 lookdict_unicode_nodummy
3.97 0.86 0.05 10021024 0.00 0.00 _PyMem_DebugRawFree
3.57 0.91 0.05 10019912 0.00 0.00 _PyObject_Free
3.17 0.95 0.04 10027556 0.00 0.00 write_size_t
3.17 0.99 0.04 10019415 0.00 0.00 _PyMem_DebugFree
2.38 1.02 0.03 10017088 0.00 0.00 _PyDict_CheckConsistency
2.38 1.05 0.03 10008245 0.00 0.00 PyDict_SetItem
Let's visualize the call graph using gprof2dot.
~/cpython$ sudo apt install -y graphviz
~/cpython$ sudo apt install -y python3-pip
~/cpython$ pip3 install --user gprof2dot
~/cpython$ gprof ./python | ~/.local/bin/gprof2dot | dot -Tpng -o output.png
We saw that PyDict_SetItem was called on the order of the number of loops, and less than that was consuming time (total time 51.14%, self time 2.31%, 10008245 calls).
Looking at the Python VM's STORE_NAME implementation, it does call PyDict_SetItem.
cpython/Python/ceval.c
case TARGET(STORE_NAME): {
PyObject *name = GETITEM(names, oparg);
PyObject *v = POP();
PyObject *ns = f->f_locals;
int err;
if (ns == NULL) {
_PyErr_Format(tstate, PyExc_SystemError,
"no locals found when storing %R", name);
Py_DECREF(v);
goto error;
}
if (PyDict_CheckExact(ns))
err = PyDict_SetItem(ns, name, v);
else
err = PyObject_SetItem(ns, name, v);
Py_DECREF(v);
if (err != 0)
goto error;
DISPATCH();
}
Recommended Posts