Difference in execution speed depending on how to write Cython function

Purpose

I investigated how the execution speed changes depending on how to write Cython functions.

History

background

Method

# Factor Choices
1 Syntax to use ① Python for statement ② Cython for statement
2 Argument type (1) Arbitrary iterable (2) Numpy array (3) Typed memory view
3 Argument element type specification ① None ② Yes
4 Element access method ① Non-index specification (v in vs) ② Index specification (vs)[i])
5 Presence or absence of check function ① None ② Yes

result

Details

1. Preparation

Take the sum of squares of the following sequence. We have two versions, a Python list and a Numpy array.

python


import numpy as np
vs_list = [1.0,]*10**6
vs_ndarray = np.ones((10**6,), dtype=np.double)

2. Python for statement

The performance of the original Python for loop was as follows [^ 2]. It takes 300-400ms to get the sum of the squares of $ 10 ^ 6 $ real numbers [^ 3]. ** In Python, it seems faster to use the iterator obediently without index access **.

# syntax Argument type Element type specification Element access Check function Execution time[ms]
1 Python for statement Any iterable None v in vs Yes 302
2 Python for statement Any iterable None vs[i] Yes 381

1


def sum_py(vs):
    s = 0
    for v in vs:
        s += v
    return s

python


%timeit square_sum_py(vs_list)

2


def square_sum_py_range(vs):
    s = 0
    for i in range(len(vs)):
        s += vs[i]**2
    return s

python


%timeit square_sum_py_range(vs_list)

3. Numpy broadcast

Not surprisingly, Numpy broadcasts can be dramatically faster. However, Cython is actually used when the broadcast function cannot be used, so we need to consider other methods here. You can see that processing a Numpy array with a Python for statement slows it down.

# syntax Argument type Element type specification Element access Check function Execution time[ms]
3 Numpy broadcast Numpy array Yes None (unnecessary) Yes 17
4 Python for statement Numpy array Yes v in vs Yes 1640
5 Python for statement Numpy array Yes vs[i] Yes 1950

3


def square_sum_np(vs):
    return np.sum(vs**2)

python


%timeit square_sum_np(vs_ndarray)

4


#Function defined above
%timeit square_sum_py(vs_ndarray)

5


#Function defined above
%timeit square_sum_py_range(vs_ndarray)

4. Argument type specification

So, next, let's think about using Cython to speed up the Python for statement. A way to pass a low-level memory-accessible array to a function in Cython is to specify a Numpy array as a direct argument [http://docs.cython.org/en/latest/src/tutorial/numpy] There are .html) and How to specify a typed memory view as an argument. Here, we will start with the more intuitive former method. Looking at the fragmentary information on the net, it seems that only typing is effective in speeding up Cython. However, as you can see in the table below, just specifying the function arguments in a Numpy array isn't much faster than the original Python code (1) (6). At this stage, specifying the element types of the Numpy array does little (7). Of course, it's faster than Python code (4) using Numpy arrays, but that's all there is no reason to use Cython.

# syntax Argument type Element type specification Element access Check function Execution time[ms]
6 Cython for statement Numpy array None v in vs Yes 378
7 Cython for statement Numpy array Yes v in vs Yes 362

6


%%cython
cimport numpy as np
def square_sum_cy_np(np.ndarray vs):
    cdef double v, s = 0.0
    for v in vs:
        s += v**2
    return s

python


%timeit square_sum_cy_np(vs_ndarray)

7


%%cython
cimport numpy as np
def square_sum_cy_np_typed(np.ndarray[np.double_t, ndim=1] vs):
    cdef double v, s = 0.0
    for v in vs:
        s += v**2
    return s

python


%timeit square_sum_cy_np_typed(vs_ndarray)

5. Index access

Then, what to do is to change the access method (for statement writing method) to the array element. Index access without element typing is slower, but index access with element typing is dramatically faster.

# syntax Argument type Element type specification Element access Check function Execution time[ms]
8 Cython for statement Numpy array None vs[i] Yes 1610
9 Cython for statement Numpy array Yes vs[i] Yes 28

8


%%cython
cimport numpy as np
def square_sum_cy_np_range(np.ndarray vs):
    cdef double s = 0.0
    for i in range(vs.shape[0]):
        s += vs[i]**2
    return s

python


%timeit square_sum_cy_np_range(vs_ndarray)

9


%%cython
cimport numpy as np
def square_sum_cy_np_typed_range(np.ndarray[np.double_t, ndim=1] vs):
    cdef double s = 0.0
    for i in range(vs.shape[0]):
        s += vs[i]**2
    return s

python


%timeit square_sum_cy_np_typed_range(vs_ndarray)

6. Omission of check function

The official documentation also describes how to omit the array access check function (10-13), but the difference was small compared to not omitting the check function (6-9). It's true that 10 to 20% is faster, but it's only as effective as the last push.

# syntax Argument type Element type specification Element access Check function Execution time[ms]
10 Cython for statement Numpy array None v in vs None 315
11 Cython for statement Numpy array Yes v in vs None 313
12 Cython for statement Numpy array None vs[i] None 1610
13 Cython for statement Numpy array Yes vs[i] None 25

10


%%cython
cimport numpy as np
from cython import boundscheck, wraparound
def square_sum_cy_np_nocheck(np.ndarray vs):
    cdef double v, s = 0.0
    with boundscheck(False), wraparound(False):
        for v in vs:
            s += v**2
        return s

python


%timeit square_sum_cy_np_nocheck(vs_ndarray)

11


%%cython
cimport numpy as np
from cython import boundscheck, wraparound
def square_sum_cy_np_typed_nocheck(np.ndarray[np.double_t, ndim=1] a):
    cdef double d, s = 0.0
    with boundscheck(False), wraparound(False):
        for d in a:
            s += d**2
    return s

python


%timeit square_sum_cy_np_typed_nocheck(vs_ndarray)

12


%%cython
cimport numpy as np
from cython import boundscheck, wraparound
def square_sum_cy_np_range_nocheck(np.ndarray a):
    cdef double s = 0.0
    with boundscheck(False), wraparound(False):
        for i in range(a.shape[0]):
            s += a[i]**2
    return s

python


%timeit square_sum_cy_np_range_nocheck(vs_ndarray)

13


%%cython
cimport numpy as np
from cython import boundscheck, wraparound
def square_sum_cy_np_typed_range_nocheck(np.ndarray[np.double_t, ndim=1] a):
    cdef double s = 0.0
    with boundscheck(False), wraparound(False):
        for i in range(a.shape[0]):
            s += a[i]**2
    return s

python


%timeit square_sum_cy_np_typed_range_nocheck(vs_ndarray)

Typed memory view

Another way to pass an array with low-level memory access to the Cython function is to specify a typed memory view as an argument. This method is recommended in the official documentation. As you can see from the results below, it is the method of accessing the array elements that works in this case as well.

# syntax Argument type Element type specification Element access Check function Execution time[ms]
14 Cython for statement Typed memory view Yes (necessary) v in vs Yes 519
15 Cython for statement Typed memory view Yes (necessary) vs[i] Yes 26
16 Cython for statement Typed memory view Yes (necessary) v in vs None 516
17 Cython for statement Typed memory view Yes (necessary) vs[i] None 24

14


%%cython
def square_sum_cy_mv(double[:] vs):
    cdef double v, s = 0.0
    for v in vs:
        s += v**2
    return s

python


%timeit square_sum_cy_np_typed_range_nocheck(vs_ndarray)

15


%%cython
def square_sum_cy_mv_range(double[:] vs):
    cdef double s = 0.0
    for i in range(vs.shape[0]):
        s += vs[i]**2
    return s

python


%timeit square_sum_cy_mv_range(vs_ndarray)

16


%%cython
from cython import boundscheck, wraparound
def square_sum_cy_mv_nocheck(double[:] vs):
    cdef double v, s = 0.0
    with boundscheck(False), wraparound(False):
        for v in vs:
            s += v**2
    return s

python


%timeit square_sum_cy_mv_nocheck(vs_ndarray)

17


%%cython
from cython import boundscheck, wraparound
def square_sum_cy_mv_range_nocheck(double[:] vs):
    cdef double s = 0.0
    with boundscheck(False), wraparound(False):
        for i in range(vs.shape[0]):
            s += vs[i]**2
    return s

python


%timeit square_sum_cy_mv_range_nocheck(vs_ndarray)

Consideration

Remarks

python


import numpy as np
%load_ext Cython

python


vs = np.ones((10**3,10**3), dtype=np.double)

python


%%cython
from cython import boundscheck, wraparound
cdef double square_sum(double[:, :] vs):
# def square_sum(double[:, :] vs):Even if it is almost the same in this case
    cdef:
        double s = 0.0
        Py_ssize_t nx = vs.shape[0]
        Py_ssize_t ny = vs.shape[1]
        Py_ssize_t i, j
    with boundscheck(False), wraparound(False):
        for i in range(nx):
            for j in range(ny):
                s += vs[i, j]**2
    return s

python


%timeit square_sum(vs)

[^ 1]: The number of codes is not $ 2 \ times3 \ times2 \ times2 \ times2 = 24 $ because there is a correlation between the choices and (2) the code for reference is added. It is the cause. [^ 2]: There was a maximum variation of ± 20% in the measured value by% timeit, but here we are concerned about the difference of several times to several tens of times, so I am concerned about the difference of several percent. I will not. [^ 3]: CPU uses Intel Celeron.

Recommended Posts

Difference in execution speed depending on how to write Cython function
How to write soberly in pandas
Notes on how to write requirements.txt
Difference in how to write if statement between ruby ​​and python
How to run Cython on OSX Memo
How to write this process in Perl?
How to write Ruby to_s in Python
How to write urlfetch unittest on GAE / P
How to use Python Kivy ④ ~ Execution on Android ~
How to write regular expression patterns in Linux
To write to Error Repoting in Python on GAE
How to write async and await in Vue.js
How to write a named tuple document in 2020
[Go] How to write or call a function
How to Mock a Public function in Pytest
[Linux] Difference in time information depending on the clock ID of the clock_gettime () function
20th Offline Real-time How to Write Problems in Python
How to write string concatenation in multiple lines in Python
How to define Decorator and Decomaker in one function
How to allow nologin users to log in on Linux
Difference in results depending on the argument of multiprocess.Process
How to write custom validations in the Django REST Framework
Differences in sys.path depending on how Python is started (v3.8.2)
How to use VS Code in venv environment on windows
[Python] How to write an if statement in one sentence.
A note on how to load a virtual environment in PyCharm
How to sample from any probability density function in Python
Summary of how to write .proto files used in gRPC
Notes on how to use marshmallow in the schema library
[ROS] How to write Publisher and Subscriber on one node
Covector to think in function
XPath Basics (2) -How to write XPath
How to call a function
How to register on pypi
How to develop in Python
[sh] How to store the command execution result in a variable
How to install OpenCV on Cloud9 and run it in Python
Repeated @ app.callback in Dash How to write Input and State neatly
How to write code to access python dashDB on Bluemix or local
The 15th offline real-time how to write reference problem in Python
How to write offline real time Solve E04 problems in Python
How to deploy a Django app on heroku in just 5 minutes
The 14th offline real-time how to write reference problem in python
File open function in Python3 (difference between open and codecs.open and speed comparison)
How to use the render function defined in .mako (.html) directly in mako
The 18th offline real-time how to write reference problem in Python
[Python] How to do PCA in Python
How to handle session in SQLAlchemy
How to use the zip function
How to install mysql-connector-python on mac
How to use classes in Theano
How to collect images in Python
How to use Dataiku on Windows
Flask reuse How to write html
How to update Spyder in Anaconda
How to use SQLite in Python
Notes on how to use pywinauto
Write AWS Lambda function in Python
How to install graph-tool on macOS
How to install VMware-Tools on Linux
Measure function execution time in Python