About numpy's all

Based on @shiracamus's comment, I wrote a new article. Please refer to it **here**.

Introduction

numpy's all is a function that returns True if every element of a numpy array is True, and False otherwise. The documentation is here (https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.all.html#numpy.ndarray.all).
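As a quick illustration of this behavior, here is a minimal sketch. The array values and variable names are made up for the example and do not come from the measurements in this article.

```python
import numpy as np

# Example arrays (values chosen only for illustration)
all_true = np.array([True, True, True])
one_false = np.array([True, False, True])

# ndarray.all() returns True only when every element is truthy
result_true = all_true.all()
result_false = one_false.all()

# np.all(arr) is the function form of the same operation
same_as_method = np.all(one_false)

# An empty array contains no False element, so all() returns True
result_empty = np.array([], dtype=bool).all()

print(result_true, result_false, same_as_method, result_empty)
```

The method form and the function form give the same answer, so which one to use is mostly a matter of taste.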

Calculations with numpy are very fast, so as a rule it is faster to compute with numpy than to write the same logic directly in Python. Still, I really wanted to speed up this particular part, tried various things, and found that under limited conditions the rule can be overturned. This article introduces that result.

Method

The method is to access every element of the array in a for statement and combine the elements in order with and. I compare this with numpy's all. I also measure the time when numba is used.

Source code

import numpy as np
import time
import matplotlib.pyplot as plt
import sys

# Use numpy's all
def func1(arr):
    return arr.all()

# Combine the elements one by one with `and` in a for loop
def func2(arr):
    tf = True
    for i in range(arr.size):
        tf = tf and arr[i]
    return tf

if __name__ == '__main__':
    if len(sys.argv) == 3:
        testsize, arr_size = map(int, sys.argv[1:])
    else:
        testsize = 10
        arr_size = 10
    # Number of tests, array size
    print(testsize, arr_size)

    elapsed_time = []
    for i in range(testsize):
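        # Random array of True and False
        # (the np.bool alias is missing from some NumPy versions; builtin bool always works)
        arr = np.random.randint(2, size=arr_size).astype(bool)
        start = time.perf_counter()

        func1(arr)

        end = time.perf_counter()
        elapsed_time.append((end - start) * 1e6)  # seconds -> microseconds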

    plt.plot(elapsed_time[1:], 'b', label='numpy all')

    elapsed_time = []
    for i in range(testsize):
        arr = np.random.randint(2, size=arr_size).astype(bool)
        start = time.perf_counter()

        func2(arr)

        end = time.perf_counter()
        elapsed_time.append((end - start) * 1e6)  # seconds -> microseconds

    plt.plot(elapsed_time[1:], 'r', label='for')
    plt.xlabel('test size')
    plt.ylabel('elapsed time[us]')
    plt.legend()
    plt.show()

Result

Without numba

With an array size of 10 and 10 tests, the result is as shown in the following figure. Using and in a for statement is faster.

With an array size of 200 and 10 tests, the result is as shown in the following figure. all is faster.

The for statement becomes slower as the size of the array increases, as the following figure shows. I do not know what causes the pulse-like spikes in it. The exact threshold probably depends on the environment, but for array sizes of about 100 or less, writing the loop directly in Python turned out to be faster.
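To look for the array size at which the two approaches cross over in your own environment, a sweep over several sizes can be sketched as follows. This is only an illustration: it uses timeit instead of the time-based measurement in the source code above, and the helper best_time_us, the function names, the list of sizes, and the fixed seed are all assumptions made for the example.

```python
import timeit
import numpy as np

def use_all(arr):
    # numpy's all
    return arr.all()

def use_for(arr):
    # Same loop as func2: combine the elements with `and`
    tf = True
    for i in range(arr.size):
        tf = tf and arr[i]
    return tf

def best_time_us(func, arr, number=1000, repeat=5):
    # Hypothetical helper: best per-call time in microseconds
    timer = timeit.Timer(lambda: func(arr))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
results = {}
for size in (10, 50, 100, 200):
    # Random array of True and False, as in the original script
    arr = rng.integers(2, size=size).astype(bool)
    results[size] = (best_time_us(use_all, arr), best_time_us(use_for, arr))

for size, (t_all, t_for) in results.items():
    print(f"size={size:4d}  all: {t_all:.3f} us  for: {t_for:.3f} us")
```

Taking the minimum over several repeats reduces the influence of occasional slow runs, but the numbers still depend on the machine and the library versions.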

With numba

Because numba compiles just in time (JIT), the very first call to the function takes a long time, so the plot excludes the elapsed time of the first call. The result is shown in the following figure. The second call also seems to take some time. Apart from that, there is practically no difference in execution time between the two. With a larger array, the result looks like the following figure: numpy is faster.
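The numba version of the function is not listed in the source code above, so the following is only a sketch of what it might look like, not necessarily the code used for the figures. It assumes the njit decorator from numba; the function name all_with_for, the array size, and the number of calls are made up for the example. If numba is not installed, the sketch falls back to the plain Python function so that it still runs.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without numba (no speed-up then)
    def njit(func):
        return func

@njit
def all_with_for(arr):
    # Same loop as func2, compiled by numba on the first call
    tf = True
    for i in range(arr.size):
        tf = tf and arr[i]
    return tf

arr = np.ones(200, dtype=bool)  # example size, chosen arbitrarily

timings_us = []
for _ in range(5):
    start = time.perf_counter()
    result = all_with_for(arr)
    timings_us.append((time.perf_counter() - start) * 1e6)

# With numba installed, the first call includes the compilation time
print("first call [us]:", timings_us[0])
print("later calls [us]:", timings_us[1:])
```

Printing the first call separately from the later ones makes it easy to see why the first measurement is left out of the plot.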

Conclusion

I have pasted a lot of graphs, but what I want to say comes down to the following two points.

**If the array is small and numba is not used, the for statement is faster than all**
**numba speeds up the for statement**

Finally

What caused the pulse-like spikes that appeared in that graph? I still do not know.