About all of numpy (2nd)

This article is based on @shiracamus's comment on my previous article.

Introduction

numpy's `all` is a function that returns True if all the elements of a numpy array are True, and False otherwise. The documentation is here (https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.all.html#numpy.ndarray.all). Here I am referring to `ndarray.all()`. As the documentation says, `np.all()` has the same meaning, and there also seems to be a function called `np.alltrue()`, but looking at the [github source code](https://github.com/numpy/numpy/blob/master/numpy/core/fromnumeric.py), these two functions seemed to call `ndarray.all()` in the end, so I won't use them this time.
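As a quick illustration of the equivalence described above, the following minimal sketch checks that `ndarray.all()`, `np.all()`, and Python's built-in `all()` agree on small boolean arrays. The array values are made up for illustration only.

```python
import numpy as np

# Two small boolean arrays chosen only for illustration
all_true = np.array([True, True, True])
one_false = np.array([True, False, True])

# The method form and the function form give the same answer
method_results = (bool(all_true.all()), bool(one_false.all()))
function_results = (bool(np.all(all_true)), bool(np.all(one_false)))

# The built-in all() iterates over the elements one by one
builtin_results = (all(all_true), all(one_false))

print(method_results, function_results, builtin_results)
```

All three agree on the result; what differs is where the loop over the elements runs, which is what the timings in this article compare.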

Calculations using numpy are very fast, so it is basically faster to compute with numpy than to write the loop directly in Python. However, I found that this can be overturned under limited conditions, so I would like to introduce them here.

Also, numba is a library that performs Just In Time (JIT) compilation to achieve speedup. Therefore, the first call to a function may take some time because of compilation. Specifically, with a type hint (`@numba.jit(numba.b1(numba.b1[:]))`) the first call incurs almost no compilation delay, but without one (`@numba.jit`) it may take a few seconds. There is no big difference in execution time after compilation.
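The difference between the two decorator styles can be observed with a minimal sketch like the one below. It is an illustration under assumptions rather than the measurement used in this article: it uses `numba.njit` instead of `numba.jit`, the helper names `all_loop` and `time_call` are hypothetical, and it falls back to the plain Python function when numba is not installed so that the script still runs.

```python
import time
import numpy as np

try:
    import numba
except ImportError:  # assumption: fall back to plain Python so the sketch still runs
    numba = None


def all_loop(arr):
    # Return False at the first falsy element, True otherwise
    for element in arr:
        if not element:
            return False
    return True


if numba is not None:
    # Eager: giving a signature triggers compilation at decoration time
    eager = numba.njit(numba.b1(numba.b1[:]))(all_loop)
    # Lazy: compilation is deferred until the first call
    lazy = numba.njit(all_loop)
else:
    eager = lazy = all_loop


def time_call(func, arr):
    # Return (result, elapsed seconds) for a single call
    start = time.perf_counter()
    result = func(arr)
    return bool(result), time.perf_counter() - start


arr = np.ones(1000, dtype=np.bool_)
for name, func in (("eager", eager), ("lazy", lazy)):
    first_result, first = time_call(func, arr)
    second_result, second = time_call(func, arr)
    print(f"{name}: first call {first * 1e6:.1f} us, second call {second * 1e6:.1f} us")
```

With numba installed, the lazy version is expected to show a much longer first call than second call, while the eager version pays the compilation cost when the decorator is applied.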

Method

First, compare the following three methods without numba.

  1. Access the array elements in order with a for statement
  2. The built-in function all (documentation here)
  3. numpy's all

At first glance, 1 and 2 look the same, but the results show that they differ. I would also like to measure the times when numba is used.

Source code

If you don't use numba, just comment out the `@numba.jit` part.

test.py


import numpy as np
import numba
import time
import matplotlib.pyplot as plt

# Use the built-in all()
@numba.jit(numba.b1(numba.b1[:]))
def builtin(arr):
    return all(arr)

# Write it with a for statement
@numba.jit(numba.b1(numba.b1[:]))
def use_for(arr):
    for element in arr:
        if not element:
            return False
    else:
        return True

# Use numpy's all
@numba.jit(numba.b1(numba.b1[:]))
def np_all(arr):
    return arr.all()

# Time the function passed as an argument for each array size
def test(func):
    elapsed_time = []
    for arr_size in range(1000):
        arr = np.ones(arr_size, dtype=np.bool_)  # the np.bool alias is unavailable in some NumPy versions
        start = time.perf_counter()  # higher-resolution clock than time.time()
        func(arr)
        end = time.perf_counter()
        elapsed_time.append((end - start) * 1e6)

    return elapsed_time[1:]

if __name__ == '__main__':
    plt.plot(test(use_for), 'g', label='for')
    plt.plot(test(builtin), 'r', label='built-in')
    plt.plot(test(np_all), 'b', label='numpy')
    plt.legend()
    plt.xlabel('array size')
    plt.ylabel('elapsed time[us]')
    plt.show()

Execution method

python test.py

Result

numba not used

The figure shows the execution time for array sizes up to 1000. From this, you can see that numpy becomes more advantageous as the array size increases. Also, the built-in function is faster than the for statement; I think this is a characteristic of Python. For sizes of 200 or less, I found that the other two methods are faster than numpy. This threshold of 200 may depend on the environment.

figure_1000_.png
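Since the threshold depends on the environment, a rough way to locate it on your own machine is sketched below. This is a hedged illustration rather than the script used for the figure: it uses `timeit` with repeated runs instead of a single `time.time()` measurement, and the helper names `best_time` and `find_crossover` as well as the size range are assumptions.

```python
import timeit
import numpy as np


def best_time(func, arr, number=200, repeat=5):
    # Average time per call in seconds, taking the best of several repeats
    timer = timeit.Timer(lambda: func(arr))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def find_crossover(sizes):
    # Return the first size at which arr.all() beats the built-in all(), or None
    for size in sizes:
        arr = np.ones(size, dtype=np.bool_)
        t_builtin = best_time(all, arr)
        t_numpy = best_time(lambda a: a.all(), arr)
        if t_numpy < t_builtin:
            return size
    return None


crossover = find_crossover(range(1, 2001, 50))
print("numpy becomes faster at around size:", crossover)
```

The printed size is only as fine-grained as the step of the range, and it can vary between runs, so treat it as an estimate.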

use numba

The plot excludes the elapsed time of the very first call. The result is shown in the figure. It seems that numba cannot accelerate the built-in function, but it does accelerate numpy's all and the for statement. It is also interesting that *the built-in function is slower than when numba was not used*. On the other hand, the for statement becomes faster, and is even faster than numpy.

numba_1000_.png

For larger arrays, the result looks like the following figure, which excludes the built-in function. The for statement still seems to be faster.

numba_100000_.png

Conclusion

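numba not used

For arrays of about 200 elements or fewer, the built-in all or a for statement is faster; for larger arrays, numpy's all is faster.

use numba

The for statement is the fastest, even faster than numpy's all, while the built-in all becomes slower than without numba.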

After writing this far, I realized that I had not paid attention to CPU usage, so I ran the script again pinned to a single core:

taskset -c 0 python test.py

I got a similar graph, so there was no problem.
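The same pinning can be sketched from inside Python on Linux with `os.sched_setaffinity`. This was not part of the original measurement. The helper name `pin_to_single_core` is hypothetical, and it picks the lowest-numbered core the process is allowed to use, which is core 0 on a typical machine and so matches `taskset -c 0`.

```python
import os


def pin_to_single_core():
    # os.sched_setaffinity only exists on some platforms (e.g. Linux)
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Choose the lowest-numbered core this process may run on
    core = min(os.sched_getaffinity(0))
    # pid 0 means "the current process"
    os.sched_setaffinity(0, {core})
    return core


core = pin_to_single_core()
print("pinned to core:", core)
```

Calling this at the top of test.py before the measurements would have an effect similar to launching the script through taskset.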

Finally

Thanks to @shiracamus.
