How I improved performance just by changing the dtype in numpy

Introduction

I had the opportunity to work on making some numpy calculations more efficient, and in the process I learned how much the dtype matters.

I wrote this article partly as a personal memo, and in the hope that it will be useful to someone.

[Addition] The numpy arrays in question are numpy.ndarray objects obtained by reading a CSV with pandas read_csv and taking .values from the resulting DataFrame.

Performance before and after

Concretely, the job was to:

Perform several matrix multiplications of size (10,000,400) @ (4,550,000)
Take the intersection of them

Just by changing the numpy dtype, I was able to improve this process as follows. (It was terrible at first ...)

Item	Before	After
Processing time	70 minutes	10 minutes
Memory used	Over 100GB	Over 30GB

The memory figures are the "Memory" value shown in the Mac Activity Monitor. (At first it was bad enough to crash the jupyter kernel, but now it runs with room to spare.)

Below, I will write what kind of changes I made.

What I did

About speeding up

**Set the dtype of the numpy.array to float32 or float64**

numpy apparently uses BLAS for matrix multiplication, and BLAS only handles floating-point types, so it can do its job when the arrays have one of the dtypes above. (See the first article in the reference links.)

In my case the arrays were an integer type at first. Just changing the type with .astype(np.float32) reduced the processing time from 70 minutes to 10 minutes!
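As a rough illustration of the point above, here is a minimal sketch that times the same product once with an integer dtype and once with float32. The matrix sizes, the random 0/1 data, and the helper name `time_matmul` are assumptions made for this example, not the original workload, and the actual timings will depend on the machine and the BLAS build.

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Small 0/1 matrices standing in for the real data (sizes are made up).
a_int = rng.integers(0, 2, size=(500, 200), dtype=np.uint8)
b_int = rng.integers(0, 2, size=(200, 300), dtype=np.uint8)

# Same values, stored as float32 so the product can go through BLAS.
a_f32 = a_int.astype(np.float32)
b_f32 = b_int.astype(np.float32)

def time_matmul(x, y):
    """Hypothetical helper: return (result, elapsed seconds) for one x @ y."""
    start = time.perf_counter()
    out = x @ y
    return out, time.perf_counter() - start

# Widen to int64 first: a uint8 product would wrap around past 255.
prod_int, t_int = time_matmul(a_int.astype(np.int64), b_int.astype(np.int64))
prod_f32, t_f32 = time_matmul(a_f32, b_f32)

print(f"int64 matmul:   {t_int:.4f} s")
print(f"float32 matmul: {t_f32:.4f} s")

# The entries are small integer counts, which float32 represents exactly.
same = np.array_equal(prod_int, prod_f32.astype(np.int64))
print("results identical:", same)
```

For 0/1 inputs like these, the float32 result matches the integer result exactly, so the cast changes the speed but not the answer.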

[Addition] After sleeping on it and rereading, I felt the explanation of the situation was insufficient, so here is a little more detail.

Originally I one-hot encoded the data with pd.get_dummies and then extracted a numpy.ndarray with .values for the calculation.

With this method the dtype was uint8; changing it to float32 is what made the calculation fast.

The code looks like this.

pd.get_dummies([Pandas Series]).astype(np.float32).values
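A small runnable version of that one-liner might look like the sketch below. The Series `colors` and its values are made up for illustration. Depending on the pandas version, get_dummies may return uint8 or bool columns by default, but the explicit cast yields float32 either way.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column; the real data came from read_csv.
colors = pd.Series(["red", "green", "blue", "green"])

# One-hot encode, cast to float32, then pull out the underlying ndarray.
one_hot = pd.get_dummies(colors).astype(np.float32).values

print(one_hot.dtype)   # float32
print(one_hot.shape)   # (4, 3)
```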

About memory saving

**Store the matrix values as bool if possible**

Converting to float made the processing dramatically faster, but it still consumed a lot of memory. I fought back by frequently using del on objects I no longer needed, but that did not bring a big improvement.

However, by storing the calculation results as bool instead, I was able to save a lot of memory. (A numpy bool element takes 1 byte, while float32 takes 4 bytes and float64 takes 8, so the same matrix needs far less memory.)
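The difference in footprint can be checked directly with `itemsize` and `nbytes`. This is only a sketch: the shape below is an arbitrary small example, not the size of the matrices in this article.

```python
import numpy as np

# Illustrative shape only; far smaller than the matrices in the article.
shape = (1000, 2000)

for dtype in (np.bool_, np.float32, np.float64):
    arr = np.zeros(shape, dtype=dtype)
    print(f"{arr.dtype.name:8s} itemsize={arr.itemsize} bytes, "
          f"total={arr.nbytes / 1e6:.0f} MB")

# Keep two of the sizes around for a direct comparison.
bool_bytes = np.zeros(shape, dtype=np.bool_).nbytes
f32_bytes = np.zeros(shape, dtype=np.float32).nbytes
print("float32 / bool size ratio:", f32_bytes // bool_bytes)
```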

[Supplement] In Python, 1/0 and True/False correspond to each other: True == 1 and False == 0 both hold.

So a matrix that can be represented with 1/0 (for example, a one-hot encoded matrix) can also be expressed with bool values.
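To make that correspondence concrete, here is a short sketch. The array `onehot` is a made-up example; it is converted to bool and back to show that no information is lost for 0/1 data.

```python
import numpy as np

# In plain Python, bool is a subclass of int.
print(True == 1, False == 0)   # True True
print(True + True)             # 2

# A 0/1 matrix round-trips through bool without losing information.
onehot = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.uint8)
as_bool = onehot.astype(bool)
back = as_bool.astype(np.uint8)
print(np.array_equal(onehot, back))  # True
```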

Combining the two techniques

As described above, combining these two points alone greatly improved efficiency. In the end, the processing flow looked like this.

Perform the matrix multiplication in float32: `arrayA = float32 matrix @ float32 matrix`
Convert the result to bool: simply `arrayA = (arrayA >= 1)`
Take the intersection of them: something like `result = arrayA | arrayB | arrayC`
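The three steps above can be sketched end to end as follows. The helpers `random_onehot_like` and `bool_product`, the matrix sizes, and the random data are all assumptions made for this example. The results are combined with `|` as written in the flow (elementwise OR), and `&` is shown alongside for the case where an elementwise AND is what is needed.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_onehot_like(rows, cols):
    """Hypothetical helper: a sparse 0/1 float32 matrix standing in for real data."""
    return (rng.random((rows, cols)) < 0.05).astype(np.float32)

def bool_product(left, right):
    """Multiply in float32, then keep only 'is the count >= 1?' as bool."""
    counts = left @ right          # float32 @ float32 stays float32
    return counts >= 1             # the comparison yields a bool array

array_a = bool_product(random_onehot_like(100, 40), random_onehot_like(40, 300))
array_b = bool_product(random_onehot_like(100, 40), random_onehot_like(40, 300))
array_c = bool_product(random_onehot_like(100, 40), random_onehot_like(40, 300))

# Combine as written in the flow above (elementwise OR).
result = array_a | array_b | array_c
# Elementwise AND, for when all three conditions must hold at once.
result_and = array_a & array_b & array_c

print(result.dtype, result.shape)
```

Converting each product to bool right away means only the 1-byte-per-element arrays stay alive, while the float32 intermediate can be released as soon as the comparison is done.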

Impressions

Float seems usable in many situations, but bool is probably more limited: I don't think it applies unless the result can be represented as 0/1.

Other

- I also saw an article saying that code becomes very fast if you write it in Cython with the numpy types declared. I didn't try it this time, but I would like to when I have the chance.
- Intel MKL seems to perform better, but the execution environment is not always an Intel CPU, so I ran it with OpenBLAS.

Reference link

https://stackoverflow.com/questions/19839539/how-to-get-faster-code-than-numpy-dot-for-matrix-multiplication
https://www.benjaminjohnston.com.au/matmul
https://stackoverflow.com/questions/18743397/python-numpy-np-int32-slower-than-np-float64
https://insilico-notebook.com/python-blas-performance/