How to make Python faster for beginners [numpy]

Originally I wanted to put a number in the title ("〇〇 ways to ...") to make it look like a business book, but I haven't settled on the number, since I plan to add cases from time to time :innocent:.

Caution

This article is for beginners who have just started Python, or who have only recently learned about numpy or scipy. If you are already familiar with Python, you probably won't learn much here; instead, I would be grateful if you could tell me about other cases or better approaches :smiley:.

Also, the execution environment was **Python 3.5.3**, so please be careful, especially if you are using **Python 2**. (The return values of map and filter differ: they return lists in Python 2 but iterators in Python 3.)

Overview

With the recent machine learning boom, many people may have started learning Python. Especially if you do simple numerical processing or work with real data, you have probably come across libraries such as numpy.

However, once you handle data of a certain size, execution can become slow unless you are careful about how you write the code. (Personally, I think people who are new to programming, or who come to Python from a statically typed language, are especially likely to stumble here.)

Especially while learning, you want to try lots of different things, and if every run takes a long time, you can't. You end up hating programming :angry:.

So here I will introduce cases where code can be sped up "relatively easily" by using **numpy** and similar tools.

Rough policy

Here is what I personally watch out for. If your code is slow because of how it is written, you have most likely fallen into one of the following:

If you already follow these points, you do not need the examples below.

Concrete example

Preparation

The following libraries are imported in advance.

import numpy as np
import pandas as pd
import scipy as sp

Case 1: I want to store the results of a for statement in a list

Sample.py


def func1(n):
    a = []
    for i in range(n):
        a.append(i)
    return a

def func2(n):
    a = [0 for i in range(n)]  #A list of length n initialized with 0
    for i in range(n):
        a[i] = i
    return a

def func3(n):
    a = [i for i in range(n)]  #First initialized by comprehension
    return a

def func4(n):
    return [i for i in range(n)]  #Define directly and return

%time a = func1(10000000)
%time b = func2(10000000)
%time c = func3(10000000)
%time d = func4(10000000)

result


CPU times: user 660 ms, sys: 100 ms, total: 760 ms
Wall time: 762 ms
CPU times: user 690 ms, sys: 60 ms, total: 750 ms
Wall time: 760 ms
CPU times: user 290 ms, sys: 90 ms, total: 380 ms
Wall time: 388 ms
CPU times: user 320 ms, sys: 90 ms, total: 410 ms
Wall time: 413 ms

Building the list with a list comprehension (func3, func4) is faster than appending inside a for statement. In this example, that alone roughly halves the execution time, whereas pre-allocating the list and assigning by index (func2) does not help at all. Keep this in mind, especially when looping over long lists.
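`%time` is an IPython/Jupyter magic, so it does not work in a plain `python` script. A minimal sketch using the standard-library `timeit` module instead, assuming you only want a rough comparison of the append style and the comprehension style (the function names are mine, not from the samples above; the numbers will vary by machine). It avoids f-strings so it also runs on Python 3.5:

```python
import timeit

def build_with_append(n):
    # Grow the list one element at a time, as in func1
    a = []
    for i in range(n):
        a.append(i)
    return a

def build_with_comprehension(n):
    # Build the whole list in a single comprehension, as in func4
    return [i for i in range(n)]

n = 1000000
# Both approaches must produce the same list
assert build_with_append(n) == build_with_comprehension(n)

for f in (build_with_append, build_with_comprehension):
    # Best of 3 repeats, each calling the function once
    t = min(timeit.repeat(lambda: f(n), number=1, repeat=3))
    print("{}: {:.3f} s".format(f.__name__, t))
```

Taking the minimum of several repeats reduces noise from other processes; it is a rough benchmark, not a rigorous one.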

Case 2: I want to apply the same arithmetic operation (+, -, *, /) with the same value to every element of a vector

Here, it is assumed that the following vectors are defined in advance.

a = np.array([i for i in range(10000000)])

Consider a function that takes this vector and returns a copy with every element doubled.

Sample.py


def func1(x):
    y = x.copy()
    for i in range(len(y)):
        y[i] *= 2
    return y

def func2(a):
    return a * 2

%time b = func1(a)
%time c = func2(a)

result


CPU times: user 2.33 s, sys: 0 ns, total: 2.33 s
Wall time: 2.33 s
CPU times: user 10 ms, sys: 10 ms, total: 20 ms
Wall time: 13 ms

As this shows, numpy can apply arithmetic operations to a whole vector at once, so be careful not to loop over the elements with a for statement.
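The same holds for the other arithmetic operators: numpy applies the scalar to every element (this is called broadcasting). A small sketch on a 5-element vector, assuming only that numpy is installed. Note that `np.arange(n)` builds the same vector as `np.array([i for i in range(n)])` without going through a Python list:

```python
import numpy as np

a = np.arange(5)  # values 0, 1, 2, 3, 4

# Each operator is applied element-wise to the whole vector at once
added = a + 10       # values 10, 11, 12, 13, 14
subtracted = a - 1   # values -1, 0, 1, 2, 3
doubled = a * 2      # values 0, 2, 4, 6, 8
halved = a / 2       # values 0.0, 0.5, 1.0, 1.5, 2.0 (true division gives floats)

# Same result as the explicit loop in func1, without a Python-level loop
looped = a.copy()
for i in range(len(looped)):
    looped[i] *= 2
assert np.array_equal(looped, doubled)
```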

Case 4: I want to extract only some elements of a vector

Use the same vector as above. Suppose you want to extract only the elements that are multiples of 3. You might think, "I have no choice but to put an if statement inside a for statement!", but you can also write it as in func2 below.

Sample.py


def func1(a):
    ans  = []
    for i in range(len(a)):
        if a[i] % 3 == 0:
            ans.append(a[i])
    return np.array(ans)

def func2(a):
    return a[a % 3 == 0]

%time b = func1(a)
%time c = func2(a)

result


CPU times: user 3.44 s, sys: 10 ms, total: 3.45 s
Wall time: 3.45 s
CPU times: user 120 ms, sys: 10 ms, total: 130 ms
Wall time: 131 ms
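func2 relies on boolean indexing: `a % 3 == 0` produces an array of True/False values with the same length as `a`, and indexing `a` with it keeps only the positions that are True. A small sketch of the two steps on a 10-element vector (the combined condition at the end is an extra illustration, not part of the original sample):

```python
import numpy as np

a = np.arange(10)

mask = (a % 3 == 0)  # boolean array, True at indices 0, 3, 6 and 9
picked = a[mask]     # keeps only the elements where mask is True
print(picked)        # [0 3 6 9]

# Conditions can be combined with & (and) and | (or); the parentheses are required
picked2 = a[(a % 3 == 0) & (a > 0)]
print(picked2)       # [3 6 9]
```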

Postscript

If you want to extract elements from a list rather than a numpy vector, you can use the **filter** function. Consider it if you can't or don't want to use **numpy**.

You can think of lambda x: y in the sample as an anonymous function that takes x as an argument and returns y.

Sample.py


x = [i for i in range(10000000)]
%time y = list(filter(lambda x: x % 3 == 0, x))

result


CPU times: user 1.67 s, sys: 10 ms, total: 1.68 s
Wall time: 1.68 s

It's slower than using **numpy**, but faster than appending in a for statement!
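A list comprehension with an `if` clause is another way to write the same filter without numpy, and it avoids calling a lambda for every element. I have not timed it in the environment above, so treat any speed difference as something to measure yourself; this sketch only checks that both forms give the same list:

```python
x = list(range(30))

# filter + lambda, as in the sample above
by_filter = list(filter(lambda v: v % 3 == 0, x))

# Equivalent list comprehension with a condition
by_comprehension = [v for v in x if v % 3 == 0]

assert by_filter == by_comprehension
print(by_comprehension[:5])  # [0, 3, 6, 9, 12]
```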

Case 5: I want to apply a function to each element of a vector

Next, consider applying a function to each element of a vector. This section introduces the **map** function, which applies a given function to each element of a list and returns the results (as a map object, i.e. an iterator, in Python 3).

Also, the func below is a function that returns $x^2 + 2x + 1$.

Sample.py


a = np.array([i for i in range(10000000)])
def func(x):
    return x**2 + 2*x + 1

def func1(a):
    return np.array([func(i) for i in a])

def func2(a):
    return np.array(list(map(func, a.tolist())))

%time b = func1(a)
%time c = func2(a)
%time d = a**2 + 2*a + 1

result


CPU times: user 5.14 s, sys: 90 ms, total: 5.23 s
Wall time: 5.23 s
CPU times: user 4.95 s, sys: 170 ms, total: 5.12 s
Wall time: 5.11 s
CPU times: user 20 ms, sys: 30 ms, total: 50 ms
Wall time: 51.2 ms

I wanted to show off the map function, but it was hardly faster than the list comprehension :cry:. As you may have noticed partway through, the function in this example is simple, so computing it directly as a vector operation is overwhelmingly faster!
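In fact, because `func` only uses `**`, `*` and `+`, it can be called on the whole numpy array directly and gives the same result as `a**2 + 2*a + 1`. This only works when every operation inside the function is one numpy supports element-wise; for example, an `if` on the value of `x` would not work this way. A small check:

```python
import numpy as np

def func(x):
    return x**2 + 2*x + 1

a = np.arange(5)
vectorized = func(a)  # the whole vector in one call; values 1, 4, 9, 16, 25

# Same values as the per-element comprehension used in func1
assert np.array_equal(vectorized, np.array([func(i) for i in a]))
```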

Case 6: I want to convert each element (numerical value) of a matrix to an arbitrary score (discrete value)

So far, we have dealt with one-dimensional arrays (vectors). In the example below, I would like to deal with a two-dimensional array (matrix).

Here, assume that as preprocessing for machine learning or similar, you want to convert each numerical value into a score. First, define the following matrix.

a = np.array([[i % 100 for i in range(1000)] for j in range(10000)])

Next, prepare a list of thresholds for the conversion. With the list below, a number less than 20 becomes 0, one that is at least 20 but less than 50 becomes 1, at least 50 but less than 70 becomes 2, at least 70 but less than 90 becomes 3, and 90 or more becomes 4. Suppose you want to convert every number in the matrix this way.

scores = [20, 50, 70, 90]

First, let's switch off our brains and implement it in the most straightforward way.

Sample.py


def func1(x):
    y = np.zeros(x.shape)
    for s in scores:
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                if x[i, j] >= s:
                    y[i, j] += 1
    return y

%time b = func1(a)

The result is a lovely triple loop :innocent:. (Deep loops are not only slow but also hard to read, since you have to keep track of every loop variable. Avoid deep nesting for the sake of human readers, too.)

For each threshold, the function adds 1 to the corresponding output element whenever the matrix element is greater than or equal to that threshold.

result


CPU times: user 14 s, sys: 10 ms, total: 14 s
Wall time: 14 s

As expected, the execution time exceeded **10 seconds** :cry:.

Next, here is a better-designed version.

Sample2.py


def func2(x):
    y = np.zeros(x.shape)
    for s in scores:
        y += (x >= s)
    return y

%time c = func2(a)

Here's what we're doing:

As described above, the code is short but packs in several ideas. Because the tight for loops over every element are gone, it is **more than 100 times** faster :smile:.

result


CPU times: user 90 ms, sys: 20 ms, total: 110 ms
Wall time: 111 ms
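To see what func2 does, here is a tiny 2×2 example. `x >= s` gives a boolean matrix, and adding it to a float matrix counts True as 1 and False as 0, so after the loop each element holds the number of thresholds it reaches. As a cross-check, numpy's `np.digitize` computes the same bin index for increasing thresholds; it is a different function from the one used in this article and is shown here only for comparison:

```python
import numpy as np

scores = [20, 50, 70, 90]

def func2(x):
    y = np.zeros(x.shape)
    for s in scores:
        # (x >= s) is a boolean matrix; True is added as 1.0, False as 0.0
        y += (x >= s)
    return y

x = np.array([[10, 25],
              [60, 95]])
result = func2(x)  # values [[0, 1], [2, 4]]

# np.digitize returns, for each value, the index i with scores[i-1] <= value < scores[i]
assert np.array_equal(result, np.digitize(x, scores))
```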

Postscript (2017/08/30)

At this point, you may feel like "I want to wipe out every **for** statement before it is even born :angry:". So I tried writing a version without any.

Sample3.py


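def func3(x):
    len_score = len(scores)
    # Stack len_score copies of x into a 3D array of shape (len_score, rows, cols)
    y = x * np.array([[np.ones(len_score)]]).T
    # Thresholds reshaped to (len_score, 1, 1) so they broadcast over each copy
    s = np.array(scores).reshape(len_score, 1, 1)
    z = (y >= s)
    # Count, for each element, how many thresholds it reaches
    return z.sum(axis=0)

%time d = func3(a)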

result


CPU times: user 200 ms, sys: 30 ms, total: 230 ms
Wall time: 235 ms

...It's slower :cry: (maybe my code is just bad). It is slow, it needs a lot of memory (because everything is expanded up front into a 3D array), and above all it is hard to understand, so I concluded that removing for statements by force is not a good idea.
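For reference, the multiplication by an array of ones in func3 is not needed: broadcasting can compare the 2D matrix against a `(len(scores), 1, 1)` array of thresholds directly. This is a sketch of that variant (the name `func3_broadcast` is mine). It still builds a temporary boolean array of shape `(len(scores), rows, cols)`, so the memory concern above remains, and I have not timed it against func2:

```python
import numpy as np

scores = [20, 50, 70, 90]

def func3_broadcast(x):
    # Thresholds shaped (4, 1, 1) so they broadcast against the (rows, cols) matrix
    s = np.array(scores).reshape(-1, 1, 1)
    # x >= s has shape (4, rows, cols); summing over axis 0 counts the thresholds reached
    return (x >= s).sum(axis=0)

x = np.array([[10, 25],
              [60, 95]])
counts = func3_broadcast(x)  # values [[0, 1], [2, 4]]
```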

Case 7: Checking whether an element is in a list (added 2018/04/20)

A recent article reminded me of this, so I'm making a note of it.

In Python, you can use `in` to check whether an element is in a list.

However, on a list this check takes $O(n)$ time for a list of length $n$, so careless use can lead to disaster.

If you need to check membership repeatedly, it is better to convert the list to a set (or similar) first, as shown below. Membership checks on a set take $O(1)$ time on average.

Confirmed in Jupyter Notebook (Google Colaboratory)


L = 100000
x = list(range(L))

def sample1(list_tmp):
    j = 0
    for i in list_tmp:
        if i in list_tmp:
            j += 1
    print("sample1 j: ", j)


def sample2(list_tmp):
    j = 0
    set_tmp = set(list_tmp)  #Convert to set
    for i in list_tmp:
        if i in set_tmp:     #Check if it is in set
            j += 1
    print("sample2 j: ", j)
    
%time sample1(x)
print("----------------------------------------")
%time sample2(x)

result


sample1 j:  100000
CPU times: user 1min 7s, sys: 16 ms, total: 1min 7s
Wall time: 1min 7s
----------------------------------------
sample2 j:  100000
CPU times: user 8 ms, sys: 6 ms, total: 14 ms
Wall time: 14 ms
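A rough count shows why the gap is so large. In sample1, the list holds $0, 1, \dots, n-1$ in order, so `i in list_tmp` scans $i+1$ elements before it finds $i$. With $n = 100000$, the total number of comparisons is

$$\sum_{i=0}^{n-1} (i+1) = \frac{n(n+1)}{2} = \frac{100000 \times 100001}{2} = 5{,}000{,}050{,}000 \approx 5 \times 10^9,$$

whereas sample2 builds the set once and then does $n$ lookups that each take $O(1)$ time on average, i.e. on the order of $10^5$ operations.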

Extra 1 "I still want to use the for statement"

I said above that you should avoid for statements as much as possible. Even so, there are situations where you have to use one, or where a for statement is easier to understand.

In that case, give up gracefully and use *numba*. *numba* is a JIT (just-in-time) compiler.

"A compiler? So do I have to declare the types of all my variables? Do I have to run a compile command?"

You might think so, but don't worry. Just add one line (two lines if you count the `import`).

Let's see an actual usage example.


import numba

def sample1(n):
    ans = 0
    for i in range(n):
        ans += i
    return ans

@numba.jit
def sample2(n):
    ans = 0
    for i in range(n):
        ans += i
    return ans

@numba.jit('i8(i8)', nopython=True)
def sample3(n):
    ans = 0
    for i in range(n):
        ans += i
    return ans

%time a = sample1(100000000)  #If you do nothing
%time b = sample2(100000000)  #When using jit
%time c = sample3(100000000)  # When using jit (with type specification)

From top to bottom, the functions are "plain Python", "with numba", and "with numba (type specified)". Each one returns the sum of the integers from 0 to $n-1$.
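Since all three functions compute the same sum, you can check that they agree using the closed form. For $n = 10^8$:

$$\sum_{i=0}^{n-1} i = \frac{n(n-1)}{2} = \frac{10^8 \times (10^8 - 1)}{2} = 4{,}999{,}999{,}950{,}000{,}000$$

This fits comfortably in the 64-bit integer (`i8`) return type of sample3, whose maximum is about $9.2 \times 10^{18}$, so `a`, `b` and `c` should all be equal.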

For details on type specification, see "Python acceleration Numba introduction 2" on tkm2261's blog.

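The execution times are as follows. Plain Python takes about 5 seconds, while "numba (type specified)" takes about 5.5 microseconds. That is a difference of several orders of magnitude (in this example, **about 940,000 times faster** :innocent:). Note that when a type signature is given, numba compiles the function as soon as it is defined, so sample3's time excludes compilation; sample2 is compiled on its first call, so its 25.9 ms includes compilation time.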

CPU times: user 5.16 s, sys: 0 ns, total: 5.16 s
Wall time: 5.16 s
CPU times: user 30 ms, sys: 0 ns, total: 30 ms
Wall time: 25.9 ms
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 5.48 µs

In conclusion

I wrote quite a lot, but for the cases above it all comes down to "don't use for statements". In the future, I would like to write up tips for **scipy**, **pandas**, and so on.
