Roughly speed up Python with numba

A library called numba makes it relatively easy to speed up Python code. In the best case, all you need to do is write from numba import jit and put @jit on the line before the function you want to speed up.

The mechanism seems to be that numba takes the function's Python bytecode, compiles it to LLVM IR, and then uses LLVM to generate native code. Compilation runs on the first call, so the first execution is a little slower, but for heavy workloads numba can still be faster even with the compile time included.

Advantages and disadvantages

Let me list them up front.

Advantages

- In some cases you can speed up code without modifying it at all.
- Even when modifications are needed, they are often minor.
- You can use it directly in a .py file, without the hassle of building separate extension modules.

Disadvantages

- Not all Python features are supported, so in some cases just adding @jit is not enough.
- Type handling becomes strict, so code that used to work in a loose, ad-hoc way may now raise errors. You may also need to restructure code so that numba can infer the types.
- Naturally, you need numba installed to run the code, and depending on the environment it can be hard to install. It seems easy in a conda environment, and pip works in quite a few environments as well. The Arch Linux machine I use currently has Python 3.8 and LLVM 9.0, neither of which numba supports at this time, so I gave up on building it and used a conda environment inside Docker.
- Compilation takes time, so applying numba blindly everywhere can have the opposite effect.

Example

Here is an example where numba works very well. Suppose we have a function that, for whatever reason, is very slow.

import sys
# This function recurses very deeply, so raise Python's recursion limit.
sys.setrecursionlimit(100000)

def ack(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

Let's measure how long it takes.

import time
from contextlib import contextmanager

@contextmanager
def timer():
    t = time.perf_counter()
    yield None
    print('Elapsed:', time.perf_counter() - t)

with timer():
    print(ack(3, 10))

8189
Elapsed: 10.270420542001375

It took 10 seconds.

Increasing the arguments makes it take even longer, which I don't recommend because it really does take a long time. In particular, if you change the 3 to a 4, it will probably not finish in your lifetime, so I don't recommend that at all. This function is known as the Ackermann function (https://ja.wikipedia.org/wiki/%E3%82%A2%E3%83%83%E3%82%AB%E3%83%BC%E3%83%9E%E3%83%B3%E9%96%A2%E6%95%B0).

Let's speed this up with numba.

from numba import jit

@jit
def ack(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

# First call (includes compilation, so it is slower)
with timer():
    print(ack(3, 10))

# Second call
with timer():
    print(ack(3, 10))

# Third call
with timer():
    print(ack(3, 10))

8189
Elapsed: 0.7036043469997821
8189
Elapsed: 0.4371343919992796
8189
Elapsed: 0.4372558859977289

What took 10 seconds has shrunk to 0.7 seconds on the first call and 0.4 seconds on the second and later calls. If you get that just by adding one line, it's a great deal.

If it's not as fast as you expected

numba may have fallen back to object mode.

numba has two compilation modes, nopython mode and object mode. It first tries to compile in nopython mode, and if that fails it falls back to object mode. (This fallback behavior is planned to disappear: in the future nopython mode will be the default and object mode will have to be requested explicitly.)

The former handles all values as native types, while the latter manipulates Python objects through the Python C API, so the former is faster. Moreover, nopython mode compiles loops into efficient native code, while object mode may fail to do so (in object mode numba still tries to extract loops and compile them separately, a feature called loop-jitting, but it does not always succeed).

To force nopython mode, write @jit(nopython=True) or use @njit, but this often produces errors along the lines of "cannot determine the type".

Basically:

- Keep only one type in each variable.
- If a function has multiple return statements, make sure they all return values of the same type.
- Make sure that any function called from a nopython-mode function is itself compiled in nopython mode.

These steps make the types clear to numba.
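For example, here is a minimal sketch of the "same type on every return path" rule (the function names are made up for illustration; the broken variant is left commented out so the snippet still runs):

from numba import njit

# NG: one branch would return an integer and the other a string, so numba
# cannot unify the return type and nopython compilation fails.
# @njit
# def describe(x):
#     if x > 0:
#         return x
#     return "non-positive"

# OK: every branch returns the same type (a float), so type inference succeeds.
@njit
def clipped_inverse(x):
    if x <= 0.0:
        return 0.0
    return 1.0 / x

print(clipped_inverse(4.0))  # 0.25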

To adapt code to nopython mode relatively easily, the following approaches are recommended:

- Cut out only the heavy computational part into a separate function and make that function numba-compatible (a rough sketch follows this list).
- If some part juggles types or deals with arbitrary objects, consider moving it out of the numba-compiled function.
- Let plain Python handle the things numba struggles with, as pre-processing or post-processing around the numba-compiled part.

The rule of thumb is: let Python do what Python is good at, and let numba do what Python is slow at.
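As a rough sketch of that split (the function names and the workload here are hypothetical), the heavy numeric loop is isolated in a numba-compiled function, while parsing the input and packaging the result stay in plain Python:

import numpy as np
from numba import njit

@njit
def pairwise_sum_of_squares(values):
    # Heavy numeric loop: only plain floats and arrays, so nopython mode is happy.
    total = 0.0
    n = values.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d = values[i] - values[j]
            total += d * d
    return total

def analyze(raw_values):
    # Pre-processing in plain Python: clean up messy input (None, ints, floats, ...).
    values = np.array([float(v) for v in raw_values if v is not None])
    total = pairwise_sum_of_squares(values)
    # Post-processing in plain Python: build whatever structure the caller wants.
    return {"count": len(values), "pairwise_sum_of_squares": total}

print(analyze([1, 2, None, 3.5]))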

Try to parallelize

With @jit(parallel=True), you can use prange instead of range in for loops (this requires from numba import prange).

Loops written with prange are parallelized.
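For example (a minimal sketch; the function and data here are made up for illustration):

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def row_means(a):
    out = np.empty(a.shape[0])
    # Iterations of a prange loop may be distributed across threads,
    # so they should be independent of each other.
    for i in prange(a.shape[0]):
        out[i] = a[i, :].mean()
    return out

a = np.random.rand(2000, 2000)
print(row_means(a)[:5])  # each value should be close to 0.5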

Compile result cache

With @jit(cache=True), the compilation result is written to a cache file, so you can avoid recompiling every time the program starts.
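For example (a minimal sketch): when this script is run from a file, the first run compiles and stores the machine code on disk (by default in a __pycache__ directory next to the source file), and later runs load it instead of recompiling. Note that caching only works for functions defined in a file, not ones typed into the interpreter.

from numba import njit

@njit(cache=True)
def sum_up_to(n):
    # The compiled code for this function is cached on disk and reused
    # by subsequent runs of the program.
    s = 0
    for i in range(n):
        s += i
    return s

print(sum_up_to(10_000_000))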

Use fastmath

It is enabled with @jit(fastmath=True). This turns on fastmath, the same kind of option found in gcc and clang: a slightly dangerous optimization that speeds up floating point calculations at the cost of strict IEEE semantics.
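For example (a minimal sketch): with fastmath enabled, LLVM is allowed to reorder floating point operations, which can change the result slightly but enables optimizations such as vectorizing this summation.

import numpy as np
from numba import njit

@njit(fastmath=True)
def fast_sum(a):
    # fastmath lets LLVM reassociate these additions, trading strict
    # IEEE 754 semantics for speed.
    s = 0.0
    for i in range(a.shape[0]):
        s += a[i]
    return s

print(fast_sum(np.ones(1_000_000)))  # 1000000.0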

Use CUDA

It is not quite as easy, but you can also use CUDA. If you have used CUDA before, the code below should be self-explanatory.

Personally, I felt that if it takes this much work, cupy would do just as well.

import numpy as np
from numba import cuda

@cuda.jit
def add(a, b, n):
    # Global thread index: each thread handles one element of the arrays.
    idx = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    if idx < n:
        a[idx] += b[idx]

N = 1000000
a_host = np.ones(N)
b_host = np.ones(N)
# Copy the input arrays to GPU memory.
a_dev = cuda.to_device(a_host)
b_dev = cuda.to_device(b_host)

n_thread = 64
n_block = N // n_thread + 1

# Launch the kernel: n_block blocks of n_thread threads each.
add[n_block, n_thread](a_dev, b_dev, N)

# Copy the result from GPU memory back into the host array.
a_dev.copy_to_host(a_host)

print(a_host) # Expect: [2, 2, ..., 2]

If you get an error complaining that libNVVM cannot be found, either CUDA is not installed (you can install it with conda install cudatoolkit, for example) or you need to set some environment variables.

Example settings for Google Colab and similar environments:

import os
os.environ['NUMBAPRO_LIBDEVICE'] = "/usr/local/cuda-10.0/nvvm/libdevice"
os.environ['NUMBAPRO_NVVM'] = "/usr/local/cuda-10.0/nvvm/lib64/libnvvm.so"

Summary

You can use numba to roughly speed up your Python code, and we have looked at the easy ways to use it.

I hope you find it useful as a reference.

References

The official documentation is not very long, and it is very helpful if you read just the parts you need. The Performance Tips page (https://numba.pydata.org/numba-doc/dev/user/performance-tips.html) is especially useful.

For the environment variable settings when using CUDA, I referred to https://colab.research.google.com/github/cbernet/maldives/blob/master/numba/numba_cuda.ipynb.
