In previous articles, I wrote about speeding up numerical calculations with NumPy and SciPy:

- Speeding up numerical calculation using NumPy: Basics
- Speeding up numerical calculation using NumPy / SciPy: Application 1
- Speeding up numerical calculation using NumPy / SciPy: Application 2
Is it really fast? Let's find out.
I'll compare against my own naive pure-Python implementations. This may not be an entirely fair evaluation, since those implementations favor simplicity over speed. Python is anaconda3, and IPython's `%timeit` is used for the time measurements.
Execution environment:
- OS: Ubuntu 16.04 LTS 64bit
- Python: anaconda3-4.1.1
- CPU: Intel Core i5 3550 (4 cores / 4 threads)
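For reference, here is a minimal sketch of how the measurements are taken in an IPython session (the statements below are placeholders, not the actual benchmarks from this article):

```python
# IPython session: %timeit runs the statement repeatedly and reports an average time per loop.
import numpy as np

%timeit sum(range(10000))        # pure Python
%timeit np.arange(10000).sum()   # NumPy
```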
For example, matrix initialization.
before
N = 3000
a = [[i + j for i in range(N)] for j in range(N)]
Suppose you wrote it with a list comprehension like this. The execution time is **661 ms**. On the other hand, with NumPy's `arange`:
after
a = np.arange(0, N) + np.arange(0, N).reshape(N, 1)
**15.5 ms**. Not quite a factor of 100, but about 40 times faster.
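As a quick sanity check (my addition, not part of the timing), the two constructions give the same matrix; the column vector from `reshape(N, 1)` broadcasts against the row vector:

```python
import numpy as np

N = 3000
a_list = [[i + j for i in range(N)] for j in range(N)]    # pure-Python comprehension
a_np = np.arange(0, N) + np.arange(0, N).reshape(N, 1)    # broadcasting: (N,) + (N,1) -> (N,N)
print(np.array_equal(np.array(a_list), a_np))             # True
```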
There are many linear algebra operations, but let's try a simple matrix product. Prepare two 200x200 matrices and calculate the product:
setting
import numpy as np
from numpy.random import rand
N = 200
a = np.array(rand(N, N))
b = np.array(rand(N, N))
c = np.array([[0.0] * N for _ in range(N)])  # float zeros, so the accumulation below stays in floating point
If we implement the matrix product from its definition (accumulating the sum over $k$):
before
for i in range(N):
    for j in range(N):
        for k in range(N):
            c[i][j] += a[i][k] * b[k][j]
Something like this, I suppose. The execution time is **7.7 s**.
With NumPy's built-in matrix product:
after
c = np.dot(a, b)
Leaving speed aside, the simplicity alone is welcome. The execution time is **202 µs**; the speedup is so large that quoting a factor hardly makes sense. Even for 1000 × 1000 matrices it takes only **22.2 ms**. At that size, the for-loop implementation is completely out of control.
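For completeness, a small sketch (my addition) checking that the corrected loop version and `np.dot` agree; on Python 3.5+ the `@` operator is an equivalent spelling:

```python
import numpy as np
from numpy.random import rand

N = 200
a, b = rand(N, N), rand(N, N)

# Naive triple loop, accumulating into a float array
c_loop = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        for k in range(N):
            c_loop[i, j] += a[i, k] * b[k, j]

print(np.allclose(c_loop, np.dot(a, b)))   # True
print(np.allclose(a @ b, np.dot(a, b)))    # True: @ is the matrix-multiplication operator
```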
Let's differentiate $\sin x$. The number of grid points is 100000:
setting
import math as m
def f(x):
    return m.sin(x)

def g(x):
    return np.sin(x)
N, L = 100000, 2 * m.pi
x, dx = np.linspace(0, L, N), L / N
Let's just use the definition of the derivative, and not worry about accuracy for now.
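In other words, the forward difference:

$$
f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}
$$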
before
diff = []
for i in x:
    diff.append((f(i + dx) - f(i)) / dx)
The execution time is **91.9 ms**. Part of the reason is probably that `append` is slow. On the other hand, with NumPy:
after
diff = np.gradient(g(x), dx)
It's simple, and it doesn't reduce the number of elements in the return value (how welcome that is for numerical work!). The execution time is **8.25 ms**, roughly 10 times faster. By the way, writing
g = np.vectorize(f)
also lets you make a scalar function accept and return ndarrays.
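A small sketch of that (my addition): the vectorized `math.sin` matches `np.sin`, and `np.gradient` of $\sin x$ reproduces $\cos x$ to within the discretization error (the loose `atol` accounts for the lower-order differences at the endpoints):

```python
import math as m
import numpy as np

def f(x):
    return m.sin(x)

f_vec = np.vectorize(f)                     # wraps the scalar function to accept/return ndarrays
x = np.linspace(0, 2 * m.pi, 100000)
dx = x[1] - x[0]

print(np.allclose(f_vec(x), np.sin(x)))                                # True
print(np.allclose(np.gradient(np.sin(x), dx), np.cos(x), atol=1e-3))   # True
```

Note that `np.vectorize` is essentially a Python-level loop, so it buys convenience rather than speed.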
Next, let's try a slightly more hardcore integral:
\int_{-\infty}^{\infty} dx\int_{-\infty}^{x} dy\; e^{-(x^2 + y^2)}
It is a double integral in which $x$ appears in the integration limits of $y$. The integrand and the number of grid divisions:
setting
def h(x, y):
    return np.exp(-(x**2 + y**2))
N, L = 2000, 100
x, dx = np.linspace(-L/2, L/2, N), L / N
y, dy = np.linspace(-L/2, L/2, N), L / N
If you implement the double integral as defined,
before
ans = 0
for index, X in enumerate(x):
    for Y in y[:index+1]:
        ans += dx * dy * h(X, Y)
Something like that. The execution time is **5.89 s**, but the result is far too inaccurate. If you write it yourself you really want Simpson's rule, but then the code gets rather complicated. And even with Simpson's rule, an improper integral can only be handled by taking a sufficiently large domain, while refining the grid makes the execution time grow as $O(N^2)$. SciPy's `dblquad`, on the other hand, solves all of these problems:
after
from scipy.integrate import dblquad
ans = dblquad(h, a=-np.inf, b=np.inf, gfun=lambda x: -np.inf, hfun=lambda x: x)[0]
The integration interval for $x$ is $[a, b]$ and that for $y$ is $[\mathrm{gfun}, \mathrm{hfun}]$. The quadrature is adaptive, and error tolerances can also be specified. `dblquad` returns a tuple whose second element is an estimate of the absolute error. The execution time is **51.1 ms**. Nothing more needs to be said. Great.
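As a sanity check (my addition): by symmetry the region $y < x$ covers exactly half of the plane, so the exact value is $\pi/2$, and the second element of the returned tuple is the error estimate:

```python
import numpy as np
from scipy.integrate import dblquad

# NOTE: dblquad expects the integrand as func(y, x); h is symmetric here, so the order does not matter.
def h(x, y):
    return np.exp(-(x**2 + y**2))

ans, abserr = dblquad(h, a=-np.inf, b=np.inf, gfun=lambda x: -np.inf, hfun=lambda x: x)
print(ans, np.pi / 2)   # the two values should agree to within abserr
print(abserr)           # estimated absolute error
```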
(This one is a bonus; numerically solving an eigenvalue equation is a bit of a niche topic...)
Let's solve the Schrödinger equation for a harmonic oscillator. If we implement it ourselves, the Jacobi eigenvalue method is the obvious choice; look it up if you are interested.
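For reference, each step applies a rotation (a Givens rotation) chosen to zero out the largest off-diagonal element $H_{rc}$, using the angle

$$
\theta = \frac{1}{2} \arctan\frac{2 H_{rc}}{H_{cc} - H_{rr}}
$$

(with $\theta = \pi/4$ when $H_{cc} = H_{rr}$), and the diagonal gradually converges to the eigenvalues: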
before
# System settings (same as in the NumPy version below)
L, N = 10, 80
x, dx = np.linspace(-L/2, L/2, N), L / N

I = np.eye(N)
H = [[0 for i in range(N)] for j in range(N)]
for i in range(N):
    H[i][i] = 2.0/(2*dx**2) + 0.5*(-L/2 + dx*i)**2
    if 0 <= i+1 < N:
        H[i][i+1] = -1.0/(2*dx**2)
    if 0 <= i-1 < N:
        H[i][i-1] = -1.0/(2*dx**2)
H = np.array(H)

# Jacobi method
flag = True
while flag:
    # Find the largest off-diagonal element (in absolute value) and its indices
    maxValue = 0
    cI, rI = None, None
    for j in range(N):
        for i in range(j):
            if maxValue < abs(H[i][j]):
                maxValue = abs(H[i][j])
                rI, cI = i, j
    # Convergence test
    if maxValue < 1e-4:
        flag = False
        break
    # Build the rotation matrix
    if H[cI][cI] == H[rI][rI]:
        theta = m.pi/4
    else:
        theta = 0.5*m.atan(2.0*H[rI][cI]/(H[cI][cI] - H[rI][rI]))
    J = np.eye(N)
    J[rI][rI] = m.cos(theta)
    J[cI][cI] = m.cos(theta)
    J[rI][cI] = m.sin(theta)
    J[cI][rI] = -m.sin(theta)
    # Similarity transformation H -> J^T H J; accumulate the eigenvectors in I
    H = J.T.dot(H).dot(J)
    I = I.dot(J)

# Store the eigenvalues (with their indices) and the eigenvectors
v, w = I.transpose(), []
for i in range(N):
    w.append([H[i][i], i])
w.sort()
The iteration converges when the largest off-diagonal element becomes sufficiently small. The eigenvalues do not come out in ascending order, which is a bit inconvenient. The execution time is **15.6 s**; increasing the number of grid points much further is hopeless. With NumPy:
after
#System settings
L, N = 10, 80
x, dx = np.linspace(-L/2, L/2, N), L / N
# Kinetic term K
K = np.eye(N, N)
K_sub = np.vstack((K[1:], np.array([0] * N)))
K = dx**-2 * (2 * K - K_sub - K_sub.T)
#Potential term
V = np.diag(np.linspace(-L/2, L/2, N)**2)
# Eigenvalue problem for a Hermitian matrix
# w: eigenvalues, v: eigenvectors
H = (K + V) / 2
w, v = np.linalg.eigh(H)
The execution time is **1.03 ms**. For an algorithm as involved as an eigenvalue solver, the hand-rolled version doesn't stand a chance in any respect. In C you would have to rely on LAPACK or GSL, and those are not easy to work with either.
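As a quick check (my addition): with $H = \frac{1}{2}(-\partial_x^2 + x^2)$ the exact eigenvalues are $n + 1/2$, so the lowest numerical eigenvalues should come out close to 0.5, 1.5, 2.5, ...:

```python
import numpy as np

# Same setup as above
L, N = 10, 80
x, dx = np.linspace(-L/2, L/2, N), L / N

K = np.eye(N, N)
K_sub = np.vstack((K[1:], np.array([0] * N)))
K = dx**-2 * (2 * K - K_sub - K_sub.T)   # tridiagonal finite-difference Laplacian
V = np.diag(x**2)

w, v = np.linalg.eigh((K + V) / 2)
print(w[:4])   # should be close to [0.5, 1.5, 2.5, 3.5]
```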
This should be enough. **You can see how fast NumPy / SciPy is and how simply the code can be written.** If you tried to do the calculations above in C / C++ without relying on external libraries, the code would look like the "before" versions at the start of each section. And, in my experience, many external C / C++ libraries are hard to approach.
Just for reference, the speed differences are summarized below:
- Initialize a list: **661 ms** → **15.5 ms**
- Matrix product: **7.7 s** → **202 µs**
- Differentiation: **91.9 ms** → **8.25 ms**
- (Double) integration: **5.89 s** → **51.1 ms**
- Eigenvalue equation: **15.6 s** → **1.03 ms**
**Overwhelming.** I think Python is well worth using for numerical computation.