Moving averages are the most basic of the technical indicators, and the simple moving average (SMA) is nothing more than an arithmetic mean over a window. Even so, it is used as a building block for many other technical indicators. In fact, of the 30 or so technical indicators posted on GitHub, 40% use SMA.
This time, I would like to focus on SMA and compare several Python implementations of it.
Since we have updated the complete set of Python packages, the versions of Python and the packages used are as follows.
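To check the same information in your own environment, a minimal sketch such as the following could be used. The helper name package_versions and the package list are assumptions based on the imports used in this article; it relies only on the standard library.

```python
# Sketch: report the installed versions of the packages this article imports.
# The package list is an assumption based on the imports used in the article.
import sys
from importlib import metadata


def package_versions(names=("numpy", "pandas", "scipy", "numba")):
    """Return {package name: version string, or 'not installed'}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


print("Python", sys.version.split()[0])
for name, version in package_versions().items():
    print(name, version)
```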
First of all, with reference to "Random walk in Python", I make a random walk of 100,000 samples. This is the input data for SMA.
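import numpy as np
import pandas as pd
from numba import jit

# Random +1/-1 steps, 100,000 samples
dn = np.random.randint(2, size=100000)*2-1
# Geometric random walk: each step multiplies the value by exp(+0.01) or exp(-0.01)
gwalk = np.cumprod(np.exp(dn*0.01))*100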
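The walk above is different on every run. If you want to repeat a measurement on exactly the same input, one option is to seed the generator. The following is only a sketch: the function name make_gwalk, its parameters, and the seed value are assumptions for illustration, not part of the original code.

```python
import numpy as np


def make_gwalk(n=100_000, step=0.01, start=100.0, seed=0):
    """Geometric random walk: each step multiplies the value by exp(+/-step)."""
    rng = np.random.default_rng(seed)          # seeded generator (assumption)
    dn = rng.integers(0, 2, size=n) * 2 - 1    # random +1 / -1 steps
    return np.cumprod(np.exp(dn * step)) * start


walk = make_gwalk()   # the same seed always yields the same series
```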
The simplest implementation of SMA is with pandas. It can be easily written using the Series methods rolling and mean.
def SMA1(x, period):
    return pd.Series(x).rolling(period).mean()
As a common interface, every implementation takes the input time series and the SMA period as arguments. To see how the period affects the speed, I measure with period = 20 and period = 200.
%timeit y1_20 = SMA1(gwalk, 20)
%timeit y1_200 = SMA1(gwalk, 200)
100 loops, best of 3: 6.02 ms per loop
100 loops, best of 3: 6.01 ms per loop
In the case of pandas, there seems to be no difference in execution speed depending on the period.
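To make the output of rolling and mean concrete, here is a small worked example on five hypothetical values. With the default settings, the first period-1 entries are NaN because a full window is not yet available.

```python
import numpy as np
import pandas as pd

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical input
sma = pd.Series(x).rolling(3).mean()

# The first two entries are NaN (no full 3-sample window yet);
# the rest are the means of (1,2,3), (2,3,4) and (3,4,5).
print(sma.tolist())
```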
Next, with reference to "Comparison of moving average calculation time written in Python", let's implement it using scipy's filter function lfilter.
from scipy.signal import lfilter
def SMA2(x, period):
    return lfilter(np.ones(period), 1, x)/period
Let's measure the execution time in the same way.
%timeit y2_20 = SMA2(gwalk, 20)
%timeit y2_200 = SMA2(gwalk, 200)
100 loops, best of 3: 5.53 ms per loop
100 loops, best of 3: 10.4 ms per loop
Since lfilter is a general-purpose filter function, not dedicated to SMA, the execution time seems to change depending on the period. Shorter periods are faster than pandas, but longer periods are slower.
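Why the time grows with the period becomes clearer if the filter is written as a plain convolution: every output is a weighted sum over period taps. The sketch below uses np.convolve in a hypothetical helper sma_fir; with zero initial state it computes the same values as filtering with period equal taps, but it is an illustration, not what scipy does internally.

```python
import numpy as np


def sma_fir(x, period):
    """SMA as a length-`period` FIR filter with zero initial state."""
    taps = np.ones(period) / period
    # The full convolution has len(x)+period-1 outputs; keep the first len(x).
    return np.convolve(x, taps)[:len(x)]


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical input
y = sma_fir(x, 3)
# y[0] and y[1] are partial sums divided by 3 (1/3 and 3/3), not NaN;
# from y[2] on, the values are ordinary 3-sample means.
print(y)
```

Note that this style of filter fills the warm-up region with partial sums divided by the period, whereas the pandas version returns NaN there.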
Next, let's write the SMA formula directly with for loops. Left as plain Python this would obviously be slow, so, as the title says, I use numba to speed it up.
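@jit
def SMA3(x, period):
    y = np.zeros(len(x))
    # The first period-1 outputs have no full window; mark them NaN
    # instead of letting x[i-j] wrap around to the end of the array.
    y[:period-1] = np.nan
    for i in range(period-1, len(y)):
        for j in range(period):
            y[i] += x[i-j]
    return y/period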
%timeit y3_20 = SMA3(gwalk, 20)
%timeit y3_200 = SMA3(gwalk, 200)
100 loops, best of 3: 3.07 ms per loop
10 loops, best of 3: 32.3 ms per loop
Even though this uses for loops, numba makes it the fastest so far when the period is 20. However, the cost is proportional to the period, so with a period of 200 it is about 10 times slower, which makes it the slowest.
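One detail to be careful about when writing the double loop by hand: for i smaller than period-1 there are not yet period samples behind x[i], and a negative index in NumPy silently reads from the end of the array. The following small sketch, with hypothetical values, shows the effect.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # hypothetical input
period = 3

# With i = 0 the window would need x[0], x[-1], x[-2]; NumPy reads the
# negative indices from the end of the array instead of raising an error.
wrapped = (x[0] + x[-1] + x[-2]) / period      # mixes in 50.0 and 40.0

# The first index with a full window behind it is i = period - 1.
first_valid = x[:period].sum() / period        # mean of 10, 20, 30

print(wrapped, first_valid)
```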
The final implementation takes advantage of a property of SMA. Because SMA is just a sum of samples, each output can be computed from the result one sample earlier by adding the new sample value and subtracting the old sample value that leaves the window.
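@jit
def SMA4(x, period):
    y = np.empty(len(x))
    y[:period-1] = np.nan               # no full window yet
    y[period-1] = np.sum(x[:period])    # sum of the first full window
    for i in range(period, len(x)):
        # previous sum + newest sample - sample leaving the window
        y[i] = y[i-1]+x[i]-x[i-period]
    return y/period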
The samples are summed in full only for the first window. After that, each output needs just three values: the previous sum, the new sample, and the sample leaving the window. The execution speed is as follows.
%timeit y4_20 = SMA4(gwalk, 20)
%timeit y4_200 = SMA4(gwalk, 200)
1 loop, best of 3: 727 µs per loop
1000 loops, best of 3: 780 µs per loop
This is the fastest of all the implementations so far, and the result is almost the same even when the period is extended.
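When replacing a direct sum with a running sum, it is worth confirming that the two agree. The sketch below is one way to do that, under a few assumptions: sma_running and sma_reference are hypothetical names, the fallback that turns njit into a no-op when numba is not installed is my own addition, and the reference uses differences of a cumulative sum rather than any code from this article.

```python
import numpy as np

try:
    from numba import njit            # use numba when it is installed
except ImportError:                   # assumption: fall back to plain Python
    def njit(func):
        return func


@njit
def sma_running(x, period):
    """Running-sum SMA: work per output does not depend on period."""
    y = np.empty(len(x))
    y[:period - 1] = np.nan
    acc = 0.0
    for i in range(period):           # sum of the first full window
        acc += x[i]
    y[period - 1] = acc / period
    for i in range(period, len(x)):
        acc += x[i] - x[i - period]   # add the new sample, drop the old one
        y[i] = acc / period
    return y


def sma_reference(x, period):
    """Independent reference: window sums from differences of a cumulative sum."""
    c = np.concatenate(([0.0], np.cumsum(x)))
    y = np.full(len(x), np.nan)
    y[period - 1:] = (c[period:] - c[:-period]) / period
    return y


rng = np.random.default_rng(0)                   # hypothetical test data
x = rng.normal(size=1000).cumsum() + 100.0
for period in (20, 200):
    diff = np.nanmax(np.abs(sma_running(x, period) - sma_reference(x, period)))
    print(period, diff)   # differences should be at floating-point rounding level
```

Both versions accumulate rounding error in their own way, so the comparison should use a tolerance rather than exact equality.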
As shown above, with the speedup from numba, SMA runs quite fast even when it is written with for loops.