Notes on speeding up Python code with Numba
I found that Numba is effective at speeding up technical indicator functions built around for loops, but other indicators also make heavy use of if statements.
One of them is Parabolic SAR. It is not an unusual indicator; in fact it is quite popular. However, because it switches between an ascending mode and a descending mode and its step width changes, it cannot be written as a simple for loop alone. It was the last one left when I ported MetaTrader's technical indicators to Python.
This post is a memo on speeding it up.
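To make the mode switch and the changing step concrete, here is one bar of the ascending (long) mode pulled out into a standalone function. It is a sketch that mirrors the long branch of the loop in the code that follows. The name `sar_long_step` and its argument layout are my own, not taken from MetaTrader.

```python
def sar_long_step(prev_sar, acc, ep_prev, high_i, low_i, step=0.02, maximum=0.2):
    """One bar of the ascending mode (hypothetical helper, for illustration).

    ep_prev is the highest High since the last reversal, excluding this bar.
    Returns (sar, acc, reversed_to_short).
    """
    # Move the SAR a fraction `acc` of the way toward the extreme point.
    sar = prev_sar + acc * (ep_prev - prev_sar)
    # Extreme point including the current bar.
    ep_new = max(ep_prev, high_i)
    # A new high raises the acceleration factor, capped at `maximum`.
    if ep_new > ep_prev and acc + step <= maximum:
        acc += step
    # Price fell through the SAR: switch to descending mode, restart at the extreme point.
    if sar > low_i:
        return ep_new, step, True
    return sar, acc, False
```

The descending mode is the mirror image (Low instead of High, `min` instead of `max`, reversal when the SAR rises above High), which is why the full indicator needs both a for loop and several if statements.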
import numpy as np
import pandas as pd
dataM1 = pd.read_csv('DAT_ASCII_EURUSD_M1_2015.csv', sep=';',
                     names=('Time','Open','High','Low','Close', ''),
                     index_col='Time', parse_dates=True)
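If you don't have the `DAT_ASCII_EURUSD_M1_2015.csv` file, a synthetic frame with the same column layout is enough to run the functions in this post. `make_synthetic_m1` is a hypothetical helper. It generates a random walk rather than real EURUSD prices, so timings will not match the ones quoted here.

```python
import numpy as np
import pandas as pd


def make_synthetic_m1(n=1000, seed=0):
    """Hypothetical stand-in for the M1 CSV: random-walk OHLC bars at 1-minute spacing."""
    rng = np.random.default_rng(seed)
    close = 1.10 + np.cumsum(rng.normal(0.0, 0.0002, n))
    open_ = np.empty(n)
    open_[0] = close[0]
    open_[1:] = close[:-1]                      # each bar opens at the previous close
    wick = np.abs(rng.normal(0.0, 0.0001, n))   # extra range beyond open/close
    high = np.maximum(open_, close) + wick
    low = np.minimum(open_, close) - wick
    index = pd.date_range('2015-01-01', periods=n, freq='min', name='Time')
    return pd.DataFrame({'Open': open_, 'High': high, 'Low': low, 'Close': close},
                        index=index)
```

For example, `dataM1 = make_synthetic_m1(100_000)` gives a frame that the functions below accept in place of the CSV data.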
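def iSAR(df, step, maximum):
    last_period = 0      # bars since the last reversal
    dir_long = True      # True: ascending mode, False: descending mode
    ACC = step           # acceleration factor
    SAR = df['Close'].copy()
    for i in range(1,len(df)):
        last_period += 1
        if dir_long == True:
            # extreme point: highest High since the last reversal
            Ep1 = df['High'][i-last_period:i].max()
            SAR[i] = SAR[i-1]+ACC*(Ep1-SAR[i-1])
            Ep0 = max([Ep1, df['High'][i]])
            # new high: accelerate, capped at maximum
            if Ep0 > Ep1 and ACC+step <= maximum: ACC+=step
            # price crossed the SAR: switch to descending mode
            if SAR[i] > df['Low'][i]:
                dir_long = False
                SAR[i] = Ep0
                last_period = 0
                ACC = step
        else:
            # extreme point: lowest Low since the last reversal
            Ep1 = df['Low'][i-last_period:i].min()
            SAR[i] = SAR[i-1]+ACC*(Ep1-SAR[i-1])
            Ep0 = min([Ep1, df['Low'][i]])
            if Ep0 < Ep1 and ACC+step <= maximum: ACC+=step
            # price crossed the SAR: switch to ascending mode
            if SAR[i] < df['High'][i]:
                dir_long = True
                SAR[i] = Ep0
                last_period = 0
                ACC = step
    return SAR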
%timeit y = iSAR(dataM1, 0.02, 0.2)
There is only a single for loop, but it takes quite a while:
1 loop, best of 3: 1min 19s per loop
First, let's speed it up with Numba. The only changes are to pull the pandas columns out as NumPy arrays and add @jit.
from numba import jit

@jit
def iSARjit(df, step, maximum):
    last_period = 0
    dir_long = True
    ACC = step
    SAR = df['Close'].values.copy()
    High = df['High'].values
    Low = df['Low'].values
    for i in range(1,len(SAR)):
        last_period += 1
        if dir_long == True:
            Ep1 = High[i-last_period:i].max()
            SAR[i] = SAR[i-1]+ACC*(Ep1-SAR[i-1])
            Ep0 = max([Ep1, High[i]])
            if Ep0 > Ep1 and ACC+step <= maximum: ACC+=step
            if SAR[i] > Low[i]:
                dir_long = False
                SAR[i] = Ep0
                last_period = 0
                ACC = step
        else:
            Ep1 = Low[i-last_period:i].min()
            SAR[i] = SAR[i-1]+ACC*(Ep1-SAR[i-1])
            Ep0 = min([Ep1, Low[i]])
            if Ep0 < Ep1 and ACC+step <= maximum: ACC+=step
            if SAR[i] < High[i]:
                dir_long = True
                SAR[i] = Ep0
                last_period = 0
                ACC = step
    return SAR
%timeit y = iSARjit(dataM1, 0.02, 0.2)
1 loop, best of 3: 1.43 s per loop
That's about 55 times faster. Given how few code changes were needed, it's a decent result.
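One caveat when reproducing this: `iSARjit` still receives the DataFrame, which Numba cannot type in nopython mode. Older releases fell back to object mode with a warning; as far as I know, from Numba 0.59 `@jit` defaults to nopython mode, so the same code raises a typing error instead. A sketch that keeps the DataFrame handling outside the compiled function is shown here. `sar_kernel` and `iSAR_nopython` are hypothetical names, the two-argument `max`/`min` replace the temporary lists, and I have not timed this version against the numbers above.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # No Numba installed: run as plain Python so the sketch still works (slowly).
    def njit(func):
        return func


@njit
def sar_kernel(high, low, close, step, maximum):
    """Same logic as iSARjit, but on float64 arrays only, so it can compile in nopython mode."""
    sar = close.copy()
    last_period = 0
    dir_long = True
    acc = step
    for i in range(1, len(sar)):
        last_period += 1
        if dir_long:
            ep1 = high[i - last_period:i].max()
            sar[i] = sar[i - 1] + acc * (ep1 - sar[i - 1])
            ep0 = max(ep1, high[i])          # two-argument max: no temporary list
            if ep0 > ep1 and acc + step <= maximum:
                acc += step
            if sar[i] > low[i]:
                dir_long = False
                sar[i] = ep0
                last_period = 0
                acc = step
        else:
            ep1 = low[i - last_period:i].min()
            sar[i] = sar[i - 1] + acc * (ep1 - sar[i - 1])
            ep0 = min(ep1, low[i])
            if ep0 < ep1 and acc + step <= maximum:
                acc += step
            if sar[i] < high[i]:
                dir_long = True
                sar[i] = ep0
                last_period = 0
                acc = step
    return sar


def iSAR_nopython(df, step, maximum):
    """Hypothetical wrapper: extract float64 arrays from the DataFrame, then call the kernel."""
    return sar_kernel(df['High'].to_numpy(dtype=np.float64),
                      df['Low'].to_numpy(dtype=np.float64),
                      df['Close'].to_numpy(dtype=np.float64),
                      step, maximum)
```

Usage would be `y = iSAR_nopython(dataM1, 0.02, 0.2)`. The first call includes compilation time, so it is worth timing a second call separately.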
Next, let's try Cython. I had assumed Cython would be a hassle to set up, but with Jupyter Notebook it was fairly easy to install. However, because it builds through an external compiler, you also need Visual C++. The compiler version has to match the Python in my Anaconda installation, so I installed the matching version.
First, I simply compiled the same code with Cython, without changing it.
%load_ext Cython
%%cython
cimport numpy
cimport cython
def iSAR_c0(df, step, maximum):
    last_period = 0
    dir_long = True
    ACC = step
    SAR = df['Close'].values.copy()
    High = df['High'].values
    Low = df['Low'].values
    for i in range(1,len(SAR)):
        last_period += 1
        if dir_long == True:
            Ep1 = High[i-last_period:i].max()
            SAR[i] = SAR[i-1]+ACC*(Ep1-SAR[i-1])
            Ep0 = max([Ep1, High[i]])
            if Ep0 > Ep1 and ACC+step <= maximum: ACC+=step
            if SAR[i] > Low[i]:
                dir_long = False
                SAR[i] = Ep0
                last_period = 0
                ACC = step
        else:
            Ep1 = Low[i-last_period:i].min()
            SAR[i] = SAR[i-1]+ACC*(Ep1-SAR[i-1])
            Ep0 = min([Ep1, Low[i]])
            if Ep0 < Ep1 and ACC+step <= maximum: ACC+=step
            if SAR[i] < High[i]:
                dir_long = True
                SAR[i] = Ep0
                last_period = 0
                ACC = step
    return SAR
%timeit y = iSAR_c0(dataM1, 0.02, 0.2)
Result:
1 loop, best of 3: 1.07 s per loop
Even with unchanged code, Cython is a little faster than Numba.
Next, add variable type declarations with cdef:
%%cython
cimport numpy
cimport cython
def iSARnew(df, double step, double maximum):
    cdef int last_period = 0
    dir_long = True
    cdef double ACC = step
    cdef numpy.ndarray[numpy.float64_t, ndim=1] SAR = df['Close'].values.copy()
    cdef numpy.ndarray[numpy.float64_t, ndim=1] High = df['High'].values
    cdef numpy.ndarray[numpy.float64_t, ndim=1] Low = df['Low'].values
    cdef double Ep0, Ep1
    cdef int i, N=len(SAR)
    for i in range(1,N):
        last_period += 1
        if dir_long == True:
            Ep1 = max(High[i-last_period:i])
            SAR[i] = SAR[i-1]+ACC*(Ep1-SAR[i-1])
            Ep0 = max([Ep1, High[i]])
            if Ep0 > Ep1 and ACC+step <= maximum: ACC+=step
            if SAR[i] > Low[i]:
                dir_long = False
                SAR[i] = Ep0
                last_period = 0
                ACC = step
        else:
            Ep1 = min(Low[i-last_period:i])
            SAR[i] = SAR[i-1]+ACC*(Ep1-SAR[i-1])
            Ep0 = min([Ep1, Low[i]])
            if Ep0 < Ep1 and ACC+step <= maximum: ACC+=step
            if SAR[i] < High[i]:
                dir_long = True
                SAR[i] = Ep0
                last_period = 0
                ACC = step
    return SAR
%timeit y = iSARnew(dataM1, 0.02, 0.2)
The result:
1 loop, best of 3: 533 ms per loop
That's about twice as fast as the untyped Cython version. More tuning might make it faster still, but it can also make the code less readable, so I'll stop here.
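One tuning direction that needs no type declarations is algorithmic. Every version above rescans `High[i-last_period:i]` (or `Low[...]`) on each bar, so a bar costs time proportional to the length of the current trend. Keeping the running extreme since the last reversal makes each bar constant-time. The following is my own sketch, not measured; `sar_running_extreme` is a hypothetical name, and the loop body could equally be decorated with Numba's `@njit` or typed with `cdef` in Cython.

```python
import numpy as np


def sar_running_extreme(high, low, close, step, maximum):
    """Parabolic SAR with the extreme point tracked incrementally (sketch).

    Same update rules as iSAR/iSARjit/iSARnew, but `ep` holds the extreme
    since the last reversal instead of re-slicing the arrays every bar.
    """
    sar = np.asarray(close, dtype=np.float64).copy()
    dir_long = True
    acc = step
    ep = high[0]  # extreme since the last reversal, excluding the current bar
    for i in range(1, len(sar)):
        ep1 = ep
        sar[i] = sar[i - 1] + acc * (ep1 - sar[i - 1])
        if dir_long:
            ep0 = max(ep1, high[i])
            if ep0 > ep1 and acc + step <= maximum:
                acc += step
            ep = ep0
            if sar[i] > low[i]:          # switch to descending mode
                dir_long = False
                sar[i] = ep0
                acc = step
                ep = low[i]              # the new run starts at this bar
        else:
            ep0 = min(ep1, low[i])
            if ep0 < ep1 and acc + step <= maximum:
                acc += step
            ep = ep0
            if sar[i] < high[i]:         # switch to ascending mode
                dir_long = True
                sar[i] = ep0
                acc = step
                ep = high[i]
    return sar
```

The reset `ep = low[i]` (or `high[i]`) reproduces what the original code gets from the one-element slice on the bar after a reversal, so the outputs should be identical, not just close.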
When the function is just a for loop, Numba alone speeds it up considerably, but once if statements are involved as well, the effect is smaller. If you want a bit more speed, Cython with some code modifications may be worth using.