This is a continuation of, or a supplement to, "Notes on speeding up Python code with Numba".
Pandas is convenient for processing time series data, but if you loop over a Series element by element with a plain for statement, it becomes extremely slow.
import numpy as np
import pandas as pd
dataM1 = pd.read_csv('DAT_ASCII_EURUSD_M1_2015.csv', sep=';',
                     names=('Time','Open','High','Low','Close', ''),
                     index_col='Time', parse_dates=True)
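def LWMA(s, ma_period):
    # Linearly weighted moving average: the newest sample gets weight ma_period, the oldest gets 1
    y = pd.Series(0.0, index=s.index)
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += s[i-j]*(ma_period-j)
        y[i] /= ma_period*(ma_period+1)/2  # divide by the sum of the weights
    return y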
%time MA = LWMA(dataM1['Close'], 10)
Wall time: 3min 10s
The data set is fairly large at about 370,000 rows, but 3 minutes for a moving average over 10 samples is hard to accept. For longer periods, it can take 10 minutes or more. (Core i7-6700 3.4GHz)
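To get a feel for where the time goes, the following minimal sketch times element-by-element reads on a pandas Series against the same reads on the underlying NumPy array. The synthetic data, the size `n`, and the helper name `sum_elements` are assumptions made for illustration; absolute timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # assumed size, much smaller than the 370,000-row file
s = pd.Series(np.random.rand(n))   # synthetic stand-in for dataM1['Close']
a = s.to_numpy()                   # the same values as a plain ndarray

def sum_elements(x):
    """Read every element one at a time, as the inner loop of LWMA does."""
    total = 0.0
    for i in range(len(x)):
        total += x[i]
    return total

# Time the same loop on the Series and on the array.
t_series = timeit.timeit(lambda: sum_elements(s), number=3)
t_array = timeit.timeit(lambda: sum_elements(a), number=3)
print(f"Series: {t_series:.4f} s, ndarray: {t_array:.4f} s")
```

In the original LWMA the loop also writes to `y[i]` on a Series, which adds further per-element overhead on top of the reads measured here.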
In the previous article, I converted the data to an array and sped it up with Numba, but in fact simply using an array already makes it much faster.
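def LWMA1(s, ma_period):
    a = s.values  # work on the underlying NumPy array instead of the Series
    y = np.zeros(len(a))
    for i in range(len(y)):
        for j in range(ma_period):
            # note: for i < ma_period-1, i-j is negative and wraps around to the end of the array
            y[i] += a[i-j]*(ma_period-j)
        y[i] /= ma_period*(ma_period+1)/2
    return pd.Series(y, index=s.index)  # reattach the original index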
%timeit MA = LWMA1(dataM1['Close'], 10)
1 loop, best of 3: 1.92 s per loop
Even without using Numba, this is about 100 times faster than looping over the pandas Series (190 s versus 1.92 s).
Of course, applying Numba to this array version makes it even faster.
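from numba import jit
# Note: Numba 0.59 and later compile @jit in nopython mode by default, which does not accept a pandas Series;
# the timing below presumably comes from an older release that fell back to object mode.
@jit
def LWMA2(s, ma_period):
    a = s.values
    y = np.zeros(len(a))
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += a[i-j]*(ma_period-j)
        y[i] /= ma_period*(ma_period+1)/2
    return pd.Series(y, index=s.index)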
%timeit MA = LWMA2(dataM1['Close'], 10)
100 loops, best of 3: 5.31 ms per loop
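Recent Numba releases compile `@jit` functions in nopython mode by default, and nopython mode cannot type a pandas Series, so one way to keep this pattern working is to compile only an array kernel and do the pandas wrapping outside it. The sketch below assumes that split; the names `_lwma_kernel` and `LWMA3` are hypothetical, and filling the first `ma_period-1` values with NaN (rather than letting negative indices wrap around) is a choice of this sketch, not the original behavior. The `try`/`except` fallback is only there so the example still runs, slowly, where Numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the kernel as plain Python.
    def njit(func):
        return func

@njit
def _lwma_kernel(a, ma_period):
    """LWMA over a 1-D float array; hypothetical helper, not from the article."""
    y = np.full(len(a), np.nan)          # warm-up region stays NaN
    denom = ma_period * (ma_period + 1) / 2
    for i in range(ma_period - 1, len(a)):
        acc = 0.0
        for j in range(ma_period):
            acc += a[i - j] * (ma_period - j)
        y[i] = acc / denom
    return y

def LWMA3(s, ma_period):
    """Unwrap the Series, run the kernel, restore the index."""
    a = s.to_numpy(dtype=np.float64)
    return pd.Series(_lwma_kernel(a, ma_period), index=s.index)

prices = pd.Series([1.0, 2.0, 3.0, 4.0])
print(LWMA3(prices, 3))
```

Keeping the compiled function free of pandas objects means the index handling stays in ordinary Python, where it costs almost nothing compared with the loop.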
Even when you need a for statement on pandas data, it is wise to run the loop over an array instead whenever the index is irrelevant to the computation. It seems that a little ingenuity is needed to get along well with pandas.
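As one illustration of replacing the loop with array operations, the weighted sum can also be written as a convolution, which removes the Python-level loop altogether. This is a different technique from the loop versions above, sketched under assumptions: the name `LWMA_conv` is hypothetical, and the first `ma_period-1` values are set to NaN instead of reproducing the wrap-around of negative indices.

```python
import numpy as np
import pandas as pd

def LWMA_conv(s, ma_period):
    """LWMA via np.convolve; hypothetical name, not from the article."""
    a = s.to_numpy(dtype=np.float64)
    # Weights ma_period, ma_period-1, ..., 1: index j is the weight of a[i-j].
    w = np.arange(ma_period, 0, -1, dtype=np.float64)
    w /= w.sum()                           # same as dividing by n*(n+1)/2
    y = np.convolve(a, w)[:len(a)]         # y[i] = sum_j w[j] * a[i-j]
    y[:ma_period - 1] = np.nan             # not enough history yet
    return pd.Series(y, index=s.index)

prices = pd.Series([1.0, 2.0, 3.0, 4.0])
print(LWMA_conv(prices, 3))
```

Because the index is only reattached at the end, the same wrapping pattern applies here as in the loop versions: unwrap to an array, compute, then rebuild the Series.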