In technical analysis of time series data, it is common to compute the mean, maximum, and minimum over a moving window. With pandas this is easy to write: specify the moving window with rolling and call the mean, max, or min method. This article is a note from when I was looking for something faster than pandas.
First, create a time series of random numbers as a numpy array and as a pandas Series, as shown below.
import numpy as np
import pandas as pd
a = np.random.randint(100, size=100000)
s = pd.Series(a)
The mean over the moving window (the so-called simple moving average) can be written as follows, using the mean method of rolling.
period = 10  # window length in samples
%timeit smean = s.rolling(period).mean()
The execution time was:
100 loops, best of 3: 5.47 ms per loop
Next are the maximum and minimum values in the moving window.
%timeit smax = s.rolling(period).max()
%timeit smin = s.rolling(period).min()
100 loops, best of 3: 5.51 ms per loop
100 loops, best of 3: 5.53 ms per loop
The execution time is almost the same as the moving average.
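As a small illustration of what these rolling calls return, the sketch below applies them to a short hand-made series. The values and the window length of 3 are arbitrary choices for this example, not part of the measurements above. Note that pandas leaves the first period-1 entries as NaN because the window is not yet full.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen only so the results are easy to check by hand.
s_demo = pd.Series([3, 1, 4, 1, 5, 9, 2, 6])
period_demo = 3

# Each result has the same length as the input; the first period_demo - 1
# entries are NaN because the trailing window is not yet full.
demo_mean = s_demo.rolling(period_demo).mean()
demo_max = s_demo.rolling(period_demo).max()
demo_min = s_demo.rolling(period_demo).min()

print(demo_max.tolist())  # [nan, nan, 4.0, 4.0, 5.0, 9.0, 9.0, 9.0]
print(demo_min.tolist())  # [nan, nan, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
```

The window is trailing: the value at each position summarizes that sample and the period-1 samples before it.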
Since the moving average is a so-called FIR filter, you can use scipy's lfilter function.
from scipy.signal import lfilter
%timeit amean = lfilter(np.ones(period)/period, 1, a)
This computes the moving average as an FIR filter with all weights set to 1 / period. The execution time was:
1000 loops, best of 3: 980 µs per loop
That is more than 5 times faster than pandas. As expected of scipy.
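One difference from pandas is worth checking before swapping the two: lfilter starts from a zero initial state, so its first period-1 outputs are partial sums divided by period rather than NaN. The following is a minimal sketch of that comparison on a smaller array than the benchmark; the array size, the seed, and the variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from scipy.signal import lfilter

rng = np.random.default_rng(0)          # fixed seed, only for reproducibility
a_chk = rng.integers(100, size=1000)    # smaller than the benchmark array
period_chk = 10

pd_mean = pd.Series(a_chk).rolling(period_chk).mean().to_numpy()
fir_mean = lfilter(np.ones(period_chk) / period_chk, 1, a_chk)

# From the first full window onward the two results agree up to rounding.
same_tail = np.allclose(pd_mean[period_chk - 1:], fir_mean[period_chk - 1:])

# Before that, pandas reports NaN while lfilter averages against implicit zeros.
head_is_nan = bool(np.isnan(pd_mean[:period_chk - 1]).all())
print(same_tail, head_is_nan)
```

If the leading values matter, they can be overwritten with NaN after filtering so that both results line up.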
Next, I want to find the maximum and minimum values. There is no function in scipy.signal that fits this perfectly; the one that seems usable is order_filter (scipy.signal.order_filter). For each sample, this function returns the value of the specified rank within the specified window. Specify the window mask array in the argument domain and the rank in the argument rank. However, since the window is centered on each time-series sample, put 1 only in the first half of the mask so that the window covers only the current sample and the samples before it. For the minimum value, rank = 0, and for the maximum value, rank = period-1.
from scipy.signal import order_filter
domain = np.concatenate((np.ones(period), np.zeros(period-1)))
%timeit amax = order_filter(a, domain, period-1)
%timeit amin = order_filter(a, domain, 0)
The execution result is as follows.
10 loops, best of 3: 102 ms per loop
10 loops, best of 3: 102 ms per loop
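The effect of the one-sided mask can be checked on a small array. The sketch below compares order_filter with pandas on hand-made data (the values, the window length of 3, and the variable names are assumptions for this example). The comparison covers only positions where the window is full, because order_filter pads the edge with zeros while pandas returns NaN there.

```python
import numpy as np
import pandas as pd
from scipy.signal import order_filter

a_demo = np.array([3, 1, 4, 1, 5, 9, 2, 6])   # hypothetical toy data
period_demo = 3

# Mask of odd length 2*period-1: ones over the first `period` entries, so the
# window spans the current sample and the period-1 samples before it.
domain_demo = np.concatenate((np.ones(period_demo), np.zeros(period_demo - 1)))

of_max = order_filter(a_demo, domain_demo, period_demo - 1)  # highest rank
of_min = order_filter(a_demo, domain_demo, 0)                # lowest rank

pd_max = pd.Series(a_demo).rolling(period_demo).max().to_numpy()
pd_min = pd.Series(a_demo).rolling(period_demo).min().to_numpy()

# Compare only from the first full window onward.
k = period_demo - 1
max_matches = np.array_equal(of_max[k:], pd_max[k:])
min_matches = np.array_equal(of_min[k:], pd_min[k:])
print(max_matches, min_matches)
```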
At about 102 ms per call, order_filter is almost 20 times slower than pandas, so not every scipy function is a win. This is probably because the window is sorted at every step so that an arbitrary rank can be returned. If you want to find the maximum and minimum values, you should use a function dedicated to that purpose.
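As one possible example of a more specialized approach, the sketch below computes the rolling maximum and minimum with numpy's sliding_window_view (available in numpy 1.20 and later), followed by max and min along the window axis. This is an assumed illustration, not one of the methods timed in this note, so no speed claim is made; time it on your own data before relying on it. The array size, the seed, and the variable names are also assumptions.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)          # fixed seed, only for reproducibility
a_alt = rng.integers(100, size=1000)
period_alt = 10

# Read-only view of shape (len(a_alt) - period_alt + 1, period_alt);
# creating the view does not copy the data.
windows = sliding_window_view(a_alt, period_alt)
alt_max = windows.max(axis=1)
alt_min = windows.min(axis=1)

# These correspond to the pandas results from the first full window onward.
pd_max = pd.Series(a_alt).rolling(period_alt).max().to_numpy()[period_alt - 1:]
pd_min = pd.Series(a_alt).rolling(period_alt).min().to_numpy()[period_alt - 1:]
print(np.array_equal(alt_max, pd_max), np.array_equal(alt_min, pd_min))
```

Unlike pandas, the result is shorter than the input by period-1 samples, so it has to be padded if the original alignment is needed.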