I compared IIR-type moving averages in pandas and scipy

Prerequisite articles

Comparison of moving average calculation time written in Python

That article concluded that computing a moving average with a Python for loop is out of the question, and that the pandas and scipy functions should be used instead. However, it covered only FIR-type moving averages such as SMA and LWMA, so this time I investigated IIR-type moving averages such as EMA and SMMA.

EMA

EMA is an abbreviation for Exponential Moving Average and is expressed by the following formula.

y(n)=\alpha x(n)+(1-\alpha)y(n-1)

Here, $\alpha$ is a real parameter between 0 and 1. The formula contains no parameter representing a period, but since SMA and LWMA are specified by a period, EMA is usually specified by a period as well. With period $p$, EMA uses $\alpha=2/(p+1)$ and SMMA uses $\alpha=1/p$. Personally, I don't think EMA and SMMA need to be distinguished, since they share the same formula, but I mention both because MetaTrader treats them as separate indicators.
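The two period conventions can be written as a small sketch (the helper names ema_alpha and smma_alpha are hypothetical, not from any library). It also checks a consequence of the formulas above: since $1/p = 2/(q+1)$ when $q = 2p-1$, an SMMA with period $p$ is the same filter as an EMA with period $2p-1$.

```python
from fractions import Fraction

def ema_alpha(period):
    """EMA smoothing factor for a period p: alpha = 2 / (p + 1), as an exact fraction."""
    return Fraction(2, period + 1)

def smma_alpha(period):
    """SMMA smoothing factor for a period p: alpha = 1 / p, as an exact fraction."""
    return Fraction(1, period)

# An SMMA of period p uses exactly the same alpha as an EMA of period 2p - 1.
for p in range(1, 50):
    assert smma_alpha(p) == ema_alpha(2 * p - 1)

print(ema_alpha(10), smma_alpha(10))  # → 2/11 1/10
```

Exact fractions are used only so that the equality check is not affected by floating-point rounding.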

Implemented with pandas

First, let's implement it with pandas. The data is the same as in the previous article: a time series of four values (open, high, low, close) with about 370,000 entries. Since pandas is designed for time series data, EMA can be written easily with the ewm() and mean() functions.

import numpy as np
import pandas as pd

# One-minute EURUSD data for 2015; the unnamed sixth column is not used
dataM1 = pd.read_csv('DAT_ASCII_EURUSD_M1_2015.csv', sep=';',
                     names=('Time','Open','High','Low','Close', ''),
                     index_col='Time', parse_dates=True)

def EMA(s, ma_period):
    # span=p corresponds to alpha = 2/(p+1)
    return s.ewm(span=ma_period).mean()

%timeit MA = EMA(dataM1['Close'], 10)

Since execution time is again what is being compared, here is the measurement.

10 loops, best of 3: 32.2 ms per loop

In the previous article, SMA with pandas took about 16 ms, so EMA is about twice as slow.
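The same pandas approach also covers SMMA. As a sketch (the function name SMMA is my own, and the random-walk series is synthetic, not the EURUSD data), ewm() accepts the smoothing factor directly through alpha, or through com, where pandas defines $\alpha = 1/(1+\mathrm{com})$:

```python
import numpy as np
import pandas as pd

def SMMA(s, ma_period):
    # SMMA uses alpha = 1/p, which ewm() accepts directly
    # (com = p - 1 is equivalent, since pandas uses alpha = 1 / (1 + com)).
    return s.ewm(alpha=1 / ma_period).mean()

# Synthetic random-walk prices standing in for dataM1['Close'] (not the article's data)
rng = np.random.default_rng(0)
close = pd.Series(1.2 + np.cumsum(rng.normal(0, 1e-4, 1000)))

smma = SMMA(close, 10)
print(np.allclose(smma, close.ewm(com=9).mean()))  # same filter expressed via com
```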

A little conversion before implementing with scipy

When using scipy's lfilter(), you cannot simply pass $\alpha$; you need to rewrite the EMA in IIR filter form and pass its coefficients. So I will convert it a little. (The detailed theory is omitted; it is basic digital signal processing.)

Take the $z$-transform of both sides of the EMA equation.

Y(z)=\alpha X(z)+(1-\alpha)z^{-1}Y(z)
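This step relies on the delay property of the one-sided $z$-transform. Written out, it shows where the initial condition is hidden: the extra term $y(-1)$ disappears only if the filter starts from rest, which is also what lfilter() assumes by default.

\mathcal{Z}\{y(n-1)\}=\sum_{n=0}^{\infty}y(n-1)z^{-n}=y(-1)+z^{-1}\sum_{m=0}^{\infty}y(m)z^{-m}=y(-1)+z^{-1}Y(z)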

Moving the $Y(z)$ terms to the left-hand side gives

\{1-(1-\alpha)z^{-1}\}Y(z)=\alpha X(z)

Defining $H(z)=Y(z)/X(z)$, we can write

H(z)=\frac{\alpha}{1-(1-\alpha)z^{-1}}

This is the system function of the IIR filter. The coefficients of its numerator and denominator polynomials are what lfilter() takes as arguments.
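A quick sanity check on this system function: at $z=1$ (zero frequency), $H(1)=\alpha/(1-(1-\alpha))=1$, so a constant price passes through unchanged, while at $z=-1$ (the Nyquist frequency), $H(-1)=\alpha/(2-\alpha)$, which equals $1/p$ when $\alpha=2/(p+1)$. A minimal sketch confirming both numerically with scipy.signal.freqz, assuming a period of 10:

```python
import numpy as np
from scipy.signal import freqz

p = 10
alpha = 2 / (p + 1)                 # EMA with period 10
b, a = [alpha], [1, alpha - 1]      # numerator and denominator coefficients of H(z)

# Evaluate H(e^{jw}) at w = 0 (z = 1) and w = pi (z = -1), in rad/sample
w, h = freqz(b, a, worN=[0, np.pi])
print(np.abs(h))                    # approximately [1.0, 1/p], i.e. [1.0, 0.1]
```

In other words, the EMA is a low-pass filter with unit DC gain, which is what a moving average should be.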

For EMA, the numerator of the system function is a constant and the denominator is first-order in $z^{-1}$, so its general form can be written as follows.

H(z)=\frac{b_0}{a_0+a_1z^{-1}}

Comparing coefficients, the values of $b$ and $a$ are as follows.

b_0=\alpha, \ a_0=1, \ a_1=\alpha-1
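To check that these coefficients really encode the EMA recursion, here is a sketch comparing lfilter() against a direct loop over $y(n)=\alpha x(n)+(1-\alpha)y(n-1)$ with $y(-1)=0$ (the helper ema_loop and the synthetic input are my own, for illustration only):

```python
import numpy as np
from scipy.signal import lfilter

def ema_loop(x, alpha):
    """Direct evaluation of y(n) = alpha*x(n) + (1 - alpha)*y(n-1), starting from y(-1) = 0."""
    y = np.empty(len(x))
    prev = 0.0
    for n, xn in enumerate(x):
        prev = alpha * xn + (1 - alpha) * prev
        y[n] = prev
    return y

rng = np.random.default_rng(1)
x = 1.2 + np.cumsum(rng.normal(0, 1e-4, 500))   # synthetic prices
alpha = 2 / (10 + 1)

# b = [b0] = [alpha], a = [a0, a1] = [1, alpha - 1]
y_lfilter = lfilter([alpha], [1, alpha - 1], x)
print(np.allclose(y_lfilter, ema_loop(x, alpha)))  # → True
```

lfilter() computes $a_0y(n)=b_0x(n)-a_1y(n-1)$, so the sign of $a_1$ is flipped relative to the EMA formula, which is why $a_1=\alpha-1$ rather than $1-\alpha$.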

Implemented with scipy

So, to implement EMA with scipy's lfilter(), you can write it as follows, passing the $b$ and $a$ coefficients above as lists.

from scipy.signal import lfilter

def EMAnew(s, ma_period):
    alpha = 2/(ma_period+1)
    # b = [alpha], a = [1, alpha-1]; the default initial state corresponds to y(-1) = 0
    y = lfilter([alpha], [1,alpha-1], s)
    return pd.Series(y, index=s.index)

%timeit MA = EMAnew(dataM1['Close'], 10)

The result is

100 loops, best of 3: 3.08 ms per loop

This is about ten times faster than pandas and almost the same speed as SMA. As expected, lfilter() is fast.
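%timeit is IPython-specific. Outside IPython, a rough equivalent with the standard timeit module might look like the sketch below. It runs on a synthetic random walk of similar length (about 370,000 points), so the absolute timings will not match the figures above.

```python
import functools
import timeit

import numpy as np
import pandas as pd
from scipy.signal import lfilter

# Synthetic stand-in for dataM1['Close'] (random walk, not the article's data)
rng = np.random.default_rng(0)
close = pd.Series(1.2 + np.cumsum(rng.normal(0, 1e-4, 370_000)))

def EMA(s, ma_period):
    return s.ewm(span=ma_period).mean()

def EMAnew(s, ma_period):
    alpha = 2 / (ma_period + 1)
    return pd.Series(lfilter([alpha], [1, alpha - 1], s), index=s.index)

for func in (EMA, EMAnew):
    # Best of 3 repeats of 10 calls each, reported per call like %timeit
    runs = timeit.repeat(functools.partial(func, close, 10), number=10, repeat=3)
    print(f"{func.__name__}: {min(runs) / 10 * 1e3:.2f} ms per call")
```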

Adjusting the initial conditions of lfilter()

lfilter() turned out to be fast this time as well, but there is a small problem with its output.

In EMA, computing the first output $y(0)$ requires $y(-1)$, for which there is no data. pandas, being designed for time series data, avoids using $y(-1)$ by setting $y(0)=x(0)$.

MA = EMA(dataM1['Close'], 10)
pd.DataFrame({'Close':dataM1['Close'],'EMA':MA}).head(10)
                       Close       EMA
Time
2015-01-01 13:00:00  1.20962  1.209620
2015-01-01 13:01:00  1.20962  1.209620
2015-01-01 13:02:00  1.20961  1.209616
2015-01-01 13:04:00  1.20983  1.209686
2015-01-01 13:05:00  1.20988  1.209742
2015-01-01 13:06:00  1.20982  1.209762
2015-01-01 13:07:00  1.20987  1.209788
2015-01-01 13:08:00  1.21008  1.209855
2015-01-01 13:09:00  1.20996  1.209878
2015-01-01 13:10:00  1.20977  1.209855

Here the EMA stays close to the input series from the very first value. lfilter(), however, computes with $y(-1)=0$, so the first EMA values deviate considerably from the input.

                       Close       EMA
Time
2015-01-01 13:00:00  1.20962  0.219931
2015-01-01 13:01:00  1.20962  0.399874
2015-01-01 13:02:00  1.20961  0.547099
2015-01-01 13:04:00  1.20983  0.667596
2015-01-01 13:05:00  1.20988  0.766193
2015-01-01 13:06:00  1.20982  0.846852
2015-01-01 13:07:00  1.20987  0.912855
2015-01-01 13:08:00  1.21008  0.966896
2015-01-01 13:09:00  1.20996  1.011090
2015-01-01 13:10:00  1.20977  1.047213
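The shape of this start-up transient can be worked out by hand. For a constant input $c$ and $y(-1)=0$, the recursion gives $y(n)=c\,\{1-(1-\alpha)^{n+1}\}$, so the error shrinks by a factor of $1-\alpha$ per sample. A sketch checking this closed form, assuming a constant input equal to the first Close value:

```python
import numpy as np
from scipy.signal import lfilter

alpha = 2 / (10 + 1)
c = 1.20962                     # constant input, equal to the first Close value
x = np.full(20, c)

y = lfilter([alpha], [1, alpha - 1], x)   # default zero initial state, i.e. y(-1) = 0
n = np.arange(len(x))

# Closed-form step response of the EMA starting from rest
assert np.allclose(y, c * (1 - (1 - alpha) ** (n + 1)))
print(y[:2])                    # approximately 0.219931 and 0.399874
```

Because the real prices barely move over the first two minutes, the first two values agree with the lfilter output in the table to six decimals.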

This problem can be solved with lfilter()'s optional argument zi, which sets the filter's initial state. Writing it as follows gave almost the same result as pandas.

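def EMAnew(s, ma_period):
    alpha = 2/(ma_period+1)
    # Initial state chosen so that y(0) = alpha*x(0) + zi = x(0);
    # s.iloc[0] is used because s[0] on a DatetimeIndex is a deprecated positional lookup
    y, zf = lfilter([alpha], [1,alpha-1], s, zi=[s.iloc[0]*(1-alpha)])
    return pd.Series(y, index=s.index)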

Here, zi is the initial value of the filter's internal state variable, not simply an initial input or output value. For this first-order filter, however, the first output is $y(0)=\alpha x(0)+z_i$, so choosing $z_i=(1-\alpha)x(0)$ gives $y(0)=x(0)$, which reproduces how pandas starts the EMA.
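The result is only "almost" the same because pandas' ewm() defaults to adjust=True, which for early samples divides by the sum of the weights available so far instead of running the plain recursion. For example, at 13:02 the recursion gives 1.209618, while the pandas table shows 1.209616. With adjust=False, pandas uses $y(0)=x(0)$ followed by the recursion, which is exactly what lfilter() computes with this zi. The sketch below, on synthetic data, checks that; it also uses scipy.signal.lfilter_zi, which returns the steady-state initial state for a unit step and equals $[1-\alpha]$ for this filter.

```python
import numpy as np
import pandas as pd
from scipy.signal import lfilter, lfilter_zi

# Synthetic random-walk prices (an assumption; not the article's EURUSD data)
rng = np.random.default_rng(2)
close = pd.Series(1.2 + np.cumsum(rng.normal(0, 1e-4, 1000)))

alpha = 2 / (10 + 1)
b, a = [alpha], [1, alpha - 1]

# lfilter_zi gives the steady-state state for a unit step ([1 - alpha] here);
# scaling it by x(0) reproduces zi = (1 - alpha) * x(0) used above.
zi = lfilter_zi(b, a) * close.iloc[0]
y, _ = lfilter(b, a, close, zi=zi)

# pandas with adjust=False: y(0) = x(0), then y(n) = alpha*x(n) + (1-alpha)*y(n-1)
ema_recursive = close.ewm(span=10, adjust=False).mean()
# pandas default (adjust=True): normalized weights over the samples seen so far
ema_default = close.ewm(span=10).mean()

print(np.allclose(y, ema_recursive))   # → True
print(np.abs(y - ema_default).max())   # small but nonzero, largest near the start
```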
