Precautions when using for statements in pandas

This article is a continuation of, or a supplement to, "Notes on speeding up Python code with Numba".

Pandas is convenient for processing time series data, but a plain for statement that reads and writes a Series one element at a time is extremely slow.

import numpy as np
import pandas as pd
dataM1 = pd.read_csv('DAT_ASCII_EURUSD_M1_2015.csv', sep=';',
                     names=('Time','Open','High','Low','Close', ''),
                     index_col='Time', parse_dates=True)

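def LWMA(s, ma_period):
    # Linearly weighted moving average: the newest sample gets weight
    # ma_period, the oldest gets weight 1.
    y = pd.Series(0.0, index=s.index)
    for i in range(len(y)):
        for j in range(ma_period):
            # .iloc makes the element access explicitly positional
            y.iloc[i] += s.iloc[i-j]*(ma_period-j)
        # divide by the sum of the weights
        y.iloc[i] /= ma_period*(ma_period+1)/2
    return y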

%time MA = LWMA(dataM1['Close'], 10)

Wall time: 3min 10s

The dataset is large, about 370,000 rows, but 3 minutes for a 10-sample moving average is hard to accept. For longer periods, it can take 10 minutes or more. (Core i7-6700 3.4GHz)

In the previous article, I converted the data to an array and then sped it up with Numba, but in fact simply switching to an array is already enough to make it much faster.
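For reference, here is what the loop computes at a single position, written as a minimal plain-Python sketch. The helper name `lwma_point` and the five sample prices are made up for illustration and are not part of the original code. It also shows a side effect of the indexing: for the first `ma_period - 1` positions, `i - j` becomes negative, so the values are taken from the end of the data.

```python
def lwma_point(values, i, ma_period):
    """Linearly weighted average of the ma_period samples ending at position i."""
    total = 0.0
    for j in range(ma_period):
        # weight ma_period for the newest sample, down to 1 for the oldest
        total += values[i - j] * (ma_period - j)
    # the weights 1 + 2 + ... + ma_period sum to ma_period*(ma_period+1)/2
    return total / (ma_period * (ma_period + 1) / 2)


prices = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sample data

# Full window at the last position: (5*3 + 4*2 + 3*1) / 6
full = lwma_point(prices, 4, 3)

# At position 0 the indices are 0, -1, -2, so the window wraps around
# to the end of the list: (1*3 + 5*2 + 4*1) / 6
wrapped = lwma_point(prices, 0, 3)

print(full, wrapped)
```

If the wrapped values at the start are not wanted, they can simply be discarded after the calculation.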

def LWMA1(s, ma_period):
    a = s.values
    y = np.zeros(len(a))
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += a[i-j]*(ma_period-j)
        y[i] /= ma_period*(ma_period+1)/2
    return pd.Series(y, index=s.index)

%timeit MA = LWMA1(dataM1['Close'], 10)

1 loop, best of 3: 1.92 s per loop

Even without Numba, this is about 100 times faster than looping over the pandas Series directly (1.92 s versus 3 min 10 s).

Of course, once the loop runs on an array, applying Numba makes it faster still.
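The comparison can be tried without the CSV file. The following is a rough sketch in which a small synthetic random-walk series stands in for the EURUSD data; the names `lwma_series` and `lwma_array`, the series length, and the random seed are all assumptions made for this example. It checks that both loops return the same values and prints the elapsed times, which will differ by machine and by pandas version.

```python
import time

import numpy as np
import pandas as pd


def lwma_series(s, ma_period):
    # loop that touches the Series element by element
    y = pd.Series(0.0, index=s.index)
    for i in range(len(y)):
        for j in range(ma_period):
            y.iloc[i] += s.iloc[i - j] * (ma_period - j)
        y.iloc[i] /= ma_period * (ma_period + 1) / 2
    return y


def lwma_array(s, ma_period):
    # same loop, but on the underlying NumPy array
    a = s.to_numpy()
    y = np.zeros(len(a))
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += a[i - j] * (ma_period - j)
        y[i] /= ma_period * (ma_period + 1) / 2
    return pd.Series(y, index=s.index)


# synthetic 1-minute "close" prices (hypothetical stand-in for the real data)
rng = np.random.default_rng(0)
n = 500
index = pd.date_range("2015-01-01", periods=n, freq="min")
close = pd.Series(1.1 + 0.0001 * rng.standard_normal(n).cumsum(), index=index)

t0 = time.perf_counter()
slow = lwma_series(close, 10)
t1 = time.perf_counter()
fast = lwma_array(close, 10)
t2 = time.perf_counter()

print(f"Series loop: {t1 - t0:.4f} s, array loop: {t2 - t1:.4f} s")
```

Because only the container changes and the arithmetic stays in the same order, the two results should agree exactly, and the index of the input is preserved on the way out.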

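from numba import jit

@jit(nopython=True)
def _lwma_loop(a, ma_period):
    # compiled loop: takes and returns plain NumPy arrays
    y = np.zeros(len(a))
    for i in range(len(y)):
        for j in range(ma_period):
            y[i] += a[i-j]*(ma_period-j)
        y[i] /= ma_period*(ma_period+1)/2
    return y

def LWMA2(s, ma_period):
    # Numba cannot type a pandas Series in nopython mode, so the Series
    # is unwrapped and rewrapped outside the compiled function
    return pd.Series(_lwma_loop(s.values, ma_period), index=s.index)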

%timeit MA = LWMA2(dataM1['Close'], 10)

100 loops, best of 3: 5.31 ms per loop

If you need a for statement on pandas data, it is wise to convert the Series to an array first whenever the index is irrelevant to the calculation, and to wrap the result back into a Series at the end. It seems that a little trick like this is needed to get along well with pandas.
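As an aside, when the loop body is a fixed weighted sum like this one, the array version can also be written without an explicit Python loop. The sketch below uses `np.convolve`; it is not the approach measured in this post, and the function name `lwma_convolve` is made up for the example. Unlike the loops above, it assumes the input has at least `ma_period` samples and leaves the first `ma_period - 1` positions as NaN instead of wrapping around to the end of the data.

```python
import numpy as np


def lwma_convolve(a, ma_period):
    """Linearly weighted moving average via convolution (illustrative sketch)."""
    a = np.asarray(a, dtype=float)
    # weights ma_period, ..., 1, normalised so that they sum to 1
    weights = np.arange(ma_period, 0, -1, dtype=float)
    weights /= weights.sum()
    y = np.full(len(a), np.nan)
    # convolve(a, w)[n] = sum_j a[n-j] * w[j], so w[0] (the largest weight)
    # multiplies the newest sample; 'valid' keeps only full windows
    y[ma_period - 1:] = np.convolve(a, weights, mode="valid")
    return y


result = lwma_convolve([1.0, 2.0, 3.0, 4.0, 5.0], 3)
print(result)
```

For a Series `s`, the same wrapping pattern applies: pass `s.values` in and build `pd.Series(result, index=s.index)` from the output.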
