How to perform a portmanteau test in Python
A portmanteau test checks whether a time series is autocorrelated by jointly testing whether its autocorrelation coefficients up to a given lag are all zero.
For more information, see [Wikipedia](https://en.wikipedia.org/wiki/%E3%81%8B%E3%81%B0%E3%82%93%E6%A4%9C%E5%AE%9A).
For example, the Ljung-Box test is available as statsmodels.stats.diagnostic.acorr_ljungbox. See the documentation for details (https://www.statsmodels.org/stable/generated/statsmodels.stats.diagnostic.acorr_ljungbox.html).
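For reference, the Ljung-Box statistic is usually written as below. The notation is an assumption added here, not taken from the original post: $n$ is the number of observations, $\hat{\rho}_k$ is the sample autocorrelation at lag $k$, and $m$ is the largest lag included in the test.

$$
Q(m) = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^2}{n-k}
$$

Under the null hypothesis that there is no autocorrelation up to lag $m$, $Q(m)$ approximately follows a chi-squared distribution with $m$ degrees of freedom, so a large $Q(m)$ (a small p-value) leads to rejection.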
As a first example, we run the test on randomly generated white Gaussian noise. Such noise has no autocorrelation, so the null hypothesis should not be rejected.
import matplotlib as mpl
import matplotlib.pyplot as plt
# Note: matplotlib 3.6 and later renamed this style to 'seaborn-v0_8'
plt.style.use('seaborn')
mpl.rcParams['font.family'] = 'serif'
%matplotlib inline
import numpy as np
import pandas as pd  # needed for the result table below
from statsmodels.stats.diagnostic import acorr_ljungbox
p = print
# Generate 1000 data points
np.random.seed(42)
data = np.random.standard_normal(1000)
# First, plot the data
plt.figure(figsize=(10,6))
plt.plot(data,lw = 1.5)
plt.xlabel('time')
plt.ylabel('value')
plt.xlim([0,100])
plt.title('time vs. value plot');
As expected, the plot shows a white Gaussian noise time series.

Now, let's do a portmanteau test.
result = acorr_ljungbox(data,lags = 5)
p(result)
The result is as follows.
(array([0.05608493, 0.05613943, 0.31898424, 3.27785331, 3.94903872]), array([0.81279444, 0.97232058, 0.9564194 , 0.51244884, 0.55677627]))
The output is a tuple with two elements: the first holds the test statistics and the second holds the p-values. Let's put it into a table so that it is easier to read.
result_table = pd.DataFrame(data=result, index=['test statistic', 'p-value'], columns=[str(i) for i in range(1, 6)])
result_table
The following result is output. Each column corresponds to a lag.

Next, let's test an MA(3) process. Assume the following formula.
$$ y_t = 1 + \epsilon_t + 0.5 \epsilon_{t-3} $$
Here, $\epsilon_t$ is white Gaussian noise. From the form of the equation, we expect a correlation between values that are 3 time steps apart (for example, $y_5$ and $y_8$). This can of course be shown mathematically, but here we confirm it with the portmanteau test.
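The mathematical confirmation can be sketched as follows. The symbols are assumptions introduced here: $\sigma^2$ is the variance of $\epsilon_t$, $\theta = 0.5$ is the coefficient of the lagged term, $\gamma_k$ is the autocovariance at lag $k$, and $\rho_k$ is the autocorrelation at lag $k$.

$$
\begin{aligned}
\gamma_0 &= \mathrm{Var}(y_t) = (1 + \theta^2)\sigma^2 = 1.25\,\sigma^2 \\
\gamma_3 &= \mathrm{Cov}(y_t, y_{t-3}) = \theta\sigma^2 = 0.5\,\sigma^2 \\
\gamma_k &= 0 \quad (k \neq 0, \pm 3) \\
\rho_3 &= \frac{\gamma_3}{\gamma_0} = \frac{\theta}{1 + \theta^2} = 0.4
\end{aligned}
$$

Only the lag-3 autocorrelation is nonzero, because $y_t$ and $y_{t-3}$ share the single noise term $\epsilon_{t-3}$.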
# Generate data from the model
data = np.zeros(1000)
np.random.seed(42)
err = np.random.standard_normal(1000)
for i in range(1000):
    if i-3 < 0:
        data[i] = 1 + err[i]
    else:
        data[i] = 1 + err[i] + 0.5 * err[i-3]
# First, plot the data
plt.figure(figsize=(10,6))
plt.plot(data,lw = 1.5)
plt.xlabel('time')
plt.ylabel('value')
plt.title('time vs. value plot (MA(3) model)')
plt.xlim([0,100])

result = acorr_ljungbox(data,lags = 5)
result_table = pd.DataFrame(data=result, index=['test statistic', 'p-value'], columns=[str(i) for i in range(1, 6)])
result_table

For example, at a significance level of 0.05, the null hypothesis is not rejected when the lag is 2 or less, but it is rejected when the lag is 3 or more (that is, once $\rho_3$ is included).
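As a cross-check, the sample autocorrelations and the Ljung-Box statistic can be computed by hand. This is a minimal sketch, assuming SciPy is installed; the names `theta`, `sample_acf`, `rho`, `q` and `pvals` are illustrative and do not come from the original post. The data are generated in vectorized form but follow the same rule as the loop above.

```python
import numpy as np
from scipy.stats import chi2

theta, lag, n = 0.5, 3, 1000
np.random.seed(42)
err = np.random.standard_normal(n)

# y_t = 1 + eps_t + theta * eps_{t-3}; the first three points have no lagged term
lagged = np.concatenate([np.zeros(lag), err[:-lag]])
y = 1 + err + theta * lagged


def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 1..max_lag (biased estimator)."""
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[k:], d[:-k]) / denom for k in range(1, max_lag + 1)])


max_lag = 5
rho = sample_acf(y, max_lag)
ks = np.arange(1, max_lag + 1)

# Ljung-Box statistic accumulated over lags 1..m, for each m up to max_lag
q = n * (n + 2) * np.cumsum(rho**2 / (n - ks))
# p-values from the chi-squared distribution with m degrees of freedom
pvals = chi2.sf(q, df=ks)

print("theoretical rho_3:", theta / (1 + theta**2))
print("sample acf       :", np.round(rho, 3))
print("Ljung-Box Q      :", np.round(q, 3))
print("p-values         :", np.round(pvals, 4))
```

The sample autocorrelation at lag 3 should be close to the theoretical value of 0.4, and that single large term is what pushes the statistic past the rejection threshold from lag 3 onward.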