Stationarity is a very important concept in time series analysis. A stochastic process is stationary when its probability distribution does not change with time (or position).
Put plainly, time series data can be called stationary when, regardless of the passage of time, it fluctuates around a constant value and swings with roughly the same width. There are many formulas in this topic, but rather than the formulas themselves, pay attention to the shape of the graph to understand what they mean.
Stationarity is classified into two types: weak stationarity and strong stationarity. From here on, our time series analysis will mainly deal with weak stationarity, so when we simply say "stationarity", think of it as weak stationarity.
Weak stationarity means that the expected value and the autocovariance of the time series are constant over time. What is autocovariance? It is the covariance between the series and itself k time steps in the past, i.e. how much the current value varies together with the value k steps earlier.
Letting the mean be μ and the k-th order autocovariance be γ_k, weak stationarity is expressed as

E[y_t] = μ
γ_k = Cov(y_t, y_{t−k}) = E[(y_t − μ)(y_{t−k} − μ)]

for every time point t and every lag k. A constant expected value means the series fluctuates around a constant value regardless of the passage of time. A constant autocovariance means the variability of the data is constant, i.e. the series swings and changes with the same width.
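As a minimal sketch (the series and the lag k below are made-up illustrative values), the sample mean and the lag-k sample autocovariance can be computed directly with NumPy:

import numpy as np

np.random.seed(0)
y = np.random.normal(0, 1, 1000)   # an example (stationary) series

k = 2                              # lag
mu = y.mean()                      # sample mean, estimate of the constant mean

# Sample autocovariance at lag k: the average of (y_t - mu)(y_{t-k} - mu)
gamma_k = np.mean((y[k:] - mu) * (y[:-k] - mu))
print(mu, gamma_k)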
If, at all points t,

E[ε_t] = 0
Var(ε_t) = σ²
Cov(ε_t, ε_{t−k}) = 0 (k ≠ 0)

hold, such a time series ε_t is called white noise. By the definition above, white noise is weakly stationary. The first equation says that the expected value of white noise ε_t is 0 at any time point t. The second says that the variance of white noise is constant regardless of time, and the third says that its autocovariance is 0, that is, white noise has no autocorrelation.
White noise plays the role of the irregular variation pattern (the error term) in time series models. Since irregular fluctuation itself is difficult to reproduce mathematically, white noise is used in its place.
import numpy as np
import matplotlib.pyplot as plt

# White noise settings: zero mean and constant standard deviation
mean = 0
std = 1
num_samples = 1000

# Draw independent samples from a normal distribution
samples = np.random.normal(mean, std, size=num_samples)

# White noise plot
plt.title("White noise")
plt.plot(samples)
plt.show()
We have now learned what stationarity is, but why is it important?
Analyzing non-stationary time series data can give completely meaningless results, because there is a possibility of detecting a correlation (a relationship in which B goes up or down when A goes up) that does not really exist. Since time series data changes over time, the common factor "passage of time" can create a meaningless correlation between two unrelated series. (Such a meaningless correlation is called a spurious correlation.)
For example, suppose a man kept gaining weight for the 20 years until he turned 20. Meanwhile, China's GDP, which has been developing remarkably in recent years, also kept increasing for the 20 years since around 1994. It would obviously be meaningless to say that "China's GDP has increased because I have gained weight."
To avoid detecting such meaningless correlations in time series data, a time series without stationarity is transformed, for example by taking differences, into a time series with stationarity, and the analysis then proceeds on the transformed series.
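As a minimal sketch of such a transformation (the random walk below is synthetic data made up for this example), the first difference of a non-stationary series can be taken with pandas' diff():

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)

# A random walk (the cumulative sum of white noise) is non-stationary
random_walk = pd.Series(np.random.normal(0, 1, 1000).cumsum())

# Taking the first difference recovers a stationary series (the white noise)
diff_series = random_walk.diff().dropna()

# Compare the original series with the differenced one
fig, axes = plt.subplots(2, 1)
axes[0].set_title("Random walk (non-stationary)")
axes[0].plot(random_walk)
axes[1].set_title("First difference (stationary)")
axes[1].plot(diff_series)
plt.tight_layout()
plt.show()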
The most effective way to see whether a time series is stationary is to actually look at it. Data visualization is the basis of time series analysis, and of data analysis in general. As mentioned above, stationary time series data does not depend on the passage of time: it fluctuates around a constant value and swings with the same width. That is its characteristic feature.
The basic models for analyzing univariate time series data are the ARMA model and the ARIMA model; we will also learn about the SARIMA model, which is a further development of them, and use it to model time series data. As its name suggests, the ARMA model, described below, is a combination of the AR model and the MA model. The ARMA model is the foundation not only of the SARIMA model but also of other extended models, so it is very important.
The MA model (moving average model) uses white noise ε_t and, in the first-order case, is expressed as

y_t = μ + ε_t + θ_1 ε_{t−1}

White noise, as we saw, is the time series that makes up the error term. This model is an extension of white noise: since y_t and the value one step earlier, y_{t−1}, share the common term ε_{t−1}, the series has autocorrelation. A model of this kind, which is affected by past errors, specifically by the errors of the previous q steps, is written MA(q).
Between y_t and y_{t−1}, the MA(q) model shares q common error terms. For that reason it has autocorrelation up to lag q, and it has no autocorrelation with data from lag q + 1 or earlier.
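As a minimal sketch (the coefficient θ_1 = 0.8 is an arbitrary value chosen for illustration, and a reasonably recent statsmodels version is assumed), an MA(1) series can be simulated with statsmodels' ArmaProcess:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)

# MA(1): y_t = e_t + 0.8 * e_{t-1}
# ArmaProcess takes lag-polynomial coefficients including the lag-0 term
ar = np.array([1])         # no AR part
ma = np.array([1, 0.8])    # MA(1) coefficient theta_1 = 0.8
ma1_process = ArmaProcess(ar, ma)

samples = ma1_process.generate_sample(nsample=1000)

plt.title("Simulated MA(1) process")
plt.plot(samples)
plt.show()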
The AR model (autoregressive model) is a model in which the value changes in a regular way based on its own past values. In the first-order case it is expressed as

y_t = c + φ_1 y_{t−1} + ε_t

In this way, y_t is represented by y_{t−1}, the value one step earlier, and that is where the autocorrelation appears. The AR model recursively estimates the value at a certain point in time from past values; a model that predicts the next value using the previous p values is written AR(p).
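In the same way (the coefficient φ_1 = 0.8 is again an arbitrary illustrative value), an AR(1) series can be simulated; note that ArmaProcess expects the AR coefficients after lag 0 with their signs reversed:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)

# AR(1): y_t = 0.8 * y_{t-1} + e_t
ar = np.array([1, -0.8])   # AR(1) coefficient phi_1 = 0.8, sign reversed
ma = np.array([1])         # no MA part
ar1_process = ArmaProcess(ar, ma)

samples = ar1_process.generate_sample(nsample=1000)

plt.title("Simulated AR(1) process")
plt.plot(samples)
plt.show()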
The AR and MA models do not conflict with each other, so it is possible to formulate a model that combines the two. Combining an AR(p) model with an MA(q) model gives a model written as ARMA(p, q).
The AR model is a model that recursively estimates the value at a certain point in time from past values. The MA model is a model that has autocorrelation simply because its expression shares common error terms with the expressions at nearby time points.
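As a minimal sketch (the order (1, 0, 1) and the simulated data are illustrative assumptions, and a recent statsmodels version is assumed), an ARMA(p, q) model can be fitted with statsmodels' ARIMA class by setting the differencing order d to 0:

import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(0)

# Simulate an ARMA(1, 1) series: y_t = 0.8*y_{t-1} + e_t + 0.5*e_{t-1}
arma_process = ArmaProcess(ar=np.array([1, -0.8]), ma=np.array([1, 0.5]))
y = arma_process.generate_sample(nsample=1000)

# Fit an ARMA(1, 1) model (an ARIMA model with differencing order d = 0)
model = ARIMA(y, order=(1, 0, 1))
result = model.fit()
print(result.summary())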
The ARIMA model applies the ARMA model to the differenced series derived from the original series. While the ARMA model can only be applied to stationary processes, the ARIMA model can also be applied to non-stationary processes (processes in which the mean and variance of the data depend on time).
An ARIMA model built by applying ARMA(p, q) to the series differenced d times is written ARIMA(p, d, q). In this ARIMA model,
・ p is called the autocorrelation (autoregressive) order,
・ d the degree of differencing, and
・ q the moving average order.
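As a minimal sketch (the order (1, 1, 1) and the random-walk-style data are illustrative assumptions, and a recent statsmodels version is assumed), an ARIMA(p, d, q) model is fitted in the same way as the ARMA model above, with d specifying how many times the series is differenced:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(0)

# A non-stationary series: the cumulative sum of white noise
y = np.cumsum(np.random.normal(0, 1, 500))

# Fit ARIMA(1, 1, 1): an ARMA(1, 1) model on the first-differenced series
model = ARIMA(y, order=(1, 1, 1))
result = model.fit()
print(result.summary())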