Continuing from the previous post, we use TOPIX historical data and monthly data on foreign visitors to Japan.
Throughout this post, the optimal forecast is defined as the one that minimizes the mean squared error (MSE).
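Under this criterion, the optimal one-period-ahead forecast of an AR process is its conditional expectation given the observations up to the current period.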
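For reference, the one-period-ahead forecast can be written out as follows. This is a sketch under the usual AR(p) notation; the symbols $c$, $\phi_i$ and $\sigma^2$ are assumptions introduced here, and the shocks $\epsilon_t$ are assumed to be i.i.d. with mean zero and variance $\sigma^2$.

```latex
y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \epsilon_t

\hat{y}_{t+1|t} = E[y_{t+1} \mid y_t, y_{t-1}, \ldots] = c + \phi_1 y_t + \cdots + \phi_p y_{t-p+1}

\mathrm{MSE}(\hat{y}_{t+1|t}) = E[(y_{t+1} - \hat{y}_{t+1|t})^2] = E[\epsilon_{t+1}^2] = \sigma^2
```

The only part of $y_{t+1}$ that cannot be forecast is the new shock $\epsilon_{t+1}$, which is why the one-period-ahead MSE equals the shock variance.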
The above is the point forecast for the AR process; the interval forecast is obtained as follows.
Let's consider a 95% interval forecast one period ahead.
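When $y \sim N(\mu, \sigma^2)$, $y$ lies within $\mu \pm 1.96\sigma$ with probability 95%, so the 95% interval forecast is the point forecast plus or minus 1.96 times the square root of the forecast MSE.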
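As a minimal sketch of such an interval, assume an AR(1) process $y_t = c + \phi y_{t-1} + \epsilon_t$ with normally distributed shocks and known parameters. The function name `ar1_interval_forecast` and the parameter values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def ar1_interval_forecast(y_t, c, phi, sigma, level=0.95):
    """One-step-ahead point and interval forecast for an AR(1) with N(0, sigma^2) shocks.

    Assumes the parameters c, phi and sigma are known (no estimation error).
    """
    point = c + phi * y_t                       # conditional expectation of y_{t+1}
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    half_width = z * sigma                      # one-step MSE is sigma^2
    return point, (point - half_width, point + half_width)


# Hypothetical parameter values, for illustration only
point, (lower, upper) = ar1_interval_forecast(y_t=2.0, c=1.0, phi=0.5, sigma=1.0)
print(point, lower, upper)  # point forecast 2.0, interval roughly (0.04, 3.96)
```

With estimated parameters the true interval is somewhat wider, because the estimation error is ignored here.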
In general, the $h$-period-ahead MSE of an $AR(p)$ process is hard to obtain analytically, so it is usually approximated by simulation.
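The simulation approach might look like the sketch below: generate many future paths from the last observations, then take the mean of the simulated values as the forecast and their variance as the approximate MSE. It assumes normally distributed shocks and known parameters; `simulate_ar_mse` and all its arguments are hypothetical names, not part of any library.

```python
import random


def simulate_ar_mse(c, phis, sigma, history, h, n_paths=20000, seed=0):
    """Monte Carlo approximation of the h-step-ahead forecast and MSE of an AR(p).

    phis = [phi_1, ..., phi_p]; history holds at least p observations, oldest first.
    """
    rng = random.Random(seed)
    p = len(phis)
    finals = []
    for _ in range(n_paths):
        path = list(history[-p:])  # start every path from the last p observations
        for _ in range(h):
            # y_t = c + phi_1 * y_{t-1} + ... + phi_p * y_{t-p} + eps_t
            ar_part = sum(phi * path[-k] for k, phi in enumerate(phis, start=1))
            path.append(c + ar_part + rng.gauss(0.0, sigma))
        finals.append(path[-1])
    mean = sum(finals) / n_paths                          # approximate optimal forecast
    mse = sum((x - mean) ** 2 for x in finals) / n_paths  # approximate forecast MSE
    return mean, mse


# AR(1) check: the exact 2-step MSE is sigma^2 * (1 + phi^2) = 1.25
mean, mse = simulate_ar_mse(c=0.0, phis=[0.5], sigma=1.0, history=[1.0], h=2)
print(mean, mse)
```

For AR(1) the result can be compared with the closed form, which makes it a convenient sanity check before applying the same function to higher orders.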
If there are an infinite number of observations, an invertible MA process can be rewritten as an AR($\infty$) process, so it can be forecast in the same way as an AR process.
On the other hand, even with only a finite number of observations, forecasts more than $q$ periods ahead are simply the expected value of the process, and their MSE is the variance of the process. Forecasts up to $q$ periods ahead are generally computed by assuming $\epsilon = 0$ for all periods before the sample.
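A minimal sketch of this finite-sample procedure, assuming an MA(q) process $y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$ with known parameters. The function `ma_forecast` and the sample values are hypothetical.

```python
def ma_forecast(y, mu, thetas, h):
    """Forecasts 1..h periods ahead for an MA(q), with pre-sample shocks set to 0.

    thetas = [theta_1, ..., theta_q]; y is the observed sample, oldest first.
    """
    q = len(thetas)
    eps = [0.0] * q  # assumption: every shock before the sample period is zero
    for obs in y:
        # Recover eps_t from y_t and the previous q shocks
        ma_part = sum(th * eps[-k] for k, th in enumerate(thetas, start=1))
        eps.append(obs - mu - ma_part)
    forecasts = []
    for step in range(1, h + 1):
        # Future shocks have expectation zero, so only the terms with k >= step,
        # which involve shocks up to the last observation, survive
        known = sum(
            th * eps[-(k - step + 1)]
            for k, th in enumerate(thetas, start=1)
            if k >= step
        )
        forecasts.append(mu + known)
    return forecasts


# MA(1) with mu = 2 and theta_1 = 0.5 (illustrative values)
forecasts = ma_forecast([3.0, 3.5], mu=2.0, thetas=[0.5], h=3)
print(forecasts)  # [2.5, 2.0, 2.0]
```

From the second period onward the MA(1) forecast collapses to the process mean, which matches the statement that forecasts beyond $q$ periods are just the expected value.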
Forecasting an ARMA process combines the forecasts for the AR process and the MA process.
Below, we forecast an ARMA process using the number of foreign visitors to Japan, the same data used in Part 2 (https://qiita.com/asys/items/622594cb482e01411632).
Part 2 showed that $p = 4, q = 1$ fits well, so I use that order here. Of the 138 data points in total, the first 100 are used to fit the model and the remaining 38 are forecast. A forecast is easy to obtain with the predict function, as follows.
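import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# v is assumed to be the DataFrame prepared in Part 2, with a 'residual' column
# Note: sm.tsa.ARMA was removed in statsmodels 0.13; on newer versions use
# statsmodels.tsa.arima.model.ARIMA with order=(4, 0, 1) instead
y = v['residual'].dropna().values

# Fit ARMA(4,1) on the first 100 observations only
arma_model = sm.tsa.ARMA(y[:100], order=(4,1))
result = arma_model.fit()

# Predict over the whole range, then blank out the in-sample part
# so that only the out-of-sample forecast is drawn
pred = result.predict(start=0,end=138)
pred[:100] = np.nan

plt.figure(figsize=(10,4))
plt.plot(y, label='residual')
plt.plot(result.fittedvalues, label='ARMA(4,1)')
plt.plot(pred, label='ARMA(4,1) pred', linestyle='dashed', color='magenta')
plt.legend()
plt.grid()
plt.title('ARMA(4,1) prediction')
plt.show()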
The forecast combines the AR and MA forecasts and, as intuition suggests, its accuracy drops as the forecast horizon gets longer. The forecasts one and two periods ahead, on the other hand, are not bad. How a forecast is used depends heavily on the purpose. In the stock market, for example, the monthly number of foreign visitors to Japan affects the subsequent price movements of inbound-related stocks, so forecasting the figure before it is published can help in taking a position. In that case only the one-period-ahead forecast is needed, and only its accuracy matters. The one-period-ahead accuracy can be checked as follows.
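import seaborn as sns

# np, plt, sm and v are the same objects as in the previous code
y = v['residual'].dropna().values

# Rolling one-period-ahead forecast: refit on the first i observations,
# then forecast observation i and store it next to the observed value
res_arr = []
for i in range(70,138):
    arma_model = sm.tsa.ARMA(y[:i], order=(4,1))
    result = arma_model.fit()
    pred = result.predict(i)[0]  # first out-of-sample forecast
    res_arr.append([y[i], pred])
res_arr = np.array(res_arr)

# Scatter plot of observed vs. predicted values with a regression line
sns.regplot(x=res_arr[:,0], y=res_arr[:,1])
plt.xlabel('observed')
plt.ylabel('predicted')
plt.show()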
The result shows a positive correlation but a large spread, so I would hesitate to use it in real trading.
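To put numbers on this impression, the one-period-ahead forecasts can be summarized with the RMSE, the correlation coefficient, and the share of forecasts whose sign matches the observed value. The sketch below uses only the standard library; `forecast_accuracy` is a hypothetical helper, and the sample values are made up for illustration, not taken from the visitor data.

```python
import math


def forecast_accuracy(observed, predicted):
    """Return (rmse, correlation, sign_hit_rate) for paired observed/predicted values."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_o * var_p)
    # Share of forecasts that get the direction (sign) right
    hits = sum(1 for o, p in zip(observed, predicted) if o * p > 0)
    return rmse, corr, hits / n


# Made-up values, for illustration only
rmse, corr, hit_rate = forecast_accuracy([1.0, -2.0, 3.0, -1.0], [0.5, -1.0, 1.0, 0.5])
print(rmse, corr, hit_rate)
```

With the array built above, the call would be `forecast_accuracy(res_arr[:,0], res_arr[:,1])`. The sign hit rate is included because, for taking a position, getting the direction right may matter more than the size of the error.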