In the previous posts, I used the ARIMA and ARIMAX models from time series analysis to forecast future sales.
- Challenge to future sales forecast: ① What is time series analysis?
- Challenge to future sales forecast: ② Time series analysis using PyFlux
- Challenge to future sales forecast: ③ Parameter tuning of PyFlux
However, the accuracy has not improved. Suspecting that seasonality is not being taken into account sufficiently, I would like to try the SARIMA model next, which is the ARIMA model extended with seasonality.
However, it seems that SARIMA cannot be used with PyFlux, which I have used so far, so I will use StatsModels instead, with "[Predict the transition of TV Asahi's viewing rate with the SARIMA model](https://qiita.com/mshinoda88/items/749131478bfefc9bf365#sarima%E3%83%A2%E3%83%87%E3%83%AB%E5%AD%A3%E7%AF%80%E8%87%AA%E5%B7%B1%E5%9B%9E%E5%B8%B0%E5%92%8C%E5%88%86%E7%A7%BB%E5%8B%95%E5%B9%B3%E5%9D%87%E3%83%A2%E3%83%87%E3%83%AB)" as a reference.
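For reference, a SARIMA model with non-seasonal orders (p, d, q), seasonal orders (sp, sd, sq), and seasonal period s is usually written with the backshift operator B as below. This is the standard textbook form, not something taken from the referenced article; the symbol names sp, sd, sq follow the variable names used in the code later in this post.

```latex
\phi_p(B)\,\Phi_{sp}(B^{s})\,(1-B)^{d}\,(1-B^{s})^{sd}\,y_t
  = \theta_q(B)\,\Theta_{sq}(B^{s})\,\varepsilon_t
```

Here \phi_p and \theta_q are the non-seasonal AR and MA polynomials of orders p and q, \Phi_{sp} and \Theta_{sq} are the seasonal AR and MA polynomials of orders sp and sq, and \varepsilon_t is white noise. Setting sp = sd = sq = 0 gives back the ordinary ARIMA model.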
Google Colaboratory
As before, the data consists of daily sales, with temperature (average, maximum, minimum) as explanatory variables.
Date | Sales amount | Average temperature | Highest temperature | Lowest temperature |
---|---|---|---|---|
2018-01-01 | 7,400,000 | 4.9 | 7.3 | 2.2 |
2018-01-02 | 6,800,000 | 4.0 | 8.0 | 0.0 |
2018-01-03 | 5,000,000 | 3.6 | 4.5 | 2.7 |
2018-01-04 | 7,800,000 | 5.6 | 10.0 | 2.6 |
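As a rough sketch of how data in this shape can be loaded, the snippet below parses the four sample rows above with pandas. The column name `kingaku` comes from the code later in this post; the temperature column names (`temp_avg`, `temp_max`, `temp_min`), the `|` separator, and the use of the date as the index are assumptions made for illustration.

```python
import io
import pandas as pd

# The four sample rows from the table above, in a hypothetical pipe-separated layout.
raw = io.StringIO(
    "date|kingaku|temp_avg|temp_max|temp_min\n"
    "2018-01-01|7,400,000|4.9|7.3|2.2\n"
    "2018-01-02|6,800,000|4.0|8.0|0.0\n"
    "2018-01-03|5,000,000|3.6|4.5|2.7\n"
    "2018-01-04|7,800,000|5.6|10.0|2.6\n"
)

# thousands="," turns "7,400,000" into the integer 7400000.
df_sample = pd.read_csv(
    raw, sep="|", thousands=",", parse_dates=["date"], index_col="date"
)

# Declare a daily frequency so time series models know the spacing of the index.
df_sample = df_sample.asfreq("D")

print(df_sample.dtypes)
print(df_sample.index.freqstr)
```

Setting an explicit daily frequency is optional, but it tends to avoid frequency warnings when the frame is passed to a time series model.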
The data is prepared in the same way as last time. Let's build a model right away; StatsModels can be used in much the same way as PyFlux.
I will also tune the parameters as in the previous post. With SARIMA, the number of parameters grows because the seasonal orders (sp, sd, sq) are added.
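To get a feel for how much the search space grows, the sketch below counts the combinations, assuming the search ranges used in the code in this post (p, d, q from 0 to 3, and sp, sd, sq from 0 to 1). The helper name `sarima_grid` is made up for illustration.

```python
from itertools import product

def sarima_grid(max_p=4, max_d=4, max_q=4, max_sp=2, max_sd=2, max_sq=2):
    """Return every (p, d, q, sp, sd, sq) combination in the search ranges."""
    return list(product(range(max_p), range(max_d), range(max_q),
                        range(max_sp), range(max_sd), range(max_sq)))

# ARIMA only: the seasonal orders are fixed at 0 (a range of size 1 each).
arima_only = sarima_grid(max_sp=1, max_sd=1, max_sq=1)
full_grid = sarima_grid()

print(len(arima_only))  # 4 * 4 * 4 = 64
print(len(full_grid))   # 4 * 4 * 4 * 2 * 2 * 2 = 512
```

Adding the three seasonal orders multiplies the number of models to fit by eight, so the tuning run takes noticeably longer than it did for ARIMA.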
You also need to set the following parameters:
- enforce_stationarity: whether to enforce stationarity of the AR component
- enforce_invertibility: whether to enforce invertibility of the MA component
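To make these two conditions concrete, here is a minimal sketch that checks them by hand with NumPy. It assumes the usual sign convention (AR polynomial 1 - φ₁z - ... - φₚzᵖ, MA polynomial 1 + θ₁z + ... + θ_qz^q); both conditions hold when all roots lie outside the unit circle. The function names are my own and are not part of StatsModels.

```python
import numpy as np

def is_stationary(ar_coefs):
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    # np.roots expects coefficients from the highest power down to the constant.
    poly = np.r_[-np.asarray(ar_coefs, dtype=float)[::-1], 1.0]
    return bool(np.all(np.abs(np.roots(poly)) > 1.0))

def is_invertible(ma_coefs):
    """True if all roots of 1 + theta_1 z + ... + theta_q z^q lie outside the unit circle."""
    poly = np.r_[np.asarray(ma_coefs, dtype=float)[::-1], 1.0]
    return bool(np.all(np.abs(np.roots(poly)) > 1.0))

print(is_stationary([0.5]))  # AR(1) with phi = 0.5: root at z = 2
print(is_stationary([1.2]))  # AR(1) with phi = 1.2: root inside the unit circle
print(is_invertible([0.5]))  # MA(1) with theta = 0.5: root at z = -2
print(is_invertible([2.0]))  # MA(1) with theta = 2.0: root at z = -0.5
```

When both flags are set to False, as in the code below, the optimizer is free to return coefficients that fail these checks, so a check like this can be run on the fitted coefficients afterwards.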
import pandas as pd
import statsmodels.api as sm

def optimisation_sarima(df, target):
    """Fit SARIMA for every order combination and record the AIC of each fit."""
    rows = []
    max_p = 4
    max_d = 4
    max_q = 4
    max_sp = 2
    max_sd = 2
    max_sq = 2
    for p in range(0, max_p):
        for d in range(0, max_d):
            for q in range(0, max_q):
                for sp in range(0, max_sp):
                    for sd in range(0, max_sd):
                        for sq in range(0, max_sq):
                            # Use the column named by `target` (the original ignored it).
                            model = sm.tsa.SARIMAX(
                                df[target], order=(p, d, q),
                                seasonal_order=(sp, sd, sq, 4),
                                enforce_stationarity=False,
                                enforce_invertibility=False
                            )
                            x = model.fit()
                            print("AR:", p, " I:", d, " MA:", q, "SAR:", sp, "SI:", sd, "SMA:", sq, " AIC:", x.aic)
                            # Collect rows in a list; DataFrame.append was removed in pandas 2.0.
                            rows.append([p, d, q, sp, sd, sq, x.aic])
    return pd.DataFrame(rows, columns=['p', 'd', 'q', 'sp', 'sd', 'sq', 'aic'], dtype=float)

# The sales column is named kingaku in df.
df_optimisations = optimisation_sarima(df, 'kingaku')
df_optimisations[df_optimisations.aic == min(df_optimisations.aic)]
This displays the parameter combination with the lowest AIC.
p | d | q | sp | sd | sq | aic |
---|---|---|---|---|---|---|
2.0 | 0.0 | 3.0 | 1.0 | 1.0 | 1.0 | 11056.356866 |
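Looking only at the single best row hides how close the runners-up are. As a small sketch, the snippet below ranks candidates by AIC with pandas. The AIC values in it are invented for illustration and are not results from this post.

```python
import pandas as pd

# Hypothetical tuning results; these numbers are made up for the example.
results = pd.DataFrame(
    [[0, 0, 0, 0, 0, 0, 120.0],
     [1, 0, 1, 0, 1, 1, 101.5],
     [2, 0, 3, 1, 1, 1, 98.2],
     [2, 1, 1, 1, 0, 1, 99.7]],
    columns=["p", "d", "q", "sp", "sd", "sq", "aic"],
)

# Row with the lowest AIC (idxmin skips NaN values from fits that failed).
best = results.loc[results["aic"].idxmin()]

# The three lowest-AIC candidates, sorted in ascending order of AIC.
top3 = results.nsmallest(3, "aic")

print(best)
print(top3)
```

If several candidates have nearly the same AIC, choosing the one with fewer parameters is a common tie-breaker.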
Specify those parameters in the model and run it again to see the model evaluation.
sarima = sm.tsa.SARIMAX(
    df.kingaku, order=(3, 0, 3),
    seasonal_order=(1, 1, 1, 4),
    enforce_stationarity=False,
    enforce_invertibility=False
).fit()

sarima.summary()
You should see a result similar to the following:
Statespace Model Results
Dep. Variable: kingaku No. Observations: 363
Model: SARIMAX(3, 0, 3)x(1, 1, 1, 4) Log Likelihood -5416.395
Date: Tue, 03 Mar 2020 AIC 10850.790
Time: 11:18:46 BIC 10885.537
Sample: 01-03-2018 HQIC 10864.619
- 12-31-2018
Covariance Type: opg
coef std err z P>|z| [0.025 0.975]
ar.L1 0.7365 0.132 5.583 0.000 0.478 0.995
ar.L2 -0.3535 0.165 -2.145 0.032 -0.677 -0.031
ar.L3 -0.5178 0.132 -3.930 0.000 -0.776 -0.260
ma.L1 -0.4232 0.098 -4.315 0.000 -0.615 -0.231
ma.L2 -0.0282 0.096 -0.295 0.768 -0.216 0.159
ma.L3 0.6885 0.068 10.140 0.000 0.555 0.822
ar.S.L4 0.4449 0.091 4.903 0.000 0.267 0.623
ma.S.L4 -0.7696 0.057 -13.547 0.000 -0.881 -0.658
sigma2 1.489e+12 6.05e-14 2.46e+25 0.000 1.49e+12 1.49e+12
Ljung-Box (Q): 777.09 Jarque-Bera (JB): 44.86
Prob(Q): 0.00 Prob(JB): 0.00
Heteroskedasticity (H): 1.09 Skew: 0.60
Prob(H) (two-sided): 0.63 Kurtosis: 4.27
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 3.14e+41. Standard errors may be unstable.
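As a sanity check on the summary, the AIC can be reproduced by hand from the log likelihood using the definition AIC = 2k - 2 ln L, where k is the number of estimated parameters. The count k = 9 below is my reading of the coefficient table (three AR, three MA, one seasonal AR, one seasonal MA, and sigma2).

```python
# Values copied from the summary output above.
log_likelihood = -5416.395

# ar.L1-L3, ma.L1-L3, ar.S.L4, ma.S.L4, sigma2
n_params = 9

# AIC = 2k - 2 ln L
aic = 2 * n_params - 2 * log_likelihood

print(f"{aic:.3f}")  # 10850.790, matching the AIC in the summary
```

Because the 2k term is only 18 here, almost all of the AIC comes from the log likelihood, so comparing AIC between these models is mostly comparing how well they fit.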
The AIC doesn't seem to have changed that much... Let's look at the graph.
import matplotlib.pyplot as plt

# Forecast (in-sample predictions)
ts_pred = sarima.predict()

# Plot the actual data and the forecast results
plt.figure(figsize=(15, 10))
plt.plot(df.kingaku, label='DATA')
plt.plot(ts_pred, label='SARIMA', color='red')
plt.legend(loc='best')
plt.show()
Blue shows the actual values and red shows the model's predictions. The model can now follow the rises and falls in normal periods, but it cannot predict the extreme parts such as the end of the year. Also, the values at the beginning of the year are strange.
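Judging the fit by eye is hard to compare across models, so an error metric could be computed alongside the plot. Below is a minimal sketch of RMSE and MAPE in plain Python. The actual values are the four sample rows from the data table, while the predicted values are hypothetical numbers chosen only to show the calculation.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Sales amounts from the sample table.
actual = [7_400_000, 6_800_000, 5_000_000, 7_800_000]
# Hypothetical predictions, for illustration only.
predicted = [7_000_000, 7_000_000, 5_500_000, 7_000_000]

print(f"RMSE: {rmse(actual, predicted):,.0f}")
print(f"MAPE: {mape(actual, predicted):.2f}%")
```

With the real series, the same functions could be applied to `df.kingaku` and `ts_pred`, ideally after dropping the first few predictions, since the beginning of the year looks unreliable in the plot.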
It feels like taking three steps forward and 2.5 steps back. Next, I will think about how to improve this...