In my previous posts I forecasted future sales using ARIMA-family time series models. I tried various refinements, but the parameters that can be tuned are limited and the accuracy stopped improving.
- Challenge to future sales forecast: ① What is time series analysis?
- Challenge to future sales forecast: ② Time series analysis using PyFlux
- Challenge to future sales forecast: ③ Parameter tuning of PyFlux
- Challenge to future sales forecast: ④ Time series analysis considering seasonality with StatsModels
So instead of ARIMA, I would like to move toward the current trend of deep learning. Jumping in from scratch is hard, though, so this time I will use Prophet, the time series forecasting library released by Facebook that has become the go-to answer whenever time series analysis comes up.
I wrote the code while referring to the sites below, but in some places it did not work as described. Perhaps the library version has changed.
- Introduction to Prophet [Python] Facebook Time Series Prediction Tool
- Future prediction of time series data using AI prophet on facebook
Come to think of it, Prophet was released back in 2017, and I had been living without ever knowing about it...
The environment is Google Colaboratory.
As in the previous posts, the data consists of daily sales, with temperature (average, highest, lowest) as explanatory variables.
date | Sales amount | Average temperature | Highest temperature | Lowest Temperature |
---|---|---|---|---|
2018-01-01 | 7,400,000 | 4.9 | 7.3 | 2.2 |
2018-01-02 | 6,800,000 | 4.0 | 8.0 | 0.0 |
2018-01-03 | 5,000,000 | 3.6 | 4.5 | 2.7 |
2018-01-04 | 7,800,000 | 5.6 | 10.0 | 2.6 |
The process of pulling data from BigQuery into pandas is the same as before. However, since I want to predict the future, I build two dataframes: the past two years (df) and the future one month (df_future).
After that, the date column has to be converted to datetime64. In addition, Prophet requires the date column to be renamed ds and the target variable (here, the sales amount) to be renamed y.
import pandas as pd

query = """
SELECT *
FROM `myproject.mydataset.mytable`
WHERE CAST(Date AS TIMESTAMP) BETWEEN CAST("{from_day}" AS TIMESTAMP) AND CAST("{to_day}" AS TIMESTAMP)
ORDER BY p_date
"""

df = pd.io.gbq.read_gbq(query.format(from_day="2017-01-01", to_day="2018-12-31"), project_id="myproject", dialect="standard")
df_future = pd.io.gbq.read_gbq(query.format(from_day="2019-01-01", to_day="2019-01-31"), project_id="myproject", dialect="standard")
from datetime import datetime

# Convert the date column to datetime64
def strptime_with_offset(string, format='%Y-%m-%d'):
    base_dt = datetime.strptime(string, format)
    return base_dt

df['date'] = df['date'].apply(strptime_with_offset)
df_future['date'] = df_future['date'].apply(strptime_with_offset)

# Prophet expects the date column to be named ds and the target column y
df.rename(columns={'Sales amount': 'y', 'date': 'ds'}, inplace=True)
df_future.rename(columns={'Sales amount': 'y', 'date': 'ds'}, inplace=True)
Next, call Prophet and add the various components to the model.
from fbprophet import Prophet

# Use logistic (non-linear) growth; daily seasonality is unnecessary for daily data
model = Prophet(growth='logistic', daily_seasonality=False)

# Specifying a country adds its public holidays
model.add_country_holidays(country_name="JP")

# Add a monthly seasonality component
model.add_seasonality(name='monthly', period=30.5, fourier_order=5)

# Extra regressors to include in the forecast
features_list = ["Average temperature", "Highest temperature", "Lowest Temperature"]
for f in features_list:
    model.add_regressor(f)

# Logistic growth requires a capacity (cap) column, so set an upper limit
df['cap'] = 15000000

model.fit(df)
This trains the model. Components are easy to add and remove, so a good way to learn is to experiment with different combinations of elements.
Then apply the resulting model to future data.
# How far into the future to predict: 30 days here
future = model.make_future_dataframe(periods=30, freq='D')
future["cap"] = 15000000

# The extra regressors (temperatures) are needed for prediction, so merge with df_future before predicting
future = pd.merge(future, df_future, on="ds")
df_forecast = model.predict(future)
The prediction results are now stored in df_forecast. Looking at its contents, the predicted value is in the yhat column, with the uncertainty interval given by yhat_lower and yhat_upper. The dataframe also breaks the forecast down into trend, seasonality, the temperature regressors, and so on.
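As a quick check, you can pull out just the prediction and its interval. This is a minimal sketch; ds, yhat, yhat_lower and yhat_upper are the column names Prophet produces, while which additional component columns exist depends on the model you built.

```python
# Show the last few predicted days with their uncertainty interval
print(df_forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())

# List all columns to see the trend / seasonality / regressor breakdown
print(df_forecast.columns.tolist())
```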
Let's plot the results so they are easy to read, comparing the sales forecast with the actual sales for the predicted month.
from matplotlib import pyplot as plt
%matplotlib inline

df_output = pd.merge(df_forecast, df_future, on="ds")

# In the current version an error is raised without this registration call
pd.plotting.register_matplotlib_converters()

df_output.plot(figsize=(18, 12), x="ds", y=["yhat", "y"])
The forecast (yhat) runs slightly high, but its ups and downs track the actual values well, so the future forecast captures the trend quite nicely.
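Incidentally, Prophet also ships with its own plotting helper, which draws the history, the forecast line and its uncertainty band in one call. A minimal sketch, using the model and df_forecast defined above:

```python
# Built-in Prophet plot: observed points, yhat line and uncertainty interval
fig = model.plot(df_forecast)
plt.show()
```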
You can also extract and see the trend and periodicity.
model.plot_components(df_forecast)
plt.show()
- Holidays: the big spike is Coming-of-Age Day, and it really stands out.
- Weekly: as expected, sales are higher on Saturdays and Sundays.
- Monthly: it wobbles around; does that mean the end and the beginning of the month are high?
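If you want the numbers behind these component plots rather than just the figures, the forecast dataframe carries one column per component. A small sketch, assuming the model above (so columns such as weekly, monthly and holidays exist; the exact set depends on which components you added):

```python
# Numeric contribution of each component on each forecast date
components = ["trend", "weekly", "monthly", "holidays"]
print(df_forecast[["ds"] + components].head())
```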
There were some stumbling blocks, such as the required ds and y column names and a few places where other people's sample code raised errors, but once everything was in place it was very simple to run.
The calculation is not included in the program above, but comparing y with yhat, the error over the month is within about 10%, so I feel this is accurate enough to be useful.
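For reference, here is one way to check that error. This is a minimal sketch under the assumption that df_output (the merge of df_forecast and df_future built above) contains both y and yhat; these metrics are not part of the original program, and "monthly error" is interpreted here as the error of the monthly totals.

```python
# Mean absolute percentage error per day, averaged over the month
daily_ape = ((df_output["y"] - df_output["yhat"]).abs() / df_output["y"]) * 100
print("Daily MAPE: {:.1f}%".format(daily_ape.mean()))

# Error of the monthly totals (one reading of "the monthly error is within about 10%")
monthly_error = abs(df_output["y"].sum() - df_output["yhat"].sum()) / df_output["y"].sum() * 100
print("Monthly total error: {:.1f}%".format(monthly_error))
```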
This time I used the total sales of the whole store, but going forward I would like to look for targets that might give even higher accuracy, such as the number of visitors or the sales of a specific category only.