Lately even the retail industry is buzzing about big data and AI, and requests for advice come in from various departments every day. Recently, many of them are about the future, and mostly about sales. The store department asks things like "I want you to predict next month's sales", "How much will we sell next week?", and "Should we run additional sales promotions next month?"
Previously, the target was simply 105% of the same month of the previous year, but with the declining birthrate and aging population, inbound tourism, abnormal weather, and other changes in the world, year-on-year comparisons are no longer useful. So what the stores want to know is how much they will sell (or fail to sell) if they simply operate as usual, and to use that as a baseline for deciding how much to add on top through events and advertising.
I work with data now, but I am very much a humanities person, so I'm not familiar with complex statistical methods. At first I tried to make predictions using regression analysis with information on weather, sales promotions, and nearby events, but the accuracy did not improve at all...
While researching various things at that point, I learned that there is something called "time series analysis", which is used for things like predicting stock prices.
In this post I would like to organize my understanding of time series analysis, referring to "Statistics to understand all humankind" and "[Predict TV Asahi's audience rate transition with SARIMA model](https://qiita.com/mshinoda88/items/749131478bfefc9bf365)". (Apologies if I've gotten something wrong. Please point it out, without any difficult formulas if possible...)
In the regression analysis I originally built, I was trying to explain sales with variables entirely separate from sales itself:

$$\text{Sales} = a_1 \cdot \text{temperature} + a_2 \cdot \text{promotional expenses} + \cdots$$
However, if one day's sales are 10 million yen, what will the next day's sales be? Not 1 million yen, and not 100 million yen either. Probably something like 12 million or 8 million yen; I'd expect it not to stray far from the previous day's sales.
So the idea is to improve accuracy by using past sales as the explanatory variables, as follows:

$$\text{Sales}_n = a_1 \cdot \text{Sales}_{n-1} + a_2 \cdot \text{Sales}_{n-2} + \cdots$$
It seems that this is called AR (autoregressive).
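To check my own understanding, here is a minimal sketch of the simplest case, AR(1), where only the previous day's sales are used. The sales figures are made up for illustration, and I added an intercept `c` that does not appear in the formula above; both are my assumptions, not part of any real data or library.

```python
# Minimal AR(1) sketch: predict the next day's sales from the previous day's.
# All figures are made-up illustrative values (unit: million yen).

def fit_ar1(series):
    """Least-squares fit of series[n] = c + a1 * series[n-1].

    The intercept c is an assumption added for this sketch.
    """
    x = series[:-1]  # previous day's sales
    y = series[1:]   # current day's sales
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    a1 = cov / var
    c = mean_y - a1 * mean_x
    return c, a1


def predict_next(series, c, a1):
    """One-step-ahead forecast from the last observed value."""
    return c + a1 * series[-1]


daily_sales = [10.0, 10.4, 10.9, 11.2, 10.8, 10.3,
               9.9, 9.6, 9.9, 10.3, 10.7, 11.0]

c, a1 = fit_ar1(daily_sales)
forecast = predict_next(daily_sales, c, a1)
print(f"c={c:.2f}, a1={a1:.2f}, next-day forecast={forecast:.2f}")
```

With a series like this, where each day stays close to the day before, the fitted `a1` comes out positive and the forecast stays near recent sales, which matches the intuition above.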
In contrast to the autoregressive model in 1, here the focus is on the error, that is, the gap between actual sales and what sales should normally have been. If last month's sales were higher than they normally would be, we can suspect that demand was pulled forward, so sales may well drop this month. This can be expressed as:

$$\text{Sales}_n = b_1 \cdot \text{error}_n + b_2 \cdot \text{error}_{n-1} + \cdots$$
It seems that this is called MA (moving average).
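The pull-forward story can be sketched with a tiny MA(1) example. This is only my own illustration: the errors, the coefficients `b1` and `b2`, and the constant baseline `mean` are all assumed values, and the baseline term is something I added on top of the formula above.

```python
# Minimal MA(1) sketch: sales = baseline + b1 * error_n + b2 * error_{n-1}.
# A negative b2 models "pull-forward": a surprise surge last period
# is partly paid back this period. All numbers are illustrative.

def ma1_series(errors, b1=1.0, b2=-0.5, mean=10.0):
    """Build a sales series from a list of per-period errors (shocks)."""
    values = []
    prev_error = 0.0  # assume there was no shock before the first period
    for e in errors:
        values.append(mean + b1 * e + b2 * prev_error)
        prev_error = e
    return values


# Period 2 has an unexpected +2.0 surge; period 5 an unexpected -1.0 dip.
errors = [0.0, 2.0, 0.0, 0.0, -1.0, 0.0]
sales = ma1_series(errors)
print(sales)  # [10.0, 12.0, 9.0, 10.0, 9.0, 10.5]
```

The surge to 12.0 is followed by a drop to 9.0, below the baseline of 10.0, and then sales return to normal. That is the "sold too much last month, so this month dips" effect.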
Things would be easy if the same pattern simply repeated, but real-world time series are not that tidy. In technical terms this is apparently called a "non-stationary process".
The idea seems to be to treat upward and downward movement as a medium- to long-term trend rather than as a short-term fluctuation.
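As far as I understand, the "I" (Integrated) part of ARIMA handles such a trend by looking at the change from one period to the next instead of the raw level, which is called differencing. Below is a minimal sketch with made-up monthly figures; the function name `difference` is just my own helper, not a library call.

```python
# Minimal differencing sketch: turn a trending series into period-to-period
# changes, which fluctuate around a stable value. Figures are illustrative.

def difference(series, lag=1):
    """Return series[i] - series[i - lag] for every i where both exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


monthly_sales = [100, 103, 105, 108, 110, 113, 115, 118]  # steadily rising
changes = difference(monthly_sales)
print(changes)  # [3, 2, 3, 2, 3, 2, 3]

# A naive forecast: last level plus the average change so far.
average_change = sum(changes) / len(changes)
next_month = monthly_sales[-1] + average_change
print(f"naive next-month forecast: {next_month:.1f}")
```

The raw series keeps climbing, but the differenced series just alternates between 2 and 3, which is much easier to model.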
These three elements (1 to 3) are collectively called the ARIMA (Auto Regressive Integrated Moving Average) model. I like how AR and MA are fused into a single name; it feels cool.
Even with all of this, accuracy still does not improve. But retailers already know why. There should be seasonality, such as sales being weak every year in February and September, and so far I haven't taken that into account.
Seasonality itself comes in several different cycles, I think.
- Days of the week: on Saturdays and Sundays, sales increase at stores where people stock up on their days off.
- Days in the month: after the 25th or payday, slightly more expensive items sell and sales increase.
- Month of the year: as mentioned above, sales decline in February and September.
It seems that the SARIMA model can take these cycles into consideration.
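One way I picture the seasonal part is to compare each day with the same day one cycle earlier, for example this Saturday against last Saturday. The sketch below does that with a weekly cycle of 7 days; the data and the helper name `seasonal_difference` are my own assumptions for illustration, not how any particular SARIMA library is implemented.

```python
# Minimal seasonal-differencing sketch with a weekly cycle (period = 7).
# Daily sales run Monday to Sunday, with a weekend spike. Illustrative data.

def seasonal_difference(series, period):
    """Compare each value with the value one full cycle earlier."""
    return [series[i] - series[i - period] for i in range(period, len(series))]


week1 = [8, 8, 9, 8, 9, 14, 15]
week2 = [9, 9, 10, 9, 10, 15, 16]
daily_sales = week1 + week2

weekly_change = seasonal_difference(daily_sales, period=7)
print(weekly_change)  # [1, 1, 1, 1, 1, 1, 1]

# Seasonal-naive forecast for next Monday: last Monday plus the typical change.
typical_change = sum(weekly_change) / len(weekly_change)
next_monday = daily_sales[-7] + typical_change
print(next_monday)  # 10.0
```

Once each day is compared with the same weekday a week earlier, the weekend spike disappears and only the week-over-week change is left.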
So far we have looked at the time series elements, but I would also like to incorporate one-off factors.
- Weather: not just rain, but also the abnormal weather of recent years.
- Events: if there is an athletic meet or festival near the store, that alone greatly increases sales.
- Competition: if a rival store opens nearby, sales drop by a certain amount (around 10%) from then on.
It seems that the ARIMAX model considers these external variables.
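To get a feel for external variables, here is a stripped-down sketch: yesterday's sales plus a 0/1 "event nearby" flag, fitted by ordinary least squares. This is a simplification I chose for illustration, not a full ARIMAX model. The data is synthetic, generated from coefficients I picked myself (3.0, 0.7, 5.0), and the helper functions are my own, written in plain Python so the example has no dependencies.

```python
# Simplified "AR(1) + external variable" sketch:
#   sales_n = c + a1 * sales_{n-1} + beta * event_n
# Synthetic, noise-free data generated from assumed coefficients.

def solve_linear(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) w = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


# events[i] = 1 means there is an event on day i + 1.
events = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
sales = [10.0]
for e in events:  # generate synthetic sales from the assumed coefficients
    sales.append(3.0 + 0.7 * sales[-1] + 5.0 * e)

rows = [[1.0, sales[i], float(events[i])] for i in range(len(events))]
targets = sales[1:]
c, a1, beta = fit_least_squares(rows, targets)
print(f"c={c:.3f}, a1={a1:.3f}, event effect={beta:.3f}")

# Compare tomorrow's forecast with and without an event.
without_event = c + a1 * sales[-1]
with_event = without_event + beta
```

Because the data was generated without noise, the fit recovers the coefficients I put in, so `beta` shows how much an event adds on top of a normal day. Real data would of course be noisier.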
- "State-space model by Python": a model that adds interpretation of the data on top of the ARIMA model is apparently called a state space model.
- "Time series data prediction library: PyFlux": there seems to be a library called PyFlux that can implement ARIMA, ARIMAX, and state space models.
- "I understand this time: RNN, LSTM edition" and "Forecasting airline passenger numbers next month with RNN": these appear to be neural-network-based approaches.
Sorry that this post was nothing but text. From next time onward, I will actually try out time series analysis.