If there had been a 27th place in my life, M5 would have saved me. It doesn't count as 27th place, though, because my submission format was wrong ... [Original song: Yorushika / August, Certain, Moonlight](https://music.apple.com/jp/album/%E5%85%AB%E6%9C%88-%E6%9F%90-%E6%9C%88%E6%98%8E%E3%81%8B%E3%82%8A/1455955689?i=1455955692)
We took part in the Kaggle M5 Forecasting - Accuracy competition, held from March to June 2020. I submitted a model equivalent to 27th place (top 0.4%), but I got the submission format wrong and ended up at the very bottom, so this post is a memorial for the solution. If you see me at the next competition, please be gentle with me. Now then, enjoy the story of how we fell into **hell**.
Kaggle M5 Forecasting - Accuracy is a Walmart (supermarket) sales forecasting competition: predict sales (unit sales) for each of 3,049 products over the next 28 days, given 1,913 days of past sales data. The targets are 10 stores across California, Texas, and Wisconsin.
Given data:
- Past sales (by ID, store, item, etc.)
- Price history
- Calendar (holidays, etc.)
The evaluation metric is basically RMSE (strictly speaking the slightly more technical WRMSSE, but I'll omit the details here).
Build one LGBM model per store and per forecast day (see the figure below).
Time-series prediction is said to be a poor fit for machine learning (reference), so at first I tried statistical models and an LSTM. For whatever reason, they didn't work well. So instead I built an LGBM model for each store and each forecast day, so that the dynamics governing each slice of the data would be as uniform as possible.
- Reason for choosing LGBM: I think models built specifically for time-series forecasting, such as statistical models and LSTMs, basically learn large-scale dynamics. But what we want to predict here is each individual product, i.e., small-scale dynamics. For example, even if you can predict the behaviour of Japanese people as a whole, it is hard to predict each individual (I think Japanese people like ramen, but I don't know whether *you* like ramen; I myself like dandan noodles). So I decided to rely on LGBM, which has high expressive power.
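As a concrete picture of this setup, here is a minimal sketch of what "one LGBM per store and per forecast day" could look like. The column names (`store_id`, `item_id`, `demand`), the feature list, and the parameters are placeholders for illustration, not our actual pipeline.

```python
# Minimal sketch of the "one LGBM per store x forecast day" idea.
# Column names and hyperparameters are illustrative assumptions only.
import lightgbm as lgb
import pandas as pd

def train_per_store_per_day(train_df: pd.DataFrame, feature_cols, horizons=range(1, 29)):
    """Train an independent LightGBM regressor for every (store, horizon) pair.

    Assumes rows are sorted by date within each item_id.
    """
    models = {}
    for store in train_df["store_id"].unique():
        store_df = train_df[train_df["store_id"] == store]
        for h in horizons:
            # Target shifted h days into the future: a "direct" (non-recursive) prediction.
            y = store_df.groupby("item_id")["demand"].shift(-h)
            mask = y.notna()
            dtrain = lgb.Dataset(store_df.loc[mask, feature_cols], label=y[mask])
            models[(store, h)] = lgb.train(
                {"objective": "tweedie", "metric": "rmse", "learning_rate": 0.05},
                dtrain,
                num_boost_round=300,
            )
    return models
```

With 10 stores and 28 forecast days, this comes to 280 small models.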
When I visualized each store's sales transitions by referring to an EDA notebook, I noticed that the movements were quite different from store to store.
In addition, when clustering with UMAP, FOODS showed almost the same distribution across stores, while HOBBIES showed clear regional differences. From these results we decided to separate the models, because what sells and what doesn't differs from store to store, and the dynamics governing them differ as well. California, Texas, and Wisconsin, where the stores are located, are geographically far apart, so it seemed reasonable that they would sell differently.
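For reference, here is a rough sketch of the kind of UMAP projection used to compare stores. The aggregation level (weekly sales per store within one department) and the column names are my assumptions, not the exact code we ran.

```python
# Rough sketch of the UMAP check described above.
# Aggregation level and column names are assumptions for illustration.
import umap
import pandas as pd

def embed_stores(sales_long: pd.DataFrame, dept: str) -> pd.DataFrame:
    """Project each store's weekly sales curve for one department into 2D with UMAP."""
    df = sales_long[sales_long["dept_id"] == dept]
    weekly = (
        df.groupby(["store_id", pd.Grouper(key="date", freq="W")])["demand"]
        .sum()
        .unstack(fill_value=0)  # rows: stores, columns: weeks
    )
    reducer = umap.UMAP(n_components=2, n_neighbors=5, random_state=0)
    return pd.DataFrame(reducer.fit_transform(weekly), index=weekly.index, columns=["x", "y"])

# e.g. compare embed_stores(sales, "FOODS_1") with embed_stores(sales, "HOBBIES_1")
```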
Whether a recursive model or a direct day-by-day prediction is better was an important point, as debated in the competition Discussion threads. We simply chose the safer path, the one less likely to fail badly. In fact, when we built a recursive model and looked at permutation importance, lag1 came out completely negative.
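A small sketch of how such a lag-1 check can be done is below. Here `model` is assumed to be an sklearn-style regressor (e.g. `lightgbm.LGBMRegressor`), and the feature name `lag_1` and the validation split are placeholders, not our exact setup.

```python
# Sketch of checking whether lag_1 actually helps a recursive model.
from sklearn.inspection import permutation_importance

def lag1_importance(model, X_valid, y_valid):
    """Mean drop in (negative) RMSE when the lag_1 column is shuffled."""
    result = permutation_importance(
        model, X_valid, y_valid,
        scoring="neg_root_mean_squared_error",
        n_repeats=5, random_state=0,
    )
    idx = list(X_valid.columns).index("lag_1")
    # A negative value means shuffling lag_1 made the score *better*,
    # i.e. the feature is hurting the model.
    return result.importances_mean[idx]
```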
- Basic features (moving averages, max/min over a given period, etc.)
- **Number of days since the product's price rose or fell**
- **The day sales were first recorded** — data from before this day was excluded from training.
- **Ordered TS** — essentially a leak-free target encoding for time-series data.
- **Maximum and minimum sales over a specific period** — added because I wanted to teach the model some notion of time.
- **Ratio of sales values 0 through 10 in past sales** — added to express the sales distribution at that point in time. Also, when sales are 0 the item may simply be out of stock (sales suddenly drop to 0), so I thought statistics around 0 were necessary.
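To make a few of these concrete, here is a hedged sketch covering a subset of the features above. The column names (`id`, `date`, `sell_price`, `demand`) are placeholders, not the exact ones from our pipeline.

```python
# Hedged sketches of some of the features above; column names are placeholders.
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["id", "date"]).copy()
    g = df.groupby("id", group_keys=False)

    # Days elapsed since the product's price last went up or down:
    # mark rows where the price changes, then count rows within each price regime.
    changed = g["sell_price"].transform(lambda s: s.ne(s.shift())).astype(int)
    regime = changed.groupby(df["id"]).cumsum()
    df["days_since_price_change"] = df.groupby([df["id"], regime]).cumcount()

    # "Ordered TS": leak-free target encoding using only sales strictly before each day.
    df["item_te"] = g["demand"].transform(lambda s: s.shift(1).expanding().mean())

    # Share of zero-sales days over the previous 28 days (one slice of the
    # "ratio of 0..10 sales" idea, meant to capture possible out-of-stock periods).
    df["zero_ratio_28"] = g["demand"].transform(
        lambda s: s.shift(1).rolling(28).apply(lambda w: (w == 0).mean())
    )
    return df
```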
- On top of these, we tried to add statistically meaningful features. My teammate's view: if you know what relates to purchasing motivation, you should be able to infer the sales. My view: add time-related features to give LGBM information it cannot know on its own. In practice we tried various things such as denoising, waveform-complexity measures, and features in Fourier space, but few of them turned out to be effective.
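For what it's worth, here is a small sketch of the kind of Fourier-space feature we experimented with (again, it did not end up helping much). The window length is an assumption for illustration.

```python
# Sketch of a Fourier-space feature: strength of the strongest periodic component
# in a trailing window of sales. Window length (90 days) is an assumption.
import numpy as np
import pandas as pd

def dominant_frequency_strength(sales: pd.Series, window: int = 90) -> pd.Series:
    """For each day, magnitude of the strongest non-DC frequency in the previous `window` days."""
    def strongest(w: np.ndarray) -> float:
        spectrum = np.abs(np.fft.rfft(w - w.mean()))
        return float(spectrum[1:].max()) if len(spectrum) > 1 else 0.0
    return sales.shift(1).rolling(window).apply(strongest, raw=True)
```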
For training we used the most recent three months of data. For validation, I was careful to adopt only features that improved CV robustly, not ones that improved it only in places. I did not count on the Darker magic notebook or the public leaderboard, because I suspected overfitting.
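As an illustration of the kind of robustness check this implies, here is a sketch of a simple time-based validation with 28-day windows. The number of folds and the exact fold boundaries are my assumptions, not the scheme we actually used.

```python
# Sketch of a simple time-based validation over the most recent data.
# Fold boundaries are illustrative assumptions.
import pandas as pd

def time_folds(df: pd.DataFrame, n_folds: int = 3, horizon: int = 28):
    """Yield (train_idx, valid_idx) pairs, each validation block being a later 28-day window."""
    last_day = df["date"].max()
    for k in range(n_folds, 0, -1):
        valid_end = last_day - pd.Timedelta(days=(k - 1) * horizon)
        valid_start = valid_end - pd.Timedelta(days=horizon - 1)
        train_idx = df.index[df["date"] < valid_start]
        valid_idx = df.index[(df["date"] >= valid_start) & (df["date"] <= valid_end)]
        yield train_idx, valid_idx
```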
Iwamo 8:49 PM — My brain has stopped working, it's almost funny
Teammate 9:10 PM — Can you double-check it?
Iwamo 9:10 PM — **Looks fine**
It's nothing special, but I'll post the score the original model would have earned.
I feel terrible for my teammates. Checking the submission only at the very end was the mistake ... This was actually my first Kaggle competition after studying for about half a year, and I think I learned a lot. I'm really disappointed, but I'll keep working hard.