In this article, I'll summarize a standard approach for each data type found in Kaggle competitions. I hope it also serves as a hint when accuracy won't improve, whether you're in a competition or not.
This time we will cover the following competition datasets. Where a competition has no Kernels, I include Notebooks as well.
- Predict Future Sales
- Avocado Prices
1. Predict Future Sales
Type: Regression
Summary: Predict the number of products sold in the next month from the store name, product name, etc.
1.1 Time series Basics: Exploring traditional TS
Analysis performed | Result |
---|---|
Feature histograms | Discovered a bias across categories |
Plotting features in chronological order | Trends shift with each season |
Stationarity (periodicity) check with ADF, KPSS, and PP tests | Determine whether periodicity is present |
Tried AR, MA, ARMA models, etc. | |
The kernel also introduces approaches to hierarchical time series: bottom-up, top-down, and middle-out.
In any case, the important thing with time-series data is to capture the time-dependent ups and downs in mathematical form.
Forecasting Hierarchical Time Series using R
Econometric time series analysis with R: AR, MA, ARMA, ARIMA models and prediction
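The bottom-up and top-down ideas mentioned above fit in a few lines of numpy. The two-store example below is hypothetical, and a naive last-value forecast stands in for whatever model you actually use:

```python
import numpy as np

# Hypothetical history: monthly sales for two stores (the bottom-level series).
store_a = np.array([10., 12., 11., 13.])
store_b = np.array([30., 28., 31., 29.])
total = store_a + store_b

# Naive one-step forecasts (last observed value), standing in for any model.
fc_a, fc_b = store_a[-1], store_b[-1]
fc_total = total[-1]

# Bottom-up: forecast each bottom series, then sum to get the total.
bottom_up_total = fc_a + fc_b

# Top-down: forecast the total, then split it by historical proportions.
prop_a = store_a.sum() / total.sum()
top_down_a = fc_total * prop_a
```

Middle-out combines the two: forecast at an intermediate level, aggregate upward, and distribute downward.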
1.2 Feature engineering, xgboost
Analysis performed | Result |
---|---|
Visualization of features | Found noisy data and interpolated it with other values |
Feature engineering (computing monthly averages from daily sales, adding average sales over a given period as a feature, etc.) | Improved accuracy |
Prediction with xgboost | |
Here, time-series information was steadily baked into features through feature engineering, and the result was predicted with xgboost. This kind of patient feature engineering is impressive.
I wrote an article about feature engineering in the past, so please check it out: Feature Engineering Memorandum
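The kind of feature engineering described above, rolling daily sales up to monthly totals and adding a lag feature, can be sketched with pandas. The column names (`shop_id`, `item_cnt_day`) mirror the competition's, but the data here is made up:

```python
import pandas as pd

# Hypothetical daily sales log (the competition data has similar columns).
daily = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-15", "2015-02-01",
                            "2015-02-20", "2015-03-05"]),
    "shop_id": [0, 0, 0, 0, 0],
    "item_cnt_day": [2, 3, 1, 4, 2],
})

# Aggregate daily sales into monthly totals per shop.
daily["month"] = daily["date"].dt.to_period("M")
monthly = (daily.groupby(["shop_id", "month"])["item_cnt_day"]
                .sum().reset_index(name="item_cnt_month"))

# Lag feature: previous month's sales, a typical input for an xgboost model.
monthly["item_cnt_prev_month"] = (monthly.groupby("shop_id")["item_cnt_month"]
                                         .shift(1))
print(monthly)
```

Shifting within each group (rather than on the whole frame) keeps one shop's history from leaking into another's lag feature.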
1.3 A beginner guide for sale data prediction
Analysis performed | Result |
---|---|
Visualization of features | Discovered seasonal patterns and turned them into features |
Prediction with LSTM | |
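Before an LSTM can be trained, the series has to be cut into supervised (input window, next value) pairs shaped `(samples, timesteps, features)`. The kernel's exact preprocessing may differ; this is a minimal numpy sketch of that windowing step:

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (X, y) pairs for sequence models like LSTMs:
    each X row is `window` consecutive values, y is the value that follows."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    # LSTMs expect input shaped (samples, timesteps, features).
    return np.array(X)[..., np.newaxis], np.array(y)

series = np.arange(10, dtype=float)   # toy series: 0..9
X, y = make_windows(series, window=3)
print(X.shape, y.shape)               # (7, 3, 1) (7,)
```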
1.4 1st place solution - Part 1 - "Hands on Data"
Analysis performed | Result |
---|---|
Feature engineering (grouping stores with no sales for months, discovering duplicate store names, extracting categories from names, etc.) | Improved accuracy |
As you can see, the routine is common: Data Visualization -> Examine Data Trends -> Feature Engineering -> Dive into Models.
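Two of the tricks listed in that solution, spotting duplicate store names and pulling a category out of item names, can be sketched with pandas. The names and the `" - "` separator below are hypothetical illustrations, not the actual competition strings:

```python
import pandas as pd

shops = pd.DataFrame({"shop_name": ["Moscow TC  Mega", "moscow tc mega",
                                    "SPb Nevsky"]})

# Normalize names (lower-case, collapse whitespace) to surface duplicate shops.
shops["norm"] = (shops["shop_name"].str.lower()
                 .str.replace(r"\s+", " ", regex=True).str.strip())
dupes = shops["norm"].duplicated(keep=False)

items = pd.DataFrame({"item_name": ["Games - FIFA 18", "Music - Abbey Road"]})
# Extract a coarse category from the leading token of the item name.
items["category"] = items["item_name"].str.split(" - ").str[0]
```

Once duplicates are identified, their IDs can be mapped to a single canonical ID so that one store's history isn't split across two series.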
2. Avocado Prices
Next, I would like to introduce one more time-series dataset. From here on, I will focus on the methods used.
Type: Both regression and classification
Summary: Historical avocado sales data
This kernel's visualizations are beautiful and detailed. I was impressed.
Method used |
---|
Moving-average smoothing |
Seasonal Naive Method |
Drift Method |
ARIMA |
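Apart from ARIMA, the baselines in that table each take only a line or two of numpy. The formulas follow the standard textbook definitions (e.g. Hyndman's), and the series is a made-up example:

```python
import numpy as np

y = np.array([10., 12., 14., 11., 13., 15.])  # hypothetical series
h, season = 2, 3                              # forecast horizon, season length

# Moving average: forecast every future step with the mean of the last k values.
k = 3
ma_forecast = np.full(h, y[-k:].mean())

# Seasonal naive: repeat the value from the same point in the last season.
snaive_forecast = np.array([y[-season + (i % season)] for i in range(h)])

# Drift: extrapolate the straight line through the first and last observations.
T = len(y)
drift_forecast = y[-1] + np.arange(1, h + 1) * (y[-1] - y[0]) / (T - 1)
```

Simple as they are, these methods make good baselines: if ARIMA or an LSTM can't beat seasonal naive, the extra complexity isn't paying off.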
2.2 Explore avocados from all sides!
This one also has very beautiful visualizations. It is a classification problem: whether an avocado is organic or not.
Method used |
---|
logistic regression |
RandomForest |
KNeighborsClassifier |
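Comparing those three classifiers follows the same scikit-learn pattern regardless of the data; the snippet below uses a hypothetical, linearly separable dataset as a stand-in for the organic/conventional labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the avocado features and organic/not labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit each model and record its held-out accuracy.
scores = {}
for name, model in [("logreg", LogisticRegression()),
                    ("rf", RandomForestClassifier(random_state=0)),
                    ("knn", KNeighborsClassifier())]:
    scores[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(scores)
```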
2.3 Predicting prices of avocados
Tools used |
---|
prophet |
2.4 EDA + Lasso
This kernel also covers the classification problem.
Method used |
---|
DecisionTree |
RandomForest |
KNeighbours |
SVM |
AdaBoostClassifier |
GradientBoostingClassifier |
Xgboost |
Lasso |
Ridge |
Bayesian Ridge |
ElasticNet |
HuberRegressor |
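For the regression models in that list (Lasso, Ridge, Bayesian Ridge, ElasticNet, Huber), a cross-validated comparison is only a few lines in scikit-learn. The data below is synthetic, standing in for the avocado price features:

```python
import numpy as np
from sklearn.linear_model import (Lasso, Ridge, BayesianRidge,
                                  ElasticNet, HuberRegressor)
from sklearn.model_selection import cross_val_score

# Hypothetical linear data standing in for avocado price features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Mean R^2 over 5-fold cross-validation for each linear model.
results = {}
for model in [Lasso(alpha=0.01), Ridge(), BayesianRidge(),
              ElasticNet(alpha=0.01), HuberRegressor()]:
    results[type(model).__name__] = cross_val_score(model, X, y, cv=5).mean()
print(results)
```

On real data the interesting part is where the models diverge: Lasso zeroes out weak coefficients, Huber resists outliers, and Ridge/ElasticNet sit in between.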
Feature engineering and visualization are important!
List of typical machine learning methods