In this article, I'll summarize a standard approach for each data type found in Kaggle competitions. I hope it also serves as a hint when accuracy won't improve, whether you're in a competition or not.
This time we will cover the following competition datasets. Where a competition has no Kernels, I include Notebooks as well.
- Predict Future Sales
- Avocado Prices
1. Predict Future Sales
Type: Regression
Summary: Predict the number of products sold in the next month from the store name, product name, etc.
1.1 Time series Basics: Exploring traditional TS
Analysis performed | Result |
---|---|
Feature histograms | Discovered a bias across categories |
Plotting features in chronological order | Trends shift with each season |
Stationarity (periodicity) check with ADF, KPSS, and PP tests | Determine whether periodicity is present |
Tried AR, MA, ARMA models, etc. | |
The kernel also introduces approaches to hierarchical time series: bottom-up, top-down, and middle-out.
In any case, the important thing with time-series data is to capture the time-dependent ups and downs in mathematical form.
Forecasting Hierarchical Time Series using R
Econometric time series analysis with R: AR, MA, ARMA, ARIMA models and prediction
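The bottom-up and top-down ideas mentioned above fit in a few lines of numpy. The two-store example below is hypothetical, and a naive last-value forecast stands in for whatever model you actually use:

```python
import numpy as np

# Hypothetical history: monthly sales for two stores (the bottom-level series).
store_a = np.array([10., 12., 11., 13.])
store_b = np.array([30., 28., 31., 29.])
total = store_a + store_b

# Naive one-step forecasts (last observed value), standing in for any model.
fc_a, fc_b = store_a[-1], store_b[-1]
fc_total = total[-1]

# Bottom-up: forecast each bottom series, then sum to get the total.
bottom_up_total = fc_a + fc_b

# Top-down: forecast the total, then split it by historical proportions.
prop_a = store_a.sum() / total.sum()
top_down_a = fc_total * prop_a
```

Middle-out combines the two: forecast at an intermediate level, aggregate upward, and distribute downward.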
1.2 Feature engineering, xgboost
Analysis performed | Result |
---|---|
Visualization of features | Found noisy data and interpolated it with other values |
Feature engineering (computing monthly averages from daily sales, adding average sales over a given period as a feature, etc.) | Improved accuracy |
Prediction with xgboost | |
Here, time-series information was steadily baked into features through feature engineering, and the result was predicted with xgboost. This kind of patient feature engineering is impressive.
I wrote an article about feature engineering in the past, so please check it out: Feature Engineering Memorandum
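The kind of feature engineering described above, rolling daily sales up to monthly totals and adding a lag feature, can be sketched with pandas. The column names (`shop_id`, `item_cnt_day`) mirror the competition's, but the data here is made up:

```python
import pandas as pd

# Hypothetical daily sales log (the competition data has similar columns).
daily = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-15", "2015-02-01",
                            "2015-02-20", "2015-03-05"]),
    "shop_id": [0, 0, 0, 0, 0],
    "item_cnt_day": [2, 3, 1, 4, 2],
})

# Aggregate daily sales into monthly totals per shop.
daily["month"] = daily["date"].dt.to_period("M")
monthly = (daily.groupby(["shop_id", "month"])["item_cnt_day"]
                .sum().reset_index(name="item_cnt_month"))

# Lag feature: previous month's sales, a typical input for an xgboost model.
monthly["item_cnt_prev_month"] = (monthly.groupby("shop_id")["item_cnt_month"]
                                         .shift(1))
print(monthly)
```

Shifting within each group (rather than on the whole frame) keeps one shop's history from leaking into another's lag feature.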
1.3 A beginner guide for sale data prediction
Analysis performed | Result |
---|---|
Visualization of features | Discovered seasonal patterns and turned them into features |
Prediction with LSTM | |
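Before an LSTM can be trained, the series has to be cut into supervised (input window, next value) pairs shaped `(samples, timesteps, features)`. The kernel's exact preprocessing may differ; this is a minimal numpy sketch of that windowing step:

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (X, y) pairs for sequence models like LSTMs:
    each X row is `window` consecutive values, y is the value that follows."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    # LSTMs expect input shaped (samples, timesteps, features).
    return np.array(X)[..., np.newaxis], np.array(y)

series = np.arange(10, dtype=float)   # toy series: 0..9
X, y = make_windows(series, window=3)
print(X.shape, y.shape)               # (7, 3, 1) (7,)
```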
1.4 1st place solution - Part 1 - "Hands on Data"
Analysis performed | Result |
---|---|
Feature engineering (grouping stores with no sales for months, discovering duplicate store names, extracting categories from names, etc.) | Improved accuracy |
As you can see, the routine is common: Data Visualization -> Examine Data Trends -> Feature Engineering -> Dive into Models.
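Two of the tricks listed in that solution, spotting duplicate store names and pulling a category out of item names, can be sketched with pandas. The names and the `" - "` separator below are hypothetical illustrations, not the actual competition strings:

```python
import pandas as pd

shops = pd.DataFrame({"shop_name": ["Moscow TC  Mega", "moscow tc mega",
                                    "SPb Nevsky"]})

# Normalize names (lower-case, collapse whitespace) to surface duplicate shops.
shops["norm"] = (shops["shop_name"].str.lower()
                 .str.replace(r"\s+", " ", regex=True).str.strip())
dupes = shops["norm"].duplicated(keep=False)

items = pd.DataFrame({"item_name": ["Games - FIFA 18", "Music - Abbey Road"]})
# Extract a coarse category from the leading token of the item name.
items["category"] = items["item_name"].str.split(" - ").str[0]
```

Once duplicates are identified, their IDs can be mapped to a single canonical ID so that one store's history isn't split across two series.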
2. Avocado Prices
Next, I would like to introduce one more time-series dataset. From here on, I will focus on the methods used.
Type: Both regression and classification
Summary: Historical avocado sales data
This kernel's visualizations are beautiful and detailed. I was impressed.
Method used |
---|
Moving-average smoothing |
Seasonal Naive Method |
Drift Method |
ARIMA |
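Apart from ARIMA, the baselines in that table each take only a line or two of numpy. The formulas follow the standard textbook definitions (e.g. Hyndman's), and the series is a made-up example:

```python
import numpy as np

y = np.array([10., 12., 14., 11., 13., 15.])  # hypothetical series
h, season = 2, 3                              # forecast horizon, season length

# Moving average: forecast every future step with the mean of the last k values.
k = 3
ma_forecast = np.full(h, y[-k:].mean())

# Seasonal naive: repeat the value from the same point in the last season.
snaive_forecast = np.array([y[-season + (i % season)] for i in range(h)])

# Drift: extrapolate the straight line through the first and last observations.
T = len(y)
drift_forecast = y[-1] + np.arange(1, h + 1) * (y[-1] - y[0]) / (T - 1)
```

Simple as they are, these methods make good baselines: if ARIMA or an LSTM can't beat seasonal naive, the extra complexity isn't paying off.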
2.2 Explore avocados from all sides!
This one also has very beautiful visualizations. It is a classification problem: whether an avocado is organic or not.
Method used |
---|
logistic regression |
RandomForest |
KNeighborsClassifier |
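Comparing those three classifiers follows the same scikit-learn pattern regardless of the data; the snippet below uses a hypothetical, linearly separable dataset as a stand-in for the organic/conventional labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the avocado features and organic/not labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit each model and record its held-out accuracy.
scores = {}
for name, model in [("logreg", LogisticRegression()),
                    ("rf", RandomForestClassifier(random_state=0)),
                    ("knn", KNeighborsClassifier())]:
    scores[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(scores)
```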
2.3 Predicting prices of avocados
Tools used |
---|
prophet |
2.4 EDA + Lasso
This kernel also covers the classification problem.
Method used |
---|
DecisionTree |
RandomForest |
KNeighbours |
SVM |
AdaBoostClassifier |
GradientBoostingClassifier |
Xgboost |
Lasso |
Ridge |
Bayesian Ridge |
ElasticNet |
HuberRegressor |
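For the regression models in that list (Lasso, Ridge, Bayesian Ridge, ElasticNet, Huber), a cross-validated comparison is only a few lines in scikit-learn. The data below is synthetic, standing in for the avocado price features:

```python
import numpy as np
from sklearn.linear_model import (Lasso, Ridge, BayesianRidge,
                                  ElasticNet, HuberRegressor)
from sklearn.model_selection import cross_val_score

# Hypothetical linear data standing in for avocado price features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

# Mean R^2 over 5-fold cross-validation for each linear model.
results = {}
for model in [Lasso(alpha=0.01), Ridge(), BayesianRidge(),
              ElasticNet(alpha=0.01), HuberRegressor()]:
    results[type(model).__name__] = cross_val_score(model, X, y, cv=5).mean()
print(results)
```

On real data the interesting part is where the models diverge: Lasso zeroes out weak coefficients, Huber resists outliers, and Ridge/ElasticNet sit in between.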
Feature engineering and visualization are important!
List of typical machine learning methods