This time I will write about scikit-learn, which sits at the heart of machine learning in Python.
The name is read "sy-kit learn". It is a machine learning library for Python.
There is a cheat sheet that makes it easy to select a model for the kind of analysis (classification, regression, etc.) you want to perform.
Classification: determine which class a sample belongs to.
SGD (stochastic gradient descent): for more than 100,000 samples. A linear classification method.
For more than 100,000 samples, if SGD doesn't work: a non-linear classification method.
Linear SVC: for fewer than 100,000 samples. A linear classification method.
For fewer than 100,000 samples, if Linear SVC doesn't work: a non-linear classification method.
For text data
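As a rough illustration of the size rule for linear classifiers, here is a minimal sketch. The helper `pick_linear_classifier`, the constant `SAMPLE_THRESHOLD`, and the synthetic two-blob data are assumptions made up for this example; only `SGDClassifier` and `LinearSVC` are actual scikit-learn estimators.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumed cut-off taken from the "100,000 samples" rule of thumb.
SAMPLE_THRESHOLD = 100_000


def pick_linear_classifier(n_samples):
    """Hypothetical helper: SGD for large data, Linear SVC otherwise."""
    if n_samples > SAMPLE_THRESHOLD:
        return SGDClassifier(random_state=0)
    return LinearSVC(random_state=0)


# Synthetic, well-separated two-class data (illustration only).
X, y = make_blobs(
    n_samples=1000,
    centers=[[-3.0, -3.0], [3.0, 3.0]],
    cluster_std=1.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Both linear models are sensitive to feature scale, so scale first.
classifier = pick_linear_classifier(len(X_train))
model = make_pipeline(StandardScaler(), classifier)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(type(classifier).__name__, "test accuracy:", round(accuracy, 3))
```

With only 750 training samples the helper returns `LinearSVC`. Passing a count above 100,000 would return `SGDClassifier` instead.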
Regression: predict a target value.
SGD (stochastic gradient descent): for more than 100,000 samples. A linear regression method.
LASSO, ElasticNet: for fewer than 100,000 samples, when only some of the explanatory variables are important. Regression methods.
Ridge, Linear SVR: for fewer than 100,000 samples, when all of the explanatory variables are important. Regression methods.
If Ridge or Linear SVR doesn't work: a non-linear regression method.
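The difference between "some variables are important" and "all variables are important" can be seen in the fitted coefficients. The sketch below is only an illustration on synthetic data where 3 of 10 features carry signal. The data set and the `alpha=1.0` values are assumptions chosen for the example, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 explanatory variables, only 3 actually matter.
X, y = make_regression(
    n_samples=200,
    n_features=10,
    n_informative=3,
    noise=1.0,
    random_state=0,
)

# Lasso (L1 penalty) can drive unimportant coefficients to exactly zero.
lasso = Lasso(alpha=1.0).fit(X, y)
# Ridge (L2 penalty) shrinks coefficients but keeps all of them.
ridge = Ridge(alpha=1.0).fit(X, y)

lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))
print("coefficients set to zero by Lasso:", lasso_zeros)
print("coefficients set to zero by Ridge:", ridge_zeros)
```

Lasso works as a kind of variable selection, which is why it suits the case where only part of the explanatory variables matter.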
Clustering: divide the data into groups according to some rule.
KMeans: when the number of clusters can be decided in advance. A clustering method.
MiniBatch KMeans: for more than 100,000 samples. Learns while splitting the data into small batches.
If KMeans doesn't work: a non-linear clustering method.
MeanShift, VBGMM: when the number of clusters cannot be decided in advance. Clustering methods.
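To make the "number of clusters known or unknown" distinction concrete, here is a minimal sketch on synthetic data. The three blob centers are an assumption for the example. `KMeans` is told the number of clusters, while `MeanShift` is left at its defaults and estimates it from the data.

```python
from sklearn.cluster import KMeans, MeanShift
from sklearn.datasets import make_blobs

# Synthetic data: three compact, well-separated groups (illustration only).
centers = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# KMeans: the number of clusters must be given in advance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# MeanShift: no cluster count is passed; the bandwidth is estimated
# from the data and the number of clusters follows from it.
mean_shift = MeanShift().fit(X)

n_kmeans = len(set(kmeans.labels_))
n_mean_shift = len(mean_shift.cluster_centers_)
print("KMeans clusters (specified):", n_kmeans)
print("MeanShift clusters (estimated):", n_mean_shift)
```

For data sets above the 100,000 sample mark, `MiniBatchKMeans` from `sklearn.cluster` takes the same `n_clusters` argument as `KMeans`.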
Dimensionality reduction: done as a preprocessing step to improve learning efficiency.
Methods include PCA, kernel PCA, Isomap, and Spectral Embedding.
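As a small example of this kind of preprocessing, the sketch below applies PCA to the iris data set bundled with scikit-learn and reduces its 4 features to 2. The choice of data set and of `n_components=2` is an assumption for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Iris: 150 samples with 4 features each.
X, _ = load_iris(return_X_y=True)

# Project the data onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the original variance kept by the 2 components.
explained = pca.explained_variance_ratio_.sum()
print("shape before:", X.shape, "after:", X_reduced.shape)
print("explained variance ratio:", round(explained, 3))
```

The reduced array can then be passed to a classifier or clustering model in place of the original features.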
Settings that control how a model learns are called "hyperparameters".
Methods for tuning them include grid search and cross-validation.
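Grid search and cross-validation are often combined, and scikit-learn offers `GridSearchCV` for that. The sketch below is a minimal example. The `SVC` model, the candidate values in `param_grid`, and the 5-fold setting are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values to try (3 x 2 = 6 combinations).
param_grid = {
    "C": [0.1, 1.0, 10.0],
    "kernel": ["linear", "rbf"],
}

# Each combination is scored with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds a model refit on all the data with the best combination found.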