Machine learning course memo
Introduction [P01]
What is machine learning?
A research field in which machines learn their behavior from data
- Supervised learning
Learn the relationship between explanatory variables X and the outcome y
- Unsupervised learning
Acquire patterns in the data and new representations of it
Visualization of data patterns
Samples that fit none of the learned patterns are judged anomalous (anomaly detection)
- Reinforcement learning
Aim to acquire action rules (a policy) that maximize reward
Differences between deep learning and traditional machine learning
In deep learning, feature engineering is absorbed into the algorithm itself (features are learned from the data)
Wave of democratization
- DataRobot
https://www.datarobot.com/jp/
- Amazon SageMaker
https://aws.amazon.com/jp/sagemaker/
- Featuretools | An open source framework for automated feature engineering
https://www.featuretools.com/
What to use with pandas
- DataFrame, Series
- join, merge
- iloc, loc
- map, apply, applymap
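A minimal sketch of these pandas features (the DataFrame contents below are made up purely for illustration):

import pandas as pd

# DataFrame / Series
df = pd.DataFrame({'id': [1, 2, 3], 'price': [100, 200, 300]})
other = pd.DataFrame({'id': [1, 2, 3], 'qty': [5, 1, 2]})

# merge: SQL-style join on a key column
merged = pd.merge(df, other, on='id', how='left')

# loc (label-based) and iloc (position-based) selection
first_row = merged.iloc[0]
prices = merged.loc[:, 'price']

# map (element-wise on a Series), apply (row/column-wise), applymap (element-wise on a DataFrame)
merged['price_class'] = merged['price'].map(lambda v: 'high' if v >= 200 else 'low')
merged['total'] = merged.apply(lambda row: row['price'] * row['qty'], axis=1)
as_float = merged[['price', 'qty']].applymap(float)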
No free lunch theorem
The optimal algorithm depends on the data
Penalty term = a penalty (cost) imposed on model complexity
PCA: Principal Component Analysis
Where each measure against overfitting is applied:
- In the algorithm: regularization
- Before the algorithm: dimensionality reduction (feature extraction, feature selection)
- After the algorithm: holdout, cross-validation (k-fold)
StandardScaler()
Subtract the mean and divide by the standard deviation
from sklearn.model_selection import cross_val_score, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

# build models (example pipelines assumed for illustration; X, y are assumed to be loaded already)
pipelines = {
    'ridge': Pipeline([('scl', StandardScaler()), ('est', Ridge())]),
    'rf': Pipeline([('scl', StandardScaler()), ('est', RandomForestRegressor(random_state=0))]),
}

kf = KFold(n_splits=3, shuffle=True, random_state=0)
for pipe_name, est in pipelines.items():
    cv_results = cross_val_score(est, X, y, cv=kf, scoring='r2')
    print('----------')
    print('algorithm:', pipe_name)
    print('cv_results:', cv_results)
    print('avg +- std_dev:', cv_results.mean(), '+-', cv_results.std())
Supervised learning (regression) [P02]
- Regression: the correct-answer (target) data is a continuous quantity
- Classification: the correct-answer (target) data is a discrete class label (binary in the simplest case)
Phase
- Modeling phase
- Scoring phase
To increase generalization ability
- amount of data
- Feature design that reflects business knowledge
- Data preprocessing and algorithm evaluation
Regression algorithm
- Least squares regression
- Ridge regression: adds an L2 regularization term
The larger the regularization parameter, the closer the regression coefficients shrink toward 0
Dealing with overfitting
- Regularization
- L1 regularization (Lasso regression): produces sparse coefficients, acting as feature selection (see the sketch after this list)
- L2 regularization (Ridge regression): a closed-form analytical solution exists
- Dimensionality reduction (feature extraction)
- Dimensionality reduction (feature selection)
- Holdout / cross-validation
- Holdout method
- k-fold (cross-validation)
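A minimal sketch of the L1 vs. L2 behavior listed above, using scikit-learn (the diabetes dataset and the alpha values are assumptions chosen only for illustration):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

# L1 (Lasso): can drive coefficients exactly to 0 -> sparse solution, implicit feature selection
lasso = Lasso(alpha=1.0).fit(X, y)
print('Lasso non-zero coefficients:', (lasso.coef_ != 0).sum(), '/', lasso.coef_.size)

# L2 (Ridge): shrinks coefficients toward 0 but rarely to exactly 0; a closed-form solution exists
ridge = Ridge(alpha=1.0).fit(X, y)
print('Ridge non-zero coefficients:', (ridge.coef_ != 0).sum(), '/', ridge.coef_.size)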
Tree-based algorithms
- Decision tree
- Random forest
- Gradient boosting
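A brief sketch of the three tree-based regressors above, evaluated with a holdout split (the dataset and hyperparameters are assumptions for illustration):

from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
# holdout: keep 30% of the data aside for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (DecisionTreeRegressor(max_depth=3, random_state=0),
              RandomForestRegressor(n_estimators=100, random_state=0),
              GradientBoostingRegressor(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))  # R2 on the held-out data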
Evaluation of regression model
- Mean absolute error (MAE): the mean of the absolute differences between the measured and predicted values
- Mean squared error (MSE): the mean of the squared differences between the measured and predicted values
- Median absolute error: the median of the absolute differences between the measured and predicted values
Less susceptible to outliers
- R2 score: 1.0 for zero error, 0.0 when equivalent to always predicting the mean, negative when worse than that
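A short sketch computing these regression metrics with scikit-learn (the y values are invented numbers for illustration):

from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

print('MAE  :', mean_absolute_error(y_true, y_pred))
print('MSE  :', mean_squared_error(y_true, y_pred))
print('MedAE:', median_absolute_error(y_true, y_pred))
print('R2   :', r2_score(y_true, y_pred))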
Supervised learning (classification) [P03]
Classification algorithm
- K-Nearest Neighbors
- Logistic regression
Parameters are estimated by the maximum likelihood method
- Neural network
- Formal neurons
- Simple perceptron: adds weight learning to the formal neuron
→ Cannot solve problems that are not linearly separable
- Multilayer perceptron: inserts an intermediate (hidden) layer between the input layer and the output layer
Weights are learned by the error backpropagation method
- Support Vector Machine (SVM)
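A minimal sketch fitting a few of the classifiers above (the iris dataset, the scaling pipeline, and the hyperparameters are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

classifiers = {
    'k-NN': KNeighborsClassifier(n_neighbors=5),
    'logistic regression': LogisticRegression(max_iter=1000),
    'SVM': SVC(kernel='rbf'),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scale the features, then classify
    print(name, cross_val_score(pipe, X, y, cv=3, scoring='accuracy').mean())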
Classification model evaluation method
- Confusion Matrix
- False negative (FN): an actually positive (P) sample incorrectly predicted as negative (N)
- False positive (FP): an actually negative (N) sample incorrectly predicted as positive (P)
- Accuracy: overall rate of correct predictions, (TP + TN) / ALL
- Precision: proportion of predicted positives that are actually positive, TP / (TP + FP)
- Recall: proportion of actual positives that are correctly detected, TP / (TP + FN)
- AUC-ROC
- F value (F-measure): Harmonic mean of precision and recall
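A short sketch computing these classification metrics (the breast cancer dataset and the logistic-regression model are assumptions made for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))  # rows: actual class, columns: predicted class
print('accuracy :', accuracy_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred))
print('recall   :', recall_score(y_test, y_pred))
print('F-measure:', f1_score(y_test, y_pred))
print('ROC AUC  :', roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))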
Data preprocessing and dimension reduction [P04]
Data preprocessing
- One-hot encoding
- Missing value imputation
- Standardization
- Replace date and time with elapsed time
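A compact sketch of these preprocessing steps with pandas and scikit-learn (the toy DataFrame is invented for illustration):

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'category': ['A', 'B', 'A'],
    'value': [1.0, None, 3.0],
    'timestamp': pd.to_datetime(['2024-01-01', '2024-01-15', '2024-02-01']),
})

# One-hot encoding of the categorical column
df = pd.get_dummies(df, columns=['category'])

# Missing value imputation (here: fill with the column mean)
df['value'] = df['value'].fillna(df['value'].mean())

# Replace the datetime with elapsed time (days since the earliest timestamp)
df['elapsed_days'] = (df['timestamp'] - df['timestamp'].min()).dt.days
df = df.drop(columns=['timestamp'])

# Standardization: subtract the mean, divide by the standard deviation
df[['value', 'elapsed_days']] = StandardScaler().fit_transform(df[['value', 'elapsed_days']])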
Dimensionality reduction
- Curse of dimensionality: for a fixed amount of data, the larger the number of dimensions, the larger the generalization error tends to become
- Feature selection: Select important variables
- Rule-based selection: by missing-value ratio or variance
- Selection by basic statistics: the statistic used differs between classification and regression
- Model-based selection (RFE):
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
selector = RFE(RandomForestRegressor(n_estimators=100, random_state=42), n_features_to_select=5)
# selector.fit(X, y); X_selected = selector.transform(X)
- Feature extraction: reduce the dimensionality of the features before feeding them to the model
- PCA (Principal Component Analysis)
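A minimal PCA sketch with scikit-learn (the iris data and the choice of 2 components are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# standardize first, then project onto the top 2 principal components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance captured by each component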
Unsupervised learning [P06]
Clustering
Group together samples with high similarity, using data that has no correct-answer labels
k-means
- Clustering: unsupervised learning
- Regression, classification: supervised learning
- k-means
Randomly initialize the cluster centroids, assign each sample to a cluster, and repeat re-computing the centroids.
Samples are judged to be similar when the distance between them is small.
- DBSCAN
Points are classified into core points, border points, and noise points. Nearby core points are grouped into one cluster, and border points are assigned to the cluster of a nearby core point.
Regions with higher sample density are judged to be more similar (density-based clustering).
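A small sketch contrasting k-means and DBSCAN (make_blobs and the parameter values are assumptions for illustration):

from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# synthetic data with 3 blob-shaped clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)

# k-means: iteratively re-computes 3 centroids; every sample is assigned to a cluster
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: density-based; the label -1 marks noise points
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

print('k-means labels:', sorted(set(km_labels)))
print('DBSCAN labels :', sorted(set(db_labels)))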
- Dimensionality reduction
- Principal component analysis (PCA)
Unsupervised learning from scratch! Principal component analysis and PCA basics to learn easily | AIZINE
https://aizine.ai/unsupervised-learning0531/
- Hard clustering: each sample is assigned to exactly one cluster
- Soft clustering: each sample is assigned to multiple clusters (with degrees of membership)
Scheduled to be added
- Logistic regression
- Ridge regression / Lasso regression
- Residual error
- Regression coefficient
- ROC curve
- Ensemble learning