Machine learning course memo
Introduction [P01]
What is machine learning?
A research field in which machines learn their behavior from data
- Supervised learning
Learn the relationship between explanatory variables X and the outcome y
- Unsupervised learning
Acquire patterns in the data and new representations of it
Visualization of data patterns
Samples that fit none of the learned patterns are judged anomalous (anomaly detection)
- Reinforcement learning
Aim to acquire action rules (a policy) that maximize reward
Differences between deep learning and traditional machine learning
In deep learning, feature engineering is absorbed into the algorithm itself (features are learned from the data)
Wave of democratization
- DataRobot
https://www.datarobot.com/jp/
- Amazon SageMaker
https://aws.amazon.com/jp/sagemaker/
- Featuretools | An open source framework for automated feature engineering
https://www.featuretools.com/
What to use with pandas
- DataFrame, Series
- join, merge
- iloc, loc
- map, apply, applymap
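A minimal sketch of these pandas features (the DataFrame contents below are made up purely for illustration):

import pandas as pd

# DataFrame / Series
df = pd.DataFrame({'id': [1, 2, 3], 'price': [100, 200, 300]})
other = pd.DataFrame({'id': [1, 2, 3], 'qty': [5, 1, 2]})

# merge: SQL-style join on a key column
merged = pd.merge(df, other, on='id', how='left')

# loc (label-based) and iloc (position-based) selection
first_row = merged.iloc[0]
prices = merged.loc[:, 'price']

# map (element-wise on a Series), apply (row/column-wise), applymap (element-wise on a DataFrame)
merged['price_class'] = merged['price'].map(lambda v: 'high' if v >= 200 else 'low')
merged['total'] = merged.apply(lambda row: row['price'] * row['qty'], axis=1)
as_float = merged[['price', 'qty']].applymap(float)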
No free lunch theorem
The optimal algorithm depends on the data
Penalty term = a penalty (cost) imposed on model complexity
PCA: Principal Component Analysis
Where each measure against overfitting is applied:
- In the algorithm: regularization
- Before the algorithm: dimensionality reduction (feature extraction, feature selection)
- After the algorithm: holdout, cross-validation (k-fold)
StandardScaler()
Subtract the mean and divide by the standard deviation
from sklearn.model_selection import cross_val_score, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor

# build models (example pipelines assumed for illustration; X, y are assumed to be loaded already)
pipelines = {
    'ridge': Pipeline([('scl', StandardScaler()), ('est', Ridge())]),
    'rf': Pipeline([('scl', StandardScaler()), ('est', RandomForestRegressor(random_state=0))]),
}

kf = KFold(n_splits=3, shuffle=True, random_state=0)
for pipe_name, est in pipelines.items():
    cv_results = cross_val_score(est, X, y, cv=kf, scoring='r2')
    print('----------')
    print('algorithm:', pipe_name)
    print('cv_results:', cv_results)
    print('avg +- std_dev:', cv_results.mean(), '+-', cv_results.std())
Supervised learning (regression) [P02]
- Regression: the correct-answer (target) data is a continuous quantity
- Classification: the correct-answer (target) data is a discrete class label (binary in the simplest case)
Phase
- Modeling phase
- Scoring phase
To increase generalization ability
- amount of data
- Feature design that reflects business knowledge
- Data preprocessing and algorithm evaluation
Regression algorithm
- Least squares regression
- Ridge regression: adds an L2 regularization term
The larger the regularization parameter, the closer the regression coefficients shrink toward 0
Dealing with overfitting
- Regularization
- L1 regularization (Lasso regression): produces sparse coefficients, acting as feature selection (see the sketch after this list)
- L2 regularization (Ridge regression): a closed-form analytical solution exists
- Dimensionality reduction (feature extraction)
- Dimensionality reduction (feature selection)
- Holdout / cross-validation
- Holdout method
- k-fold (cross-validation)
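A minimal sketch of the L1 vs. L2 behavior listed above, using scikit-learn (the diabetes dataset and the alpha values are assumptions chosen only for illustration):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

# L1 (Lasso): can drive coefficients exactly to 0 -> sparse solution, implicit feature selection
lasso = Lasso(alpha=1.0).fit(X, y)
print('Lasso non-zero coefficients:', (lasso.coef_ != 0).sum(), '/', lasso.coef_.size)

# L2 (Ridge): shrinks coefficients toward 0 but rarely to exactly 0; a closed-form solution exists
ridge = Ridge(alpha=1.0).fit(X, y)
print('Ridge non-zero coefficients:', (ridge.coef_ != 0).sum(), '/', ridge.coef_.size)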
Tree-based algorithms
- Decision tree
- Random forest
- Gradient boosting
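A brief sketch of the three tree-based regressors above, evaluated with a holdout split (the dataset and hyperparameters are assumptions for illustration):

from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
# holdout: keep 30% of the data aside for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (DecisionTreeRegressor(max_depth=3, random_state=0),
              RandomForestRegressor(n_estimators=100, random_state=0),
              GradientBoostingRegressor(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))  # R2 on the held-out data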
Evaluation of regression model
- Mean absolute error (MAE): the mean of the absolute differences between the measured and predicted values
- Mean squared error (MSE): the mean of the squared differences between the measured and predicted values
- Median absolute error: the median of the absolute differences between the measured and predicted values
Less susceptible to outliers
- R2 score: 1.0 for zero error, 0.0 when equivalent to always predicting the mean, negative when worse than that
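A short sketch computing these regression metrics with scikit-learn (the y values are invented numbers for illustration):

from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

print('MAE  :', mean_absolute_error(y_true, y_pred))
print('MSE  :', mean_squared_error(y_true, y_pred))
print('MedAE:', median_absolute_error(y_true, y_pred))
print('R2   :', r2_score(y_true, y_pred))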
Supervised learning (classification) [P03]
Classification algorithm
- K-Nearest Neighbors
- Logistic regression
Parameters are estimated by the maximum likelihood method
- Neural network
- Formal neurons
- Simple perceptron: adds weight learning to the formal neuron
→ Cannot solve problems that are not linearly separable
- Multilayer perceptron: inserts an intermediate (hidden) layer between the input layer and the output layer
Weights are learned by the error backpropagation method
- Support Vector Machine (SVM)
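A minimal sketch fitting a few of the classifiers above (the iris dataset, the scaling pipeline, and the hyperparameters are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

classifiers = {
    'k-NN': KNeighborsClassifier(n_neighbors=5),
    'logistic regression': LogisticRegression(max_iter=1000),
    'SVM': SVC(kernel='rbf'),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scale the features, then classify
    print(name, cross_val_score(pipe, X, y, cv=3, scoring='accuracy').mean())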
Classification model evaluation method
- Confusion Matrix
- False negative (FN): an actually positive (P) sample incorrectly predicted as negative (N)
- False positive (FP): an actually negative (N) sample incorrectly predicted as positive (P)
- Accuracy: overall rate of correct predictions, (TP + TN) / ALL
- Precision: proportion of predicted positives that are actually positive, TP / (TP + FP)
- Recall: proportion of actual positives that are correctly detected, TP / (TP + FN)
- AUC-ROC
- F value (F-measure): Harmonic mean of precision and recall
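A short sketch computing these classification metrics (the breast cancer dataset and the logistic-regression model are assumptions made for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))  # rows: actual class, columns: predicted class
print('accuracy :', accuracy_score(y_test, y_pred))
print('precision:', precision_score(y_test, y_pred))
print('recall   :', recall_score(y_test, y_pred))
print('F-measure:', f1_score(y_test, y_pred))
print('ROC AUC  :', roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))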
Data preprocessing and dimension reduction [P04]
Data preprocessing
- One-hot encoding
- Missing value imputation
- Standardization
- Replace date and time with elapsed time
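A compact sketch of these preprocessing steps with pandas and scikit-learn (the toy DataFrame is invented for illustration):

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'category': ['A', 'B', 'A'],
    'value': [1.0, None, 3.0],
    'timestamp': pd.to_datetime(['2024-01-01', '2024-01-15', '2024-02-01']),
})

# One-hot encoding of the categorical column
df = pd.get_dummies(df, columns=['category'])

# Missing value imputation (here: fill with the column mean)
df['value'] = df['value'].fillna(df['value'].mean())

# Replace the datetime with elapsed time (days since the earliest timestamp)
df['elapsed_days'] = (df['timestamp'] - df['timestamp'].min()).dt.days
df = df.drop(columns=['timestamp'])

# Standardization: subtract the mean, divide by the standard deviation
df[['value', 'elapsed_days']] = StandardScaler().fit_transform(df[['value', 'elapsed_days']])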
Dimensionality reduction
- Curse of dimensionality: for a fixed amount of data, the larger the number of dimensions, the larger the generalization error tends to become
- Feature selection: Select important variables
- Rule-based selection: by missing-value ratio or variance
- Selection by basic statistics: the statistic used differs between classification and regression
- Model-based selection (RFE):
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
selector = RFE(RandomForestRegressor(n_estimators=100, random_state=42), n_features_to_select=5)
# selector.fit(X, y); X_selected = selector.transform(X)
- Feature extraction: reduce the dimensionality of the features before feeding them to the model
- PCA (Principal Component Analysis)
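A minimal PCA sketch with scikit-learn (the iris data and the choice of 2 components are assumptions for illustration):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# standardize first, then project onto the top 2 principal components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance captured by each component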
Unsupervised learning [P06]
Clustering
Group together samples with high similarity, using data that has no correct-answer labels
k-means
- Clustering: unsupervised learning
- Regression, classification: supervised learning
- k-means
Randomly initialize the cluster centroids, assign each sample to a cluster, and repeat re-computing the centroids.
Samples are judged to be similar when the distance between them is small.
- DBSCAN
Points are classified into core points, border points, and noise points. Nearby core points are grouped into one cluster, and border points are assigned to the cluster of a nearby core point.
Regions with higher sample density are judged to be more similar (density-based clustering).
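A small sketch contrasting k-means and DBSCAN (make_blobs and the parameter values are assumptions for illustration):

from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# synthetic data with 3 blob-shaped clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)

# k-means: iteratively re-computes 3 centroids; every sample is assigned to a cluster
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: density-based; the label -1 marks noise points
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

print('k-means labels:', sorted(set(km_labels)))
print('DBSCAN labels :', sorted(set(db_labels)))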
- Dimensionality reduction
- Principal component analysis (PCA)
Unsupervised learning from scratch! Principal component analysis and PCA basics to learn easily | AIZINE
https://aizine.ai/unsupervised-learning0531/
- Hard clustering: each sample is assigned to exactly one cluster
- Soft clustering: each sample is assigned to multiple clusters (with degrees of membership)
Scheduled to be added
- Logistic regression
- Ridge regression / Lasso regression
- Residual error
- Regression coefficient
- ROC curve
- Ensemble learning