Machine Learning: Supervised --AdaBoost

Target

Understand the AdaBoost formula and try it with scikit-learn.

It is assumed that you have already learned calculus.

Theory

AdaBoost is a boosting algorithm, a type of ensemble learning that trains multiple weak classifiers step by step.

Boosting

Boosting trains multiple weak classifiers one after another, building each new weak classifier on the results of the previous one. This allows the next weak classifier to focus on the data that the previous weak classifier handled poorly.

109_boosting.png

Whereas bagging trains multiple weak classifiers independently in parallel, boosting trains multiple weak classifiers in series.

Classification by AdaBoost

A typical boosting algorithm is Adaptive Boosting (AdaBoost). AdaBoost assigns small weights to correctly classified training data and large weights to misclassified training data, so that the later weak classifiers concentrate on the error-prone training examples.

Derivation of AdaBoost

Here, let the number of training examples be $N$, the number of weak classifiers be $M$, the training data be $x = (x_1, ..., x_N)$, and the target data be $t = (t_1, ..., t_N)$ with $t_i \in \{-1, 1\}$. Let the training data weights be $w^m_i \ (i = 1, ..., N, \ m = 1, ..., M)$, and let the weak classifiers be $y_m(x) \in \{-1, 1\}$.

AdaBoost sequentially minimizes the exponential error function $E$ in the formula below from $m = 1$ to $M$.

E = \sum^N_{i=1} \exp \left( -t_i f_m(x_i) \right)

Here, $f_m(x)$ is defined as a linear combination of the weak classifiers $y_j(x)$ with linear combination coefficients $\alpha_j$, as in the formula below.

f_m(x) = \sum^m_{j=1} \alpha_j y_j(x)

The goal of AdaBoost training is to find the coefficients $\alpha_m$ and the training data weights $w^m_i$ that minimize the exponential error function $E$.

Suppose the coefficients $\alpha_1, ..., \alpha_{m-1}$ and the weak classifiers $y_1(x), ..., y_{m-1}(x)$ up to step $m-1$ have already been determined, and consider the $m$-th coefficient and weak classifier. Then

\begin{align}
E &= \sum^N_{i=1} \exp \left( -t_i f_m(x_i) \right) \\
&= \sum^N_{i=1} \exp \left( -t_i f_{m-1}(x_i) - t_i \alpha_m y_m(x_i) \right) \\
&= \sum^N_{i=1} w^m_i \exp \left( - t_i \alpha_m y_m(x_i) \right)
\end{align}

where $w^m_i = \exp \left( -t_i f_{m-1}(x_i) \right)$ can be treated as a constant in this sequential optimization. Since $t_i y_m(x_i) = 1$ when the weak classifier $y_m(x)$ classifies correctly ($y_m(x_i) = t_i$) and $t_i y_m(x_i) = -1$ when it misclassifies ($y_m(x_i) \neq t_i$), we have

\begin{align}
E &= \sum_{y_m(x_i) = t_i} w^m_i \exp(-\alpha_m) + \sum_{y_m(x_i) \neq t_i} w^m_i \exp(\alpha_m) \\
&= \sum^N_{i=1} w^m_i \exp (-\alpha_m) + \sum_{y_m(x_i) \neq t_i} w^m_i \left\{ \exp(\alpha_m) - \exp(-\alpha_m) \right\}
\end{align}

Putting $A = \sum_{y_m(x_i) \neq t_i} w^m_i$ and $B = \sum^N_{i=1} w^m_i$, this becomes

E = \left\{ \exp(\alpha_m) - \exp(-\alpha_m) \right\} A + \exp(-\alpha_m) B

To find the coefficient $\alpha_m$ that minimizes $E$, differentiate with respect to $\alpha_m$ and set the derivative to 0:

\frac{\partial E}{\partial \alpha_m} = \left\{ \exp(\alpha_m) + \exp(-\alpha_m) \right\} A - \exp(-\alpha_m) B = 0

Solving this gives

\exp(2 \alpha_m) = \frac{B}{A} - 1

and therefore

\begin{align}
\alpha_m &= \frac{1}{2} \ln \left( \frac{B}{A} - 1 \right) = \frac{1}{2} \ln \frac{1 - A/B}{A/B} \\
&= \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}
\end{align}

where

\epsilon_m = \frac{A}{B} = \frac{\sum_{y_m(x_i) \neq t_i} w^m_i}{\sum^N_{i=1} w^m_i}

is called the weighted error rate.
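
As a quick sanity check of these formulas (the numbers are illustrative, not values from the experiments below): if a weak classifier misclassifies examples carrying a quarter of the total weight, then

\epsilon_m = 0.25, \qquad \alpha_m = \frac{1}{2} \ln \frac{1 - 0.25}{0.25} = \frac{1}{2} \ln 3 \approx 0.55

so a more accurate weak classifier (smaller $\epsilon_m$) receives a larger coefficient, while $\epsilon_m = 0.5$, i.e. random guessing, gives $\alpha_m = 0$.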

On the other hand, the weight of the $i$-th training example is updated by the following formula.

w^{m+1}_i = w^m_i \exp \left( - t_i \alpha_m y_m(x_i) \right)

Here, let $I(y_m(x_i) \neq t_i)$ be the indicator function that is 0 when the example is classified correctly and 1 when it is misclassified. Then

t_i y_m(x_i) = 1 - 2 I(y_m(x_i) \neq t_i)

so the weight update can be written as

\begin{align}
w^{m+1}_i &= w^m_i \exp \left( - \alpha_m + 2 \alpha_m I(y_m(x_i) \neq t_i) \right) \\
&= w^m_i \exp(-\alpha_m) \exp \left( 2 \alpha_m I( y_m(x_i) \neq t_i) \right)
\end{align}

Since $\exp(-\alpha_m)$ is common to all examples, it can be absorbed by normalization, and the update rule for the training data weights $w^m_i$ becomes

w^{m+1}_i = w^m_i \exp \left( 2 \alpha_m I( y_m(x_i) \neq t_i) \right)
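
Combining this with the expression for $\alpha_m$ above gives a concrete view of the update (an illustrative restatement, not an additional result):

\exp \left( 2 \alpha_m \right) = \frac{1 - \epsilon_m}{\epsilon_m}

so each misclassified example has its weight multiplied by $(1 - \epsilon_m)/\epsilon_m$, which is greater than 1 as long as $\epsilon_m < 0.5$, while correctly classified examples keep their weight until normalization.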

Learning AdaBoost

The learning algorithm of AdaBoost is as follows.

First, initialize the training data weights $w^1_i$ as follows.

w^1_i = \frac{1}{N}

Then repeat the following steps for $m = 1, ..., M$.

  1. Train the weak classifier $y_m$ using the training data weights $w^m_i$.
  2. Compute the weighted error rate $\epsilon_m$ of the weak classifier $y_m$.
  3. Compute the linear combination coefficient $\alpha_m$.
  4. Update the training data weights $w^m_i$.

Finally, a new data point $x$ is classified using the sign function ${\rm sign}$ according to the following formula.

\hat{t} = {\rm sign} \left( \sum^M_{m=1} \alpha_m y_m (x) \right)

Since AdaBoost is inherently a binary classification model, $K-1$ classifiers need to be prepared for multiclass classification.
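
The learning algorithm above can be written down almost step for step. The following is a minimal sketch, assuming NumPy, scikit-learn decision stumps as the weak classifiers, and labels $t_i \in \{-1, 1\}$; the function and variable names (adaboost_fit, n_rounds, and so on) are hypothetical and not taken from the program published below.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, t, n_rounds=50):
    """Train AdaBoost on labels t in {-1, +1}; returns weak classifiers and coefficients."""
    N = len(X)
    w = np.full(N, 1.0 / N)                # w^1_i = 1/N
    stumps, alphas = [], []
    for m in range(n_rounds):
        # 1. Train the weak classifier y_m with the current weights w^m_i
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, t, sample_weight=w)
        miss = stump.predict(X) != t       # I(y_m(x_i) != t_i)
        # 2. Weighted error rate eps_m
        eps = np.sum(w[miss]) / np.sum(w)
        if eps == 0.0 or eps >= 0.5:       # stop if perfect or no better than chance
            break
        # 3. Linear combination coefficient alpha_m = (1/2) ln((1 - eps) / eps)
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # 4. Weight update: w^{m+1}_i = w^m_i exp(2 alpha_m I(...)), then normalize
        w = w * np.exp(2.0 * alpha * miss)
        w /= np.sum(w)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """t_hat = sign(sum_m alpha_m y_m(x))."""
    score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(score)
```

With, for example, any binary dataset whose labels are converted to $\pm 1$, adaboost_predict behaves like the sign formula above.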

Implementation

Execution environment

Hardware

- CPU: Intel(R) Core(TM) i7-6700K 4.00GHz

Software

- Windows 10 Pro 1909
- Python 3.6.6
- Matplotlib 3.3.1
- NumPy 1.19.2
- scikit-learn 0.23.2

Program to run

The implemented program is published on GitHub.

adaboost.py
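
The published adaboost.py is not reproduced here, but a classification run similar to the result below could look roughly like the following sketch using scikit-learn's AdaBoostClassifier; the train/test split and random_state are assumptions, so the exact metrics will differ.

```python
# Minimal sketch (not the published adaboost.py): AdaBoost on the breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The default base estimator is a depth-1 decision tree (a decision stump)
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test), digits=4))
```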


Result

Classification by AdaBoost

Here is the result of applying AdaBoost to the breast cancer dataset as before.

Accuracy: 97.37%
Precision, Positive predictive value (PPV): 97.06%
Recall, Sensitivity, True positive rate (TPR): 98.51%
Specificity, True negative rate (TNR): 95.74%
Negative predictive value (NPV): 97.83%
F-Score: 97.78%

The figure below shows the decision boundaries when performing multiclass classification on the iris dataset.

109_adaboost_classification.png
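
Decision boundaries of this kind can be drawn roughly as in the sketch below, using two iris features on a mesh grid; the chosen features, mesh step, and styling are assumptions rather than the article's exact plotting code.

```python
# Rough sketch: AdaBoost decision boundaries on two iris features.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target      # sepal length and sepal width only

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Evaluate the classifier on a grid covering the feature space
xx, yy = np.meshgrid(np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.02),
                     np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.02))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()
```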

Regression with AdaBoost

The data for the regression problem is a sine wave with random noise added.

109_adaboost_regression.png
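
A fit like the one in the figure can be produced with scikit-learn's AdaBoostRegressor, which implements the AdaBoost.R2 algorithm of reference 3; the sample size, noise scale, and base tree depth below are assumptions, not the article's settings.

```python
# Minimal sketch: AdaBoost regression on a noisy sine wave.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(100, 1), axis=0)        # inputs in [0, 5)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)     # sine wave plus noise

reg = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                        n_estimators=300, random_state=0)
reg.fit(X, y)

X_plot = np.linspace(0, 5, 500).reshape(-1, 1)
plt.scatter(X, y, s=10, label="training data")
plt.plot(X_plot, reg.predict(X_plot), c="r", label="AdaBoost regression")
plt.legend()
plt.show()
```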

References

1.11.3. AdaBoost

  1. Yoav Freund and Robert E. Schapire. "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting", Journal of computer and system sciences 55.1 (1997): pp. 119-139.
  2. Ji Zhu, Hui Zou, Saharon Rosset, and Trevor Hastie. "Multi-class AdaBoost", Statistics and its Interface, 2(3), 2009, pp. 349-360.
  3. Harris Drucker. "Improving Regressors using Boosting Techniques", ICML, 1997, pp. 107-115.
  4. Yuzo Hirai. "First Pattern Recognition", Morikita Publishing, 2012.
