Machine learning algorithm (generalization of linear regression)

Introduction

This continues the step-by-step look at theory, implementation in Python, and analysis with scikit-learn for the algorithms taken up in "Classification of Machine Learning". I'm writing it for my own learning, so please forgive any mistakes.

So far we have looked at "simple regression" and "multiple regression", but both belong to the same field of linear regression. This time I would like to summarize the **linear basis function regression model**, which generalizes linear regression, and the **gradient descent method** for optimizing the loss function. The following sites were referenced this time.

Basic

To draw an approximate curve through a series of data points, the simple regression model uses $y = Ax + B$, and the multiple regression model uses

y = w_0x_0 + w_1x_1 + \cdots + w_nx_n

Simple regression is just the special case that uses only two terms of the multiple regression equation.

Now, with weights $(w_0, w_1, \cdots, w_n)$ attached to each term, the function applied to the input can actually be anything. Writing each such function as $\phi_j(\boldsymbol{x})$, the model becomes

y(\boldsymbol{x}, \boldsymbol{w}) = \sum_{j=0}^{M-1}w_j\phi_{j}(\boldsymbol{x})

where $\boldsymbol{w} = (w_0, w_1, \cdots, w_{M-1})^T$ and $\boldsymbol{\phi} = (\phi_0, \phi_1, \cdots, \phi_{M-1})^T$. If $\phi_0 = 1$ (the intercept term), this can be written compactly as

y(\boldsymbol{x}, \boldsymbol{w}) = \boldsymbol{w}^T\boldsymbol{\phi}(\boldsymbol{x})

The functions $\phi_j(\boldsymbol{x})$ are called **basis functions**.
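As a minimal sketch of the model above, here is $y = \boldsymbol{w}^T\boldsymbol{\phi}(x)$ with a polynomial basis $\phi_j(x) = x^j$ (so $\phi_0 = 1$ is the intercept term). The weight values are arbitrary illustration values, not fitted ones.

```python
import numpy as np

def polynomial_basis(x, M):
    """Design matrix Phi whose columns are phi_0(x), ..., phi_{M-1}(x) = x**j."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x**j for j in range(M)])

w = np.array([1.0, 2.0, 0.5])   # (w_0, w_1, w_2), chosen arbitrarily
x = np.array([0.0, 1.0, 2.0])
Phi = polynomial_basis(x, M=3)  # one row per sample, one column per basis function
y = Phi @ w                     # y = w^T phi(x) evaluated for each sample

print(y)                        # [1.  3.5 7. ]
```

Swapping `x**j` for Gaussian bumps or sigmoids changes the basis but leaves the model linear in $\boldsymbol{w}$, which is the whole point of the generalization.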

Various basis functions

In this generalized form, linear regression means finding the coefficient vector $\boldsymbol{w}$ that best represents a given series of data as a combination of some basis functions.

scikit-learn allows you to use various basis functions for regression.
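One way to do this in scikit-learn, sketched below, is to chain `PolynomialFeatures` (which builds a polynomial basis) with `LinearRegression` (which fits the weights). The sample data is synthetic, made up here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data from y = 1 + 2x - 3x^2 plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] - 3.0 * x[:, 0]**2 + rng.normal(0, 0.05, 50)

# PolynomialFeatures expands x into (1, x, x^2); LinearRegression fits w
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)

pred = model.predict([[0.0]])   # near the true intercept 1.0
print(pred)
```

`SplineTransformer` or a custom transformer can play the same role as `PolynomialFeatures` when other basis functions are wanted.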

Find the regression coefficient

For simple regression and multiple regression, we found the coefficients that minimize the sum of squared residuals. While $w$ could be found mathematically for simple regression, it is often very difficult to find a solution analytically when the basis functions are complicated or the data has many dimensions. In such cases the coefficients must be found approximately, and that is where the **gradient descent method** comes in. Literally, it is a method for finding the optimum value by going down the slope (gradient).

Below, we consider how to find the coefficients, starting with the mathematical (analytical) solution.

Mathematical solution

This is the method of deriving a solution by formula manipulation, as described for simple regression and multiple regression: completing the square, or setting the partial derivatives to zero and solving the resulting simultaneous equations. There is no problem when the formula is simple, but there are cases where a complicated model cannot be solved this way.
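For least squares the simultaneous equations from the partial derivatives are the normal equations, $\Phi^T\Phi\,\boldsymbol{w} = \Phi^T\boldsymbol{y}$. A minimal sketch on noise-free data exactly on the line $y = 1 + 2x$:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])            # exactly y = 1 + 2x

# Design matrix for the basis phi_0(x) = 1, phi_1(x) = x
Phi = np.column_stack([np.ones_like(x), x])

# Solve the normal equations Phi^T Phi w = Phi^T y
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

print(w)                                       # [1. 2.]
```

In practice `np.linalg.lstsq(Phi, y, rcond=None)` is numerically safer than forming $\Phi^T\Phi$ explicitly, which can be ill-conditioned.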

Gradient descent

The gradient method is, literally, a way of going down the gradient of the loss function. To find the optimum parameters, the value of the loss function must be made small; the image is of walking down a slope toward smaller values.

Machine learning sites often introduce the steepest descent method and the stochastic gradient descent method, but the deep learning world uses many more variants of gradient descent. You could say this is an area that the rise of deep learning continues to develop.

The steepest descent method

Given a loss function $f(x, y)$, the gradient vector obtained by partial differentiation with respect to $x$ and $y$ is $\nabla f(x, y) = \biggl(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\biggr)$. Choose an initial position $(x_0, y_0)$ appropriately, take $(x_1, y_1) = (x_0, y_0) - \eta \nabla f(x_0, y_0)$ as the next point, and repeat until the result converges. $\eta$ is called the learning rate.

However, a weakness of this method is that the loss function does not always have a single minimum. Depending on how the initial value is chosen, the position of convergence changes (it may converge to a local minimum).
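The update rule above can be sketched on a toy loss function, here $f(x, y) = x^2 + 2y^2$ (chosen only for illustration; its minimum is at the origin):

```python
import numpy as np

def grad(p):
    """Gradient of f(x, y) = x**2 + 2*y**2, i.e. (df/dx, df/dy)."""
    x, y = p
    return np.array([2 * x, 4 * y])

eta = 0.1                        # learning rate
p = np.array([3.0, -2.0])        # initial position, chosen arbitrarily
for _ in range(200):
    p = p - eta * grad(p)        # step downhill along the negative gradient

print(p)                         # very close to [0, 0]
```

With a multimodal $f$, different starting points `p` would land in different minima, which is the weakness noted above.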

Stochastic Gradient Descent (SGD)

With a loss that is a sum over samples, $Q(w) = \sum_{i=1}^n Q_i(w)$, the steepest descent method updates using all samples at once, $w := w - \eta \sum_{i=1}^n \nabla Q_i(w)$, while stochastic gradient descent shuffles the samples and updates $w := w - \eta \nabla Q_i(w)$ one sample (or a small batch) at a time.

In most cases SGD seems to converge with fewer passes over the data, although each steepest descent update is simpler to compute. For most purposes I think it's fine to just use SGD (see the Wikipedia article on stochastic gradient descent).
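A minimal sketch of SGD for simple linear regression, $y \approx w_0 + w_1 x$, updating the weights one shuffled sample at a time on synthetic data (all values below are made up for illustration):

```python
import numpy as np

# Synthetic data from y = 1 + 2x plus noise
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 200)

w = np.zeros(2)                              # (w_0, w_1)
eta = 0.05                                   # learning rate
for epoch in range(50):
    for i in rng.permutation(len(x)):        # shuffle samples each epoch
        err = (w[0] + w[1] * x[i]) - y[i]    # residual for this one sample
        w -= eta * err * np.array([1.0, x[i]])  # gradient of err**2 / 2

print(w)                                     # close to [1, 2]
```

Each update touches only one sample, which is why SGD scales to data sets far too large for a full-batch gradient.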

Summary

Building on simple regression and multiple regression, I wrote about the generalized regression model and how to solve it. With the theory covered so far, regression on a wide variety of samples becomes possible.

I actually wanted to try a Python implementation, but I ran out of energy. Next time, after trying some implementations in Python, I'd like to summarize overfitting and regularization.
