Continuing from "Classification of Machine Learning," I will study the algorithms taken up there step by step: the theory, an implementation in Python, and analysis using scikit-learn. I'm writing this for my own learning, so please overlook any mistakes.
So far we have looked at "simple regression" and "multiple regression," but both stayed within the same field of linear regression. This time I would like to summarize the **linear basis function model**, which generalizes linear regression, and the **gradient descent method** used to optimize the loss function. I referred to several sites while writing this.
To draw an approximate curve through a sequence of data points, the multiple regression model

y=w_0x_0+w_1x_1+\cdots+w_nx_n

was used as the approximation. Simple regression, in turn, can be seen as using only two of the terms of this multiple regression equation.
Now, if we write the weight of each term as $(w_0, w_1, \cdots, w_n)$, the function that each weight multiplies can in fact be anything. Writing such a function as $\phi(x)$, the model is expressed as
y(\boldsymbol{x}, \boldsymbol{w}) = \sum_{j=0}^{M-1}w_j\phi_{j}(\boldsymbol{x})
Here $\boldsymbol{w} = (w_0, w_1, \cdots, w_{M-1})^T$ and $\boldsymbol{\phi} = (\phi_0, \phi_1, \cdots, \phi_{M-1})^T$. If $\phi_0 = 1$ (the intercept term), this can be written compactly as
y(\boldsymbol{x}, \boldsymbol{w}) = \boldsymbol{w}^T\boldsymbol{\phi}(\boldsymbol{x})
The functions $\phi_j(x)$ are called **basis functions**.
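As a concrete illustration (my own example, not from the original text): choosing the polynomial basis $\phi_j(x) = x^j$ turns the model into an ordinary polynomial, and Gaussian basis functions are another common choice.

y(x, \boldsymbol{w}) = w_0 + w_1x + w_2x^2 + \cdots + w_{M-1}x^{M-1}

\phi_j(x) = \exp\left(-\frac{(x-\mu_j)^2}{2s^2}\right)

Here $\mu_j$ and $s$ are the centers and width of the Gaussians, chosen beforehand rather than learned.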
In this generalized form, linear regression means finding the coefficient vector $\boldsymbol{w}$ that, combined with some basis functions, best represents a given sequence of data.
scikit-learn allows you to use various basis functions for regression.
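As a minimal sketch (my own example; the sine-curve data and degree 3 are arbitrary choices), polynomial basis functions can be combined with ordinary linear regression like this:

```python
# A minimal sketch of regression with polynomial basis functions in
# scikit-learn (my own example; the data and degree are arbitrary).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)[:, None]          # 50 points in [0, 1]
y = np.sin(2 * np.pi * x).ravel() + rng.normal(scale=0.2, size=50)

# PolynomialFeatures builds the design matrix Phi = [1, x, x^2, x^3],
# i.e. polynomial basis functions; LinearRegression then finds w.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[0.25]])))    # prediction at x = 0.25
```

PolynomialFeatures plays the role of $\boldsymbol{\phi}$, and LinearRegression finds $\boldsymbol{w}$ by least squares.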
For simple regression and multiple regression, we found the coefficients that minimize the residual sum of squares. While $w$ could be found mathematically for simple regression, finding a solution analytically is often very difficult when the basis functions are complicated or the data has many dimensions. In such cases the coefficients have to be found approximately, and this is where the **gradient descent method** comes in. Literally, it is a method that searches for the optimum value while going down the slope (gradient).
Let's think about how to find the coefficients, including how to solve for them mathematically. Each approach is described below.
The analytic approach finds a solution by formula manipulation, as described for simple regression and multiple regression: completing the square, taking partial derivatives, and solving the resulting simultaneous equations. There is no problem when the formula is simple, but when the model is complicated there are cases where it cannot be solved this way.
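For the squared-error loss, the partial-derivative approach leads to the well-known normal equations, $\boldsymbol{w} = (\Phi^T\Phi)^{-1}\Phi^T\boldsymbol{y}$. A minimal NumPy sketch (my own example, reusing the polynomial basis from above):

```python
# A minimal sketch of the analytic (normal-equation) solution for the
# squared-error loss: w = (Phi^T Phi)^(-1) Phi^T y (my own example).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=50)

# Design matrix with polynomial basis functions phi_j(x) = x^j, phi_0 = 1.
Phi = np.vander(x, N=4, increasing=True)    # columns: 1, x, x^2, x^3

# Solving Phi^T Phi w = Phi^T y directly is numerically safer than
# explicitly inverting the matrix.
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)                                     # fitted coefficients
```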
The gradient method is, literally, a way of going down the gradient of the loss function. To find the optimum parameters the value of the loss function needs to become small, and the image is that of walking down a slope toward smaller values.
Machine learning sites often introduce the steepest descent method and the stochastic gradient descent method, but in the world of deep learning many more variants of gradient descent are used. With deep learning flourishing, it is fair to say this is an area that is still developing.
Given a loss function $f(x, y)$, the gradient vector is obtained by partially differentiating with respect to $x$ and $y$:

\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)^T

The steepest descent method repeatedly moves the parameters a small step (scaled by a learning rate $\eta$) in the direction opposite to this gradient, which is the direction in which the loss decreases fastest.
However, a weakness of this method is that the loss function does not always have a single minimum; where the method converges depends on how the initial value is taken (it may converge to a local solution).
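A minimal sketch of the steepest descent method on a one-variable function (my own example; the quartic $f(x) = x^4 - 3x^2 + x$ is chosen because it has two local minima), showing how the converged point depends on the initial value:

```python
# A minimal sketch of the steepest descent method (my own example).
# f(x) = x^4 - 3x^2 + x has two local minima, so the result depends
# on the initial value -- the weakness mentioned above.
def grad(x):
    return 4 * x**3 - 6 * x + 1      # derivative of f

def gradient_descent(x, eta=0.01, n_iter=1000):
    for _ in range(n_iter):
        x = x - eta * grad(x)        # step in the downhill direction
    return x

print(gradient_descent(-2.0))        # converges to the left local minimum
print(gradient_descent(2.0))         # converges to the right local minimum
```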
Whereas the steepest descent method computes the gradient from the entire dataset at each step, the stochastic gradient descent (SGD) method uses randomly chosen samples to calculate it.
In most cases SGD seems to converge faster, although a single steepest-descent update can be computed efficiently because the whole dataset is processed at once. In most cases I think it's fine to just use SGD (see the Wikipedia article on stochastic gradient descent).
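A minimal sketch of SGD applied to the linear model with squared error (my own example; the data $y = 1 + 2x$ plus noise and the learning rate 0.1 are arbitrary choices):

```python
# A minimal sketch of stochastic gradient descent for the linear model
# y = w^T phi(x) with squared error (my own example, not from the text).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)
Phi = np.column_stack([np.ones_like(x), x])   # basis: phi_0 = 1, phi_1 = x

w = np.zeros(2)
eta = 0.1
for epoch in range(100):
    for i in rng.permutation(len(y)):          # one random sample per update
        error = Phi[i] @ w - y[i]
        w -= eta * error * Phi[i]              # gradient of the per-sample loss
print(w)                                       # close to [1.0, 2.0]
```

Each update uses only the gradient of the loss for a single sample, $(\boldsymbol{w}^T\boldsymbol{\phi}_i - y_i)\boldsymbol{\phi}_i$, which is why it stays cheap even for large datasets.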
Developing on simple regression and multiple regression, I wrote about the generalized regression model and how to solve for its coefficients. With the theory so far, regression should be possible for a wide variety of samples.
I actually wanted to try the Python implementation as well, but I ran out of energy. Next time, after trying out some implementations in Python, I'd like to summarize overfitting and regularization.