Updates stopped for a while and this blog was dying, but I reminded myself that I should write down what I'm doing, what I've researched, and what I'm thinking, so I'll keep at it. Comments and corrections are welcome, but please be gentle.
The abbreviation is GLM (Generalized Linear Model), a model that extends linear regression so that it can handle distributions other than the normal distribution.
Lasso is a regularization technique for estimating generalized linear models. For a detailed explanation, the link below is an accurate reference.
Generalized Linear Models Lasso and Elastic Net
Both Lasso and Ridge constrain the size of the estimated coefficients with a "penalty"; in Lasso's case, the L1 penalty has the property that a sparse solution can be obtained.
I wrote a little about this before, but since Lasso is a shrinkage estimator, it assumes that some variables go unused (their coefficients become exactly zero). Lasso estimators tend to have lower error than ordinary maximum likelihood estimators and generally give better estimates and predictions than plain least squares.
Ridge regression likewise adds a penalty, which prevents overfitting; it is effective when several explanatory variables are strongly correlated and you want to guard against multicollinearity.
For Lasso, the objective function minimized by the scikit-learn implementation is as follows.
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
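As a sanity check, this objective can be evaluated by hand with NumPy. Below is a minimal sketch on made-up toy data (note the formula above omits the intercept, which scikit-learn fits separately by centering the data):
import numpy as np

# Made-up toy data, just to evaluate the objective for a candidate w
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
w = np.array([0.5, 0.3])
alpha = 0.1

n_samples = X.shape[0]
# (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
objective = (1 / (2 * n_samples)) * np.sum((y - X.dot(w)) ** 2) + alpha * np.sum(np.abs(w))
print(objective)  # the quantity Lasso's solver tries to minimize over w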
This is useful when you want a simple model: strip out complexity and keep only a few selected explanatory variables.
If the regularization parameter λ (alpha in scikit-learn) is 0, the result matches ordinary least squares. When λ is small the penalty is loose and the model stays complex; when it is large the penalty is severe and the model becomes simple.
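To see this effect concretely, here is a small sketch (made-up data) that fits Lasso with several values of alpha and prints the coefficients as the penalty tightens:
from sklearn import linear_model

X = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 2]
# Larger alpha = harsher penalty = smaller (and eventually zero) coefficients
for alpha in [0.01, 0.1, 1.0]:
    clf = linear_model.Lasso(alpha=alpha)
    clf.fit(X, y)
    print(alpha, clf.coef_)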
The usual way to choose the regularization parameter is so-called cross-validation, sketched below with LassoCV.
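A minimal sketch on made-up random data; LassoCV fits the model over a grid of alphas and keeps the best one:
import numpy as np
from sklearn.linear_model import LassoCV

np.random.seed(0)
X = np.random.randn(50, 5)
y = np.random.randn(50)

# Try a grid of alphas with 5-fold cross-validation and keep the best one
clf = LassoCV(cv=5)
clf.fit(X, y)
print(clf.alpha_)  # the alpha chosen by cross-validation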
sklearn implements Lasso and Ridge under sklearn.linear_model.
So, after
import sklearn.linear_model as lm
you can call it as lm.Lasso.
Lasso http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso
There are several parameters you can set when creating a Lasso instance.
They are exactly as documented, so please see the official page linked above. Here I just traced the following example.
from sklearn import linear_model

# Fit Lasso with regularization strength alpha=0.1 on a tiny toy dataset
clf = linear_model.Lasso(alpha=0.1)
clf.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
#=> Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
#   normalize=False, positive=False, precompute=False, random_state=None,
#   selection='cyclic', tol=0.0001, warm_start=False)

# The L1 penalty has driven the second coefficient to exactly zero
print(clf.coef_)
#=> [ 0.85  0.  ]
print(clf.intercept_)
#=> 0.15
As usual, you train with the fit method and predict with the predict method.
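For example, continuing from the fit above, the prediction for [3, 3] follows from the coefficients shown (0.85 * 3 + 0.15):
print(clf.predict([[3, 3]]))
#=> [ 2.7]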
Ridge http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge
from sklearn import linear_model

# Fit Ridge with regularization strength alpha=0.5
clf = linear_model.Ridge(alpha=0.5)
clf.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
#=> Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
#   normalize=False, solver='auto', tol=0.001)

# Ridge shrinks the coefficients but, unlike Lasso, does not zero them out
clf.coef_
#=> array([ 0.34545455,  0.34545455])
clf.intercept_
#=> 0.13636...
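predict works the same way here; the value follows from the coefficients and intercept shown above:
clf.predict([[1, 1]])
#=> array([ 0.82727273])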
Another example, this time on random data:
from sklearn.linear_model import Ridge
import numpy as np

# Random regression data: 10 samples, 5 features
n_samples, n_features = 10, 5
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)

# Fit Ridge on the random data
clf = Ridge(alpha=1.0)
clf.fit(X, y)
#=> Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
#   normalize=False, solver='auto', tol=0.001)
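As usual you can then call predict or score; the exact numbers are omitted here since they depend on the random data generated above:
# R^2 on the training data (value depends on the random draw above)
print(clf.score(X, y))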
scikit-learn also provides cross-validated versions: LassoCV for Lasso, and likewise [RidgeCV](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV) and the classifier variant [RidgeClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html#sklearn.linear_model.RidgeClassifier). There are various other linear-model implementations under sklearn.linear_model, so take a look around. In any case, we tune these parameters to find an appropriate model.
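A minimal sketch of RidgeCV on made-up data (alphas is the grid of candidate regularization strengths):
import numpy as np
from sklearn.linear_model import RidgeCV

np.random.seed(0)
X = np.random.randn(50, 5)
y = np.random.randn(50)

# RidgeCV evaluates each candidate alpha by (generalized) cross-validation
clf = RidgeCV(alphas=[0.1, 1.0, 10.0])
clf.fit(X, y)
print(clf.alpha_)  # the selected regularization strength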
That's all for this time.