I started studying deep learning. This time, I will briefly summarize regularization.
The data is created from the equation $ y = -x^3 + x^2 + x $ (scaled by 0.001 in the code to keep the values small): x takes 50 evenly spaced values from -10 to 10, and y is the value of the equation at each x plus Gaussian noise with mean 0 and standard deviation 0.05.
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
import numpy as np
import matplotlib.pyplot as plt
#Data generation
np.random.seed(0)
X = np.linspace(-10, 10, 50)
Y_truth = 0.001 * (-X**3 + X**2 + X)
Y = Y_truth + np.random.normal(0, 0.05, len(X))
plt.figure(figsize=(5, 5))
plt.plot(X, Y_truth, color='gray')
plt.plot(X, Y, '.', color='k')
plt.show()
This is the created data. The solid line is the true value (the value of the equation), and the points are the actually observed values (the true value plus noise).
Overfitting is more likely to occur when the model has many degrees of freedom, so we deliberately use degree-30 polynomial regression.
#graph display
def graph(Y_lr, name):
    plt.figure(figsize=(6, 6))
    plt.plot(X, Y_truth, color='gray', label='truth')
    plt.plot(xs, Y_lr, color='r', markersize=2, label=name)
    plt.plot(X, Y, '.', color='k')
    plt.legend()
    plt.ylim(-1, 1)
    plt.show()
#Display settings
xs = np.linspace(-10, 10, 200)
#Introduction of polynomial regression
poly = PolynomialFeatures(degree=30, include_bias=False)
X_poly = poly.fit_transform(X[:, np.newaxis])
After defining the plotting function and the x values used for display, PolynomialFeatures is instantiated and fitted with degree 30 (degree=30).
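As a quick check (a small sketch, not in the original post), the shape of the expanded feature matrix confirms what PolynomialFeatures produced: each of the 50 x values is turned into the 30 features x^1 through x^30 (no bias column, since include_bias=False).
#Quick check of the expanded features (sketch)
print(X_poly.shape)  #(50, 30): 50 samples, 30 polynomial features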
First, do polynomial regression without regularization.
#No regularization
lr0 = linear_model.LinearRegression(normalize=True)  #normalize was removed in scikit-learn 1.2, so this assumes an older version
lr0.fit(X_poly, Y)
Y_lr0 = lr0.predict(poly.fit_transform(xs[:, np.newaxis]))
graph(Y_lr0, 'No Regularization')
Because a degree-30 polynomial has so many degrees of freedom, the curve manages to pass through many of the observed points, which is typical overfitting. It is far from the true values, so no generalization performance can be expected from this model.
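To make the overfitting concrete, here is a small sketch (not in the original post) that compares the error on the noisy training points with the error against the true function on the dense grid xs; an overfit model tends to look much better on the former than on the latter.
#Training error vs. error against the true curve (sketch)
from sklearn.metrics import mean_squared_error
print('MSE on training points :', mean_squared_error(Y, lr0.predict(X_poly)))
print('MSE against true curve :', mean_squared_error(0.001 * (-xs**3 + xs**2 + xs), Y_lr0))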
L2 regularization is the technique known from Ridge regression: it keeps the coefficients from growing too large by adding the squared L2 norm of the parameters, $ C\|w\|_2^2 $, to the loss (C is a constant; in the code below it is passed as alpha).
#L2 regularization
lr2 = linear_model.Ridge(normalize=True, alpha=0.5)
lr2.fit(X_poly, Y)
Y_lr2 = lr2.predict(poly.fit_transform(xs[:, np.newaxis]))
graph(Y_lr2, 'L2')
This time the regression looks much better.
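The strength of the penalty is controlled by the constant (alpha in scikit-learn). As a small side experiment, not in the original post, here is a sketch that sweeps alpha to see how a weaker or stronger penalty changes the fit.
#Effect of the regularization strength (sketch; normalize=True assumes scikit-learn < 1.2)
for a in [0.01, 0.5, 10]:
    lr = linear_model.Ridge(normalize=True, alpha=a)
    lr.fit(X_poly, Y)
    graph(lr.predict(poly.transform(xs[:, np.newaxis])), 'L2 (alpha={})'.format(a))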
L1 regularization is the technique known from Lasso regression: it also keeps the coefficients from growing too large, this time by adding the L1 norm of the parameters, $ C\|w\|_1 $, to the loss (C is a constant; in the code below it is passed as alpha).
#L1 regularization
lr1 = linear_model.LassoLars(normalize=True, alpha=0.001)
lr1.fit(X_poly, Y)
Y_lr1 = lr1.predict(poly.fit_transform(xs[:, np.newaxis]))
graph(Y_lr1, 'L1')
The shape is very close to a perfect fit. Compared to L2 regularization, L1 regularization seems to produce an even better regression here.
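To put a number on that visual comparison, here is a sketch (not in the original post) that measures each fit against the true function on the dense grid.
#Mean squared error of each fit against the true function (sketch)
from sklearn.metrics import mean_squared_error
ys_truth = 0.001 * (-xs**3 + xs**2 + xs)
for name, pred in [('No Regularization', Y_lr0), ('L2', Y_lr2), ('L1', Y_lr1)]:
    print(name, ':', mean_squared_error(ys_truth, pred))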
Let's compare the 30 coefficients for each of no regularization, L2 regularization, and L1 regularization (listed from the lowest-order term).
import pandas as pd
result = []
for i in range(len(lr0.coef_)):
    tmp = lr0.coef_[i], lr2.coef_[i], lr1.coef_[i]
    result.append(tmp)
df = pd.DataFrame(result)
df.columns = ['No Regularization', 'L2', 'L1']
print(df)
You can see that the L2 coefficients are smaller than the unregularized ones, and that L1 gives a sparse representation in which many coefficients are exactly zero.
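As a quick check of that sparsity, here is a sketch (not in the original post) that counts how many coefficients each model sets exactly to zero.
#Number of exactly-zero coefficients per model (sketch)
for name, model in [('No Regularization', lr0), ('L2', lr2), ('L1', lr1)]:
    print(name, ':', int((model.coef_ == 0).sum()), 'of', len(model.coef_), 'coefficients are exactly zero')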
I'm glad that L1 regularization can suppress overfitting and reduce dimensions.