#Data processing / calculation / analysis library
import numpy as np
import pandas as pd
#Graph drawing library
import matplotlib.pyplot as plt
%matplotlib inline
#Machine learning library
import sklearn
from sklearn.linear_model import Ridge, Lasso #Class for regression model generation
#Module to make matplotlib support Japanese display
!pip install japanize-matplotlib
import japanize_matplotlib
# Location of the sample data set (50 explanatory variables plus a target column "y")
url = 'https://raw.githubusercontent.com/yumi-ito/sample_data/master/ridge_lasso_50variables.csv'
# Load the CSV into a DataFrame and display its contents
df = pd.read_csv(url)
print(df)
# Explanatory variables: every column except "y"
x = df.drop(columns='y')
# Objective (target) variable: the "y" column
y = df['y']
# Generate 50 candidate values of λ (alpha), evenly spaced on a log10
# scale between 10**-2 and 10**0.7
num_alphas = 50
alphas = np.logspace(-2, 0.7, num=num_alphas)
print(alphas)
`numpy.logspace()` is a function with a twist: it generates values that are **evenly spaced on a base-10 logarithmic scale**. Its arguments are (start exponent, stop exponent, number of values to generate), so if you actually take the logarithm of the result with `np.log10(alphas)`, you get an ordinary arithmetic progression.
`numpy.arange()` might seem good enough here, but the values are generated this way because a **logarithmic scale** is needed for the visualization later. On a logarithmic scale each tick multiplies the value by a constant factor; a graph that uses a log scale on the x-axis, the y-axis, or both is called a logarithmic graph.
Ordinary scales are called linear scales, but converting to a logarithmic scale compresses the axis along the number line, which makes it much easier to visually compare data points whose magnitudes differ by several orders of magnitude.
# Estimate a ridge regression for each candidate alpha and collect the
# fitted coefficient vector of every model (Ridge.fit returns the model,
# so the coefficients can be read off in a single expression)
ridge_coefs = [
    Ridge(alpha=a, fit_intercept=False).fit(x, y).coef_
    for a in alphas
]
Each iteration fits the model and appends the estimated regression coefficients to `ridge_coefs`. The `fit_intercept=False` argument of `Ridge()`, which creates the model object, specifies whether to estimate the intercept. When it is set to `False`, the intercept is not calculated — in other words, the fitted line always passes through the origin.
#Convert the accumulated regression coefficients to a numpy array
# Stack the per-alpha coefficient vectors into a 2-D numpy array
# (one row per alpha value, one column per explanatory variable)
ridge_coefs = np.asarray(ridge_coefs)
print(f"Array shape: {ridge_coefs.shape}")
print(ridge_coefs)
`log_alphas` is the logarithmic transformation of `alphas` with the sign flipped. The `plt.text()` function, which draws text inside a graph, takes (x, y, "str") as arguments to specify the coordinates and the string to display.
#Logarithmic conversion of alphas (-log10)
# Transform alphas onto a -log10 scale for the x-axis of the path plot
log_alphas = -np.log10(alphas)
# Specify the size of the graph area
plt.figure(figsize=(8, 6))
# Line graph with -log10(λ) on the x-axis and the coefficients on the y-axis
plt.plot(log_alphas, ridge_coefs)
# Label the path of the first explanatory variable; ridge_coefs was already
# converted to an ndarray above, so the redundant np.array() wrapper is removed
plt.text(max(log_alphas) + 0.1, ridge_coefs[0, 0], "x_1", fontsize=13)
# Specify the x-axis range (small margins on both sides)
plt.xlim([min(log_alphas) - 0.1, max(log_alphas) + 0.3])
# Axis labels
plt.xlabel("Regularization parameter λ(-log10)", fontsize=13)
plt.ylabel("Regression coefficient", fontsize=13)
# Grid lines
plt.grid()
# Estimate a lasso regression for each candidate alpha and collect the
# fitted coefficient vector of every model
lasso_coefs = [
    Lasso(alpha=a, fit_intercept=False).fit(x, y).coef_
    for a in alphas
]
# Convert the accumulated regression coefficients to a numpy array
# (one row per alpha value, one column per explanatory variable)
lasso_coefs = np.array(lasso_coefs)
print("Array shape:", lasso_coefs.shape)
print(lasso_coefs)
# Specify the size of the graph area
plt.figure(figsize=(8, 6))
# Line graph with -log10(λ) on the x-axis and the coefficients on the y-axis
plt.plot(log_alphas, lasso_coefs)
# Label the path of the first explanatory variable; lasso_coefs was already
# converted to an ndarray above, so the redundant np.array() wrapper is removed
plt.text(max(log_alphas) + 0.1, lasso_coefs[0, 0], "x_1", fontsize=13)
# Specify the x-axis range (small margins on both sides)
plt.xlim([min(log_alphas) - 0.1, max(log_alphas) + 0.3])
# Axis labels
plt.xlabel("Regularization parameter λ(-log10)", fontsize=13)
plt.ylabel("Regression coefficient", fontsize=13)
# Grid lines
plt.grid()