Last time University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (4) https://github.com/legacyworld/sklearn-basic

4.1 Comparison of ridge regression and Lasso

This is a comparison between the ridge regression and the lasso regression. Youtube commentary is 5th (1) per 12 minutes 50 seconds It's not much different as a program, but the results don't match the answers. I tried various things and saw it, but gave up. This time, I will return to the wine data of the first task.

`python:Homework_4.1.py`


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn import preprocessing
from sklearn.model_selection import cross_val_score

#scikit-Import wine data from lean
df= pd.read_csv('winequality-red.csv',sep=';')
#Since the target value quality is included, create a dropped dataframe
df1 = df.drop(columns='quality')
y = df['quality'].values.reshape(-1,1)
scaler = preprocessing.StandardScaler()
#Regularization parameters
alpha = 2 ** (-16)
X = df1.values
X_fit = scaler.fit_transform(X)
#DataFrame for storing results
df_ridge_coeff = pd.DataFrame(columns=df1.columns)
df_ridge_result = pd.DataFrame(columns=['alpha','TrainErr','TestErr'])
df_lasso_coeff = pd.DataFrame(columns=df1.columns)
df_lasso_result = pd.DataFrame(columns=['alpha','TrainErr','TestErr'])
while alpha <= 2 ** 12:
    #Ridge regression
    model_ridge = linear_model.Ridge(alpha=alpha)
    model_ridge.fit(X_fit,y)
    mse_ridge = mean_squared_error(model_ridge.predict(X_fit),y)
    scores_ridge = cross_val_score(model_ridge,X_fit,y,scoring="neg_mean_squared_error",cv=10)
    df_ridge_coeff = df_ridge_coeff.append(pd.Series(model_ridge.coef_[0],index=df_ridge_coeff.columns),ignore_index=True)
    df_ridge_result = df_ridge_result.append(pd.Series([alpha,mse_ridge,-scores_ridge.mean()],index=df_ridge_result.columns),ignore_index=True)    
    #Lasso return
    model_lasso = linear_model.Lasso(alpha=alpha)
    model_lasso.fit(X_fit,y)
    mse_lasso = mean_squared_error(model_lasso.predict(X_fit),y)
    scores_lasso = cross_val_score(model_lasso,X_fit,y,scoring="neg_mean_squared_error",cv=10)
    df_lasso_coeff = df_lasso_coeff.append(pd.Series(model_lasso.coef_,index=df_lasso_coeff.columns),ignore_index=True)
    df_lasso_result = df_lasso_result.append(pd.Series([alpha,mse_lasso,-scores_lasso.mean()],index=df_lasso_result.columns),ignore_index=True)    
    alpha = alpha * 2

for index, row in df_ridge_coeff.iterrows():
    print(row.sort_values())
    print(df_ridge_result.iloc[index])
print(df_ridge_result.sort_values('TestErr'))

for index, row in df_lasso_coeff.iterrows():
    print(row.sort_values())
    print(df_lasso_result.iloc[index])
print(df_lasso_result.sort_values('TestErr'))

Similar to the explanation, the obtained coefficients are arranged in ascending order, and the training error and test error are also output. The original answer is as follows.

--Regularization parameters that give the minimum training and test errors in ridge regression --Training error: $ 2 ^ {-16} $ --Test error: 0.0625 (TestErr = 0.43394) --Regularization parameters that give the minimum training and test errors in the lasso regression --Training error: $ 2 ^ {-16} $ --Test error: 0.000244 (TestErr = 0.43404)

The training error is, of course, the one with the smallest regularization parameter, but the test error is quite different. The result of this program is as follows.

Ridge regression
          alpha  TrainErr   TestErr
23   128.000000  0.417864  0.433617
22    64.000000  0.417102  0.433799
21    32.000000  0.416863  0.434265
20    16.000000  0.416793  0.434649
24   256.000000  0.420109  0.434870
19     8.000000  0.416774  0.434894
18     4.000000  0.416769  0.435033
17     2.000000  0.416768  0.435107
16     1.000000  0.416767  0.435146
15     0.500000  0.416767  0.435165
14     0.250000  0.416767  0.435175
13     0.125000  0.416767  0.435180
12     0.062500  0.416767  0.435182
11     0.031250  0.416767  0.435184
10     0.015625  0.416767  0.435184
9      0.007812  0.416767  0.435185
8      0.003906  0.416767  0.435185
7      0.001953  0.416767  0.435185
6      0.000977  0.416767  0.435185
5      0.000488  0.416767  0.435185
4      0.000244  0.416767  0.435185
3      0.000122  0.416767  0.435185
2      0.000061  0.416767  0.435185
1      0.000031  0.416767  0.435185
0      0.000015  0.416767  0.435185
25   512.000000  0.426075  0.440302
26  1024.000000  0.439988  0.454846
27  2048.000000  0.467023  0.483752
28  4096.000000  0.507750  0.526141

Lasso return
          alpha  TrainErr   TestErr
9      0.007812  0.418124  0.434068
10     0.015625  0.420260  0.434252
8      0.003906  0.417211  0.434764
7      0.001953  0.416878  0.435060
6      0.000977  0.416795  0.435161
0      0.000015  0.416767  0.435185
1      0.000031  0.416767  0.435185
2      0.000061  0.416767  0.435186
5      0.000488  0.416774  0.435186
3      0.000122  0.416768  0.435186
4      0.000244  0.416769  0.435189
11     0.031250  0.424774  0.438609
12     0.062500  0.439039  0.451202
13     0.125000  0.467179  0.478006
14     0.250000  0.549119  0.562292
15     0.500000  0.651761  0.663803
16     1.000000  0.651761  0.663803
17     2.000000  0.651761  0.663803
18     4.000000  0.651761  0.663803
19     8.000000  0.651761  0.663803
20    16.000000  0.651761  0.663803
21    32.000000  0.651761  0.663803
22    64.000000  0.651761  0.663803
23   128.000000  0.651761  0.663803
24   256.000000  0.651761  0.663803
25   512.000000  0.651761  0.663803
26  1024.000000  0.651761  0.663803
27  2048.000000  0.651761  0.663803
28  4096.000000  0.651761  0.663803

--Ridge regression: 128 (TestErr = 0.43362) --Lasso Return: 0.007812 (TestErr = 0.43407)

Unfortunately, even if I changed the cross-validation method to stratified kfold and changed the number of divisions of K-Fold, it was useless. However, the tendency is correct, so I don't think it's a big mistake as a program.

If you try this task, you can see the difference between Ridge and Lasso. In ridge regression, all coefficients gradually become less influential, while in Lasso, those with less influence quickly become zero.

The progress of the lasso return
volatile acidity       -0.183183
total sulfur dioxide   -0.090231
chlorides              -0.081657
pH                     -0.060154
fixed acidity           0.000000
citric acid            -0.000000
density                -0.000000
residual sugar          0.002591
free sulfur dioxide     0.027684
sulphates               0.139798
alcohol                 0.304033

alpha       0.007812
TrainErr    0.418124
TestErr     0.434068

The volatile acidity and alcohol, which have a great influence on the quality of wine, remain, but the others are all smaller. This is the result of ridge regression with the same regularization parameter (= 0.007812).

volatile acidity       -0.193965
total sulfur dioxide   -0.107355
chlorides              -0.088183
pH                     -0.063840
citric acid            -0.035550
density                -0.033741
residual sugar          0.023020
fixed acidity           0.043500
free sulfur dioxide     0.045605
sulphates               0.155276
alcohol                 0.294240

alpha       0.007812
TrainErr    0.416767
TestErr     0.435185

Since citrix acid and density have almost no effect on quality, the result of lasso regression that it can be ignored and simplified is interesting.

Past posts

University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1) University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (2) University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (3)

University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the task (5)

4.1 Comparison of ridge regression and Lasso

python:Homework_4.1.py

Past posts

`python:Homework_4.1.py`