Easy Lasso regression analysis with Python (no theory)

This article is for those who just want to run a LASSO regression analysis in Python for now. The parameters are left at their defaults. The data used is available here: https://gist.github.com/tijptjik/9408623

The complete code is collected at the bottom.

Import module for LASSO regression

Import only Lasso from sklearn.linear_model.

from sklearn.linear_model import Lasso

Import module to split data

Import only train_test_split from sklearn.model_selection.

from sklearn.model_selection import train_test_split

Import the module that handles matrices

Import numpy under the name np.

import numpy as np

Import the module that handles CSV

Import pandas under the name pd.

import pandas as pd

Import the module to draw the graph

import matplotlib.pyplot as plt

Import the module to find the mean squared error

from sklearn.metrics import mean_squared_error

Load the CSV

Load wine.csv into df (a data frame).

df=pd.read_csv('wine.csv')

'wine.csv' is the path to the CSV file relative to the current directory. If you run Python from your home directory and the CSV is in Desktop > Documents, write

df=pd.read_csv('Desktop/Documents/wine.csv')

and so on. (Linux)

Divide the data for training and testing

Training : Testing = 6 : 4.

df_train, df_test = train_test_split(df, test_size=0.4)

When the data is displayed, it looks like this.

df_train=
     wine_type  alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  OD280/OD315_of_diluted_wines  proline
106          2    12.25        1.73  2.12               19.0         80           1.65        2.03                  0.37             1.63             3.40  1.00                          3.17      510
157          3    12.45        3.03  2.64               27.0         97           1.90        0.58                  0.63             1.14             7.50  0.67                          1.73      880
75           2    11.66        1.88  1.92               16.0         97           1.61        1.57                  0.34             1.15             3.80  1.23                          2.14      428
142          3    13.52        3.17  2.72               23.5         97           1.55        0.52                  0.50             0.55             4.35  0.89                          2.06      520
83           2    13.05        3.86  2.32               22.5         85           1.65        1.59                  0.61             1.62             4.80  0.84                          2.01      515
..         ...      ...         ...   ...                ...        ...            ...         ...                   ...              ...              ...   ...                           ...      ...
117          2    12.42        1.61  2.19               22.5        108           2.00        2.09                  0.34             1.61             2.06  1.06                          2.96      345
129          2    12.04        4.30  2.38               22.0         80           2.10        1.75                  0.42             1.35             2.60  0.79                          2.57      580
60           2    12.33        1.10  2.28               16.0        101           2.05        1.09                  0.63             0.41             3.27  1.25                          1.67      680
25           1    13.05        2.05  3.22               25.0        124           2.63        2.68                  0.47             1.92             3.58  1.13                          3.20      830
41           1    13.41        3.84  2.12               18.8         90           2.45        2.68                  0.27             1.48             4.28  0.91                          3.00     1035

[106 rows x 14 columns]
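
Because train_test_split shuffles the rows randomly, the rows shown above will differ on every run. Here is a minimal sketch, assuming a small made-up DataFrame in place of the wine data, of how the random_state parameter of train_test_split makes the split repeatable. The names df_demo, train_a and so on are only for illustration, and the value 0 is an arbitrary choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in for the wine data (10 rows, 2 columns)
df_demo = pd.DataFrame({
    "color_intensity": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "proline": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
})

# The same random_state gives the same shuffle, so both calls split identically
train_a, test_a = train_test_split(df_demo, test_size=0.4, random_state=0)
train_b, test_b = train_test_split(df_demo, test_size=0.4, random_state=0)

print(len(train_a), len(test_a))             # sizes of the 6:4 split
print(train_a.index.equals(train_b.index))   # whether both splits picked the same rows
```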

Separate the explanatory variable from the objective variable

Put the column you want to use for the analysis in x (the explanatory variable). Put the column you want to predict in y (the objective variable). This time, we predict 'proline' from 'color_intensity'.

x_train = df_train[['color_intensity']]
x_test  = df_test[['color_intensity']]

y_train = df_train['proline']
y_test  = df_test['proline']

Make an empty model

lss = Lasso()

Train the regression

fit(explanatory variable, objective variable)

The training result is stored in the model lss created above.

lss.fit(x_train, y_train)
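
After fit, the learned line can be inspected through the model's coef_ (slope) and intercept_ attributes. Here is a small sketch on made-up data where y is exactly 2x; the names x_demo, y_demo and lss_demo are only for illustration. Because Lasso penalizes the size of the coefficient, the slope should come out somewhat smaller than 2.

```python
import pandas as pd
from sklearn.linear_model import Lasso

# Made-up data: y = 2 * x exactly
x_demo = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
y_demo = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

lss_demo = Lasso()           # default parameters, as in the article
lss_demo.fit(x_demo, y_demo)

print(lss_demo.coef_)        # slope(s), one per explanatory variable
print(lss_demo.intercept_)   # intercept of the fitted line
```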

Make a prediction

predict(data to predict on)

Predict on the test data and assign the result to y_pred.

y_pred = lss.predict(x_test)

Display the result on a graph

You can make a scatter plot with plt.scatter(x-axis, y-axis). First, plot the correct answers. (Blue dots)

plt.scatter(x_test, y_test)

Create an array from the minimum to the maximum value of x_test["color_intensity"] in 0.1 increments and reshape it into a one-column matrix. Then run lss.predict on it and plot the predicted values. (Red dots)

x_for_plot = np.arange(np.min(x_test["color_intensity"]),
                       np.max(x_test["color_intensity"]), 0.1).reshape(-1, 1)
plt.scatter(x_for_plot, lss.predict(x_for_plot), color="red")
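
predict expects a two-dimensional input (rows x columns), which is why the array from np.arange is reshaped with reshape(-1, 1). A short sketch with made-up numbers showing the change of shape. Note that np.arange does not include the stop value itself.

```python
import numpy as np

# Values from 0.0 up to (but not including) 1.0 in steps of 0.25
grid = np.arange(0.0, 1.0, 0.25)
print(grid.shape)      # (4,)  one-dimensional

# -1 lets NumPy work out the number of rows; 1 means a single column
column = grid.reshape(-1, 1)
print(column.shape)    # (4, 1)  four rows, one column
```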

Label settings

plt.xlabel("color_intensity")
plt.ylabel("proline")

Display the graph

plt.show()

Blue is the actual value and red is the predicted value. (Figure: screenshot of the resulting scatter plot)

Finally, find the mean squared error.

print(mean_squared_error(y_test,y_pred))  # 90027.41397601982 ... not great, lol
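
The mean squared error is the average of the squared differences between the actual and predicted values. A sketch with made-up numbers that compares mean_squared_error with the same calculation written out in NumPy (the names y_true_demo and y_pred_demo are only for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted values; only the last one is off (by 2)
y_true_demo = np.array([1.0, 2.0, 3.0])
y_pred_demo = np.array([1.0, 2.0, 5.0])

mse_sklearn = mean_squared_error(y_true_demo, y_pred_demo)
mse_manual = np.mean((y_true_demo - y_pred_demo) ** 2)   # (0 + 0 + 4) / 3

print(mse_sklearn, mse_manual)
```

Because the differences are squared, the unit of the result is the square of the unit of proline. Taking the square root of the value above (about 90027) gives roughly 300, which is an error in the same unit as proline.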

The accuracy can probably be improved by adjusting the parameters.

Below is the full code for copying.
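
The main parameter of Lasso is alpha, the strength of the penalty (the default is 1.0). A minimal sketch, assuming randomly generated data instead of the wine CSV, that fits the model with several alpha values and prints the test error for each. The alpha values and all variable names here are arbitrary picks for illustration, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Made-up data: one explanatory variable, a linear trend plus noise
rng = np.random.default_rng(0)
x_demo = rng.uniform(1.0, 10.0, size=(100, 1))
y_demo = 50.0 * x_demo[:, 0] + rng.normal(0.0, 20.0, size=100)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.4, random_state=0)

# Fit one model per alpha and record its mean squared error on the test data
results = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha)
    model.fit(x_tr, y_tr)
    results[alpha] = mean_squared_error(y_te, model.predict(x_te))
    print(alpha, results[alpha])
```

With only one explanatory variable, the difference between alpha values may turn out to be small.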

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
df=pd.read_csv('wine.csv')
df_train, df_test = train_test_split(df, test_size=0.4)
x_train = df_train[['color_intensity']]
x_test  = df_test[['color_intensity']]

y_train = df_train['proline']
y_test  = df_test['proline']
print(y_train)
lss = Lasso()
lss.fit(x_train, y_train)
y_pred = lss.predict(x_test)

plt.scatter(x_test, y_test)
x_for_plot = np.arange(np.min(x_test["color_intensity"]),np.max(x_test["color_intensity"]),0.1).reshape(-1,1)
plt.scatter(x_for_plot,lss.predict(x_for_plot),color="red")
plt.xlabel("color_intensity")
plt.ylabel("proline")
plt.show()

print(mean_squared_error(y_test,y_pred))