This post is for anyone who wants to try LASSO regression in Python. The parameters are left at their defaults. The data used here is available at https://gist.github.com/tijptjik/9408623
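For reference, the scikit-learn documentation gives the objective that `Lasso` minimizes as below. Here n is the number of training rows and α is the `alpha` parameter (1.0 by default, which is what this post uses). The intercept is fitted separately and is not penalized. The L1 term α‖w‖₁ is what shrinks coefficients toward zero and can set some of them exactly to zero.

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\lVert w \rVert_1
$$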
Import only Lasso from sklearn.linear_model.
from sklearn.linear_model import Lasso
Import only train_test_split from sklearn.model_selection.
from sklearn.model_selection import train_test_split
Import numpy under the name np.
import numpy as np
Import pandas under the name pd.
import pandas as pd
Also import matplotlib.pyplot (for plotting) and mean_squared_error (for evaluation).
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
Load wine.csv into df (a DataFrame).
df=pd.read_csv('wine.csv')
If the file is in another directory, pass its path instead, for example (Linux):
df=pd.read_csv('Desktop/Documents/wine.csv')
Split the data into training and test sets at a 6:4 ratio (test_size=0.4 puts 40% of the rows in the test set).
df_train, df_test = train_test_split(df, test_size=0.4)
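A small sketch of what the split does, using a hypothetical 178-row stand-in for wine.csv (random values, not the real data). It shows the resulting row counts and the `random_state` argument, which is not used in this post but fixes the shuffle so the split is the same on every run.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for wine.csv: 178 rows of random values.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "color_intensity": rng.uniform(1.0, 13.0, size=178),
    "proline": rng.uniform(270.0, 1700.0, size=178),
})

# test_size=0.4 sends ceil(0.4 * 178) = 72 rows to the test set,
# leaving 178 - 72 = 106 rows for training.
# random_state fixes the shuffle, so the same rows land in each split every run.
df_train, df_test = train_test_split(df, test_size=0.4, random_state=0)
print(len(df_train), len(df_test))
```

Without `random_state`, each run draws a different split, so the printed rows and the final error will vary.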
Displaying df_train gives output like this (the split is random, so your rows will differ):
wine_type alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue OD280/OD315_of_diluted_wines proline
106 2 12.25 1.73 2.12 19.0 80 1.65 2.03 0.37 1.63 3.40 1.00 3.17 510
157 3 12.45 3.03 2.64 27.0 97 1.90 0.58 0.63 1.14 7.50 0.67 1.73 880
75 2 11.66 1.88 1.92 16.0 97 1.61 1.57 0.34 1.15 3.80 1.23 2.14 428
142 3 13.52 3.17 2.72 23.5 97 1.55 0.52 0.50 0.55 4.35 0.89 2.06 520
83 2 13.05 3.86 2.32 22.5 85 1.65 1.59 0.61 1.62 4.80 0.84 2.01 515
.. ... ... ... ... ... ... ... ... ... ... ... ... ... ...
117 2 12.42 1.61 2.19 22.5 108 2.00 2.09 0.34 1.61 2.06 1.06 2.96 345
129 2 12.04 4.30 2.38 22.0 80 2.10 1.75 0.42 1.35 2.60 0.79 2.57 580
60 2 12.33 1.10 2.28 16.0 101 2.05 1.09 0.63 0.41 3.27 1.25 1.67 680
25 1 13.05 2.05 3.22 25.0 124 2.63 2.68 0.47 1.92 3.58 1.13 3.20 830
41 1 13.41 3.84 2.12 18.8 90 2.45 2.68 0.27 1.48 4.28 0.91 3.00 1035
[106 rows x 14 columns]
Put the column used for prediction into x (the explanatory variable) and the column to be predicted into y (the objective variable). This time, we predict 'proline' from 'color_intensity'.
x_train = df_train[['color_intensity']]
x_test = df_test[['color_intensity']]
y_train = df_train['proline']
y_test = df_test['proline']
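Note the double brackets for x versus single brackets for y. scikit-learn expects X to be 2-D (rows by features), and passing a 1-D Series as X makes `fit` raise a ValueError. This sketch uses three rows copied from the df_train output above to show the shapes.

```python
import pandas as pd

# Three rows taken from the df_train output shown above.
df_train = pd.DataFrame({
    "color_intensity": [3.40, 7.50, 3.80],
    "proline": [510, 880, 428],
})

x_train = df_train[["color_intensity"]]  # double brackets -> DataFrame, shape (n, 1)
y_train = df_train["proline"]            # single brackets -> Series, shape (n,)

print(type(x_train).__name__, x_train.shape)
print(type(y_train).__name__, y_train.shape)
```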
Create a Lasso model with the default parameters (alpha=1.0).
lss = Lasso()
Fit the model lss created above to the training data. The learned coefficient and intercept are stored in lss.
lss.fit(x_train, y_train)
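With a single explanatory variable, the LASSO fit has a closed form, which makes it easy to see what `alpha` does. A minimal sketch on hypothetical synthetic data (not the wine data): center x and y, soft-threshold their covariance by `alpha`, and divide by the variance of x. The result should match `lss.coef_` and `lss.intercept_`, and the slope comes out smaller than the ordinary least squares slope.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical one-feature data: y is roughly 3x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=100)

alpha = 1.0  # same as the default Lasso()
lss = Lasso(alpha=alpha).fit(x.reshape(-1, 1), y)

# Closed-form one-feature solution (intercept fitted, not penalized):
n = len(x)
xc, yc = x - x.mean(), y - y.mean()
cov = xc @ yc / n
var = xc @ xc / n
w = np.sign(cov) * max(abs(cov) - alpha, 0.0) / var  # soft-thresholded slope
b = y.mean() - w * x.mean()                          # intercept from the means
ols = cov / var                                      # unpenalized slope, for comparison

print(lss.coef_[0], w)    # the two slopes agree
print(lss.intercept_, b)  # and so do the intercepts
print(w < ols)            # the L1 penalty shrinks the slope
```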
Predict proline for the test data and assign the result to y_pred.
y_pred = lss.predict(x_test)
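For a linear model like Lasso, `predict` simply evaluates the fitted line, intercept plus coefficient times x. A short check on hypothetical synthetic data (not the wine data):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical training data in roughly the color_intensity / proline ranges.
rng = np.random.default_rng(1)
x_train = rng.uniform(1.0, 13.0, size=(50, 1))
y_train = 60.0 * x_train[:, 0] + 400.0 + rng.normal(0.0, 50.0, size=50)

lss = Lasso().fit(x_train, y_train)

# predict() is the fitted line: intercept_ + x @ coef_
x_test = np.array([[2.0], [5.0], [9.5]])
y_pred = lss.predict(x_test)
manual = lss.intercept_ + x_test @ lss.coef_
print(np.allclose(y_pred, manual))
```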
plt.scatter(x, y) draws a scatter plot. First plot the actual values of the test data (blue dots).
plt.scatter(x_test, y_test)
Next, create an array from the minimum to the maximum of x_test["color_intensity"] in steps of 0.1 and reshape it into a one-column matrix. Running lss.predict on it and plotting the result shows the predicted values (red dots).
x_for_plot = np.arange(np.min(x_test["color_intensity"]), np.max(x_test["color_intensity"]), 0.1).reshape(-1, 1)
plt.scatter(x_for_plot,lss.predict(x_for_plot),color="red")
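Two details of the plotting grid, sketched with a hypothetical range (`lo`, `hi` stand in for the min and max of `x_test["color_intensity"]`). `np.arange` stops before its end value, so the maximum itself is not plotted; `np.linspace` includes both ends if that matters. Also, recent scikit-learn versions emit a warning about missing feature names when a model fitted on a DataFrame is given a plain NumPy array, which wrapping the grid in a DataFrame with the same column name avoids.

```python
import numpy as np
import pandas as pd

# Hypothetical range standing in for min/max of x_test["color_intensity"].
lo, hi = 1.28, 13.0

# One column, one row per 0.1 step; arange excludes the end value.
x_for_plot = np.arange(lo, hi, 0.1).reshape(-1, 1)
print(x_for_plot.shape)        # (n, 1): a 2-D column, as predict() expects
print(x_for_plot[-1, 0] < hi)  # the last grid point falls short of hi

# linspace includes both endpoints.
x_lin = np.linspace(lo, hi, 100).reshape(-1, 1)

# A DataFrame with the training column name, e.g. for lss.predict(x_for_plot_df).
x_for_plot_df = pd.DataFrame(x_for_plot, columns=["color_intensity"])
```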
Set the axis labels.
plt.xlabel("color_intensity")
plt.ylabel("proline")
Display the plot.
plt.show()
Blue dots are the actual values and red dots are the predicted values.
Evaluate the predictions with the mean squared error.
print(mean_squared_error(y_test, y_pred))  # 90027.41397601982 in my run, which is pretty bad
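To make that number easier to read: the mean squared error is the average of the squared differences between actual and predicted values, so it is in squared proline units. Its square root (RMSE) is back in proline units. A small sketch with a hand-written helper (`mse` is a hypothetical name, not part of scikit-learn), applied to the value reported above:

```python
import math
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

print(mse([1, 2, 3], [1, 2, 5]))  # (0 + 0 + 4) / 3 = 1.333...

# RMSE of the MSE reported above: the typical prediction error in proline units.
rmse = math.sqrt(90027.41397601982)
print(round(rmse, 2))  # about 300.05
```

So the predictions are off by roughly 300 proline units on a typical test row.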
I think the accuracy could be improved by tuning the parameters.
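One way to tune is to try several values of `alpha` and keep the one with the lowest cross-validated error. A minimal sketch using scikit-learn's `GridSearchCV` on hypothetical synthetic data (not the wine data); the candidate alphas are arbitrary choices. With only one explanatory variable there is little for the L1 penalty to select, so gains from `alpha` alone may be modest.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Hypothetical one-feature data.
rng = np.random.default_rng(2)
x_train = rng.uniform(1.0, 13.0, size=(120, 1))
y_train = 60.0 * x_train[:, 0] + 400.0 + rng.normal(0.0, 200.0, size=120)

# Candidate penalty strengths (Lasso() uses alpha=1.0).
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
search = GridSearchCV(
    Lasso(),
    param_grid={"alpha": alphas},
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=5,                              # 5-fold cross-validation on the training set
)
search.fit(x_train, y_train)

print(search.best_params_["alpha"])
print(-search.best_score_)  # cross-validated MSE of the best alpha
```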
The complete code is below.
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
df=pd.read_csv('wine.csv')
df_train, df_test = train_test_split(df, test_size=0.4)
x_train = df_train[['color_intensity']]
x_test = df_test[['color_intensity']]
y_train = df_train['proline']
y_test = df_test['proline']
print(y_train)
lss = Lasso()
lss.fit(x_train, y_train)
y_pred = lss.predict(x_test)
plt.scatter(x_test, y_test)
x_for_plot = np.arange(np.min(x_test["color_intensity"]),np.max(x_test["color_intensity"]),0.1).reshape(-1,1)
plt.scatter(x_for_plot,lss.predict(x_for_plot),color="red")
plt.xlabel("color_intensity")
plt.ylabel("proline")
plt.show()
print(mean_squared_error(y_test,y_pred))
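If wine.csv is not at hand, a variant of the code above can run on scikit-learn's bundled copy of the UCI wine data, assuming it holds the same values as the gist's file. Its class column is named `target` (0 to 2) rather than `wine_type`, which this analysis does not use. This sketch also adds `random_state` so the error is reproducible, and leaves out the plotting.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: the bundled UCI wine data stands in for wine.csv.
df = load_wine(as_frame=True).frame

# random_state makes the split, and therefore the MSE, reproducible.
df_train, df_test = train_test_split(df, test_size=0.4, random_state=0)

x_train = df_train[["color_intensity"]]
x_test = df_test[["color_intensity"]]
y_train = df_train["proline"]
y_test = df_test["proline"]

lss = Lasso()
lss.fit(x_train, y_train)
y_pred = lss.predict(x_test)
print(mean_squared_error(y_test, y_pred))
```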