Last time: University of Tsukuba Machine Learning Course: Study sklearn while making the Python script part of the task (7) Make your own steepest descent method
https://github.com/legacyworld/sklearn-basic
The explanation is in lecture video 5 (1), around 24 minutes 30 seconds. Last time I implemented only the steepest descent method, so this time I implement stochastic gradient descent. The program itself does not change that much. Mathematically it looks like this:
\lambda = \text{regularization parameter},\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{pmatrix},\quad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix},\quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1m} \\
1 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & & & & \vdots \\
1 & x_{N1} & x_{N2} & \cdots & x_{Nm}
\end{pmatrix}
\\
\beta^{t+1} = \beta^{t}(1-2\lambda\eta) - \eta\frac{1}{N}x_i^T(x_i\beta^t-y_i)
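For reference (a derivation I am adding here, not in the original lecture text), this is just a gradient step on the regularized squared loss, with the penalty term folded into the (1-2\lambda\eta) factor:

J(\beta) = \frac{1}{2N}\sum_i (x_i\beta - y_i)^2 + \lambda\|\beta\|^2,\qquad
\frac{\partial J}{\partial \beta}\Big|_{\text{sample } i} = \frac{1}{N}x_i^T(x_i\beta - y_i) + 2\lambda\beta
\\
\beta^{t+1} = \beta^t - \eta\left(\frac{1}{N}x_i^T(x_i\beta^t - y_i) + 2\lambda\beta^t\right)
= \beta^t(1-2\lambda\eta) - \eta\frac{1}{N}x_i^T(x_i\beta^t - y_i)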
Until now, all of the data was used to compute the gradient; here the update is computed from a single randomly chosen sample $x_i, y_i$.
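As a small illustrative sketch of that difference (the helper names full_batch_step and sgd_step are mine, not part of the script below), the full-batch update touches the whole design matrix while the stochastic update uses only a single row:

import numpy as np

def full_batch_step(beta, X, y, eta, lam):
    # steepest descent: gradient of the ridge objective over all N samples
    N = len(y)
    return beta - eta * ((X.T @ (X @ beta - y)) / N + 2 * lam * beta)

def sgd_step(beta, Xi, yi, eta, lam):
    # stochastic version: the same update built from one sample (Xi reshaped to a column)
    Xi = Xi.reshape(-1, 1)
    return beta * (1 - 2 * lam * eta) - eta * Xi * (Xi.T @ beta - yi)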
What tripped me up a little was NumPy's transpose behavior.
For a one-dimensional array, transposing with .T returns the array unchanged, so you need .reshape(-1, 1) to get a column vector.
See here:
https://note.nkmk.me/python-numpy-transpose/
import numpy as np

a_1d = np.arange(3)
print(a_1d)
# [0 1 2]
print(a_1d.T)
# [0 1 2]
a_col = a_1d.reshape(-1, 1)
print(a_col)
# [[0]
#  [1]
#  [2]]
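This matters in the gradient update: without the reshape, subtracting a shape (m,) array from the shape (m, 1) beta broadcasts into an (m, m) matrix instead of a column vector. A quick check of the shapes (with made-up sizes, just for illustration):

import numpy as np

m = 3
beta = np.ones((m, 1))          # column vector, shape (3, 1)
Xi = np.arange(m, dtype=float)  # one sample row, shape (3,)
yi = np.array([1.0])

bad = beta - Xi * (np.dot(Xi, beta) - yi)                  # shape (3, 3): broadcasting bug
good = beta - Xi.reshape(-1, 1) * (np.dot(Xi, beta) - yi)  # shape (3, 1): what we want
print(bad.shape, good.shape)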
Here is the source code.
python:Homework_4.3SGD.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_validate
import statsmodels.api as sm

class MyEstimator(BaseEstimator):
    def __init__(self, ep, eta, l):
        self.ep = ep
        self.eta = eta
        self.l = l
        self.loss = []

    # Implement fit()
    def fit(self, X, y):
        self.coef_ = self.stochastic_grad_desc(X, y)
        # fit returns self
        return self

    # Implement predict()
    def predict(self, X):
        return np.dot(X, self.coef_)

    def shuffle(self, X, y):
        r = np.random.permutation(len(y))
        return X[r], y[r]

    def stochastic_grad_desc(self, X, y):
        m = len(y)
        loss = []
        # Number of features
        dim = X.shape[1]
        # Initial value of beta
        beta = np.ones(dim).reshape(-1, 1)
        eta = self.eta
        l = self.l
        X_shuffle, y_shuffle = self.shuffle(X, y)
        # Stop if there is no improvement T times in a row
        T = 100
        # Number of consecutive steps without improvement
        not_improve = 0
        # Initial value for the minimum of the objective function
        min = 10 ** 9
        while True:
            for Xi, yi in zip(X_shuffle, y_shuffle):
                loss.append((1 / (2 * m)) * np.sum(np.square(np.dot(X, beta) - y)))
                beta = beta * (1 - 2 * l * eta) - eta * Xi.reshape(-1, 1) * (np.dot(Xi, beta) - yi)
                if loss[len(loss) - 1] < min:
                    min = loss[len(loss) - 1]
                    min_beta = beta
                    not_improve = 0
                else:
                    # The minimum of the objective function was not updated
                    not_improve += 1
                    if not_improve >= T:
                        break
            # If all samples have been used but the minimum was still being updated within the last T steps, loop over the data again
            if not_improve >= T:
                self.loss = loss
                break
        return min_beta

# Import the wine data used in the scikit-learn exercises
df = pd.read_csv('winequality-red.csv', sep=';')
# The target value quality is included, so create a dataframe with it dropped
df1 = df.drop(columns='quality')
y = df['quality'].values.reshape(-1, 1)
X = df1.values
scaler = preprocessing.StandardScaler()
X_fit = scaler.fit_transform(X)
X_fit = sm.add_constant(X_fit)  # Add 1 as the first column
epsilon = 10 ** (-7)
eta_list = [0.03, 0.01, 0.003]
loss = []
coef = []
for eta in eta_list:
    l = 10 ** (-5)
    test_min = 10 ** 9
    while l <= 1 / (2 * eta):
        myest = MyEstimator(epsilon, eta, l)
        myest.fit(X_fit, y)
        scores = cross_validate(myest, X_fit, y, scoring="neg_mean_squared_error", cv=10)
        if abs(scores['test_score'].mean()) < test_min:
            test_min = abs(scores['test_score'].mean())
            loss = myest.loss
            l_min = l
            coef = myest.coef_
        l = l * 10 ** 0.5
    plt.plot(loss, label=f"$\\eta$={eta}")
    print(f"eta = {eta} : iter = {len(loss)}, loss = {loss[-1]}, lambda = {l_min}, TestErr = {test_min}")
    # Coefficient output: the intercept is the first element, so print the features from the second element and the intercept last
    i = 1
    for column in df1.columns:
        print(column, coef[i][0])
        i += 1
    print('intercept', coef[0][0])
plt.legend()
plt.savefig("sgd.png")
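As a sanity check, scikit-learn's own SGDRegressor minimizes essentially the same L2-regularized squared loss. A minimal sketch of a comparable setup (parameter names as in recent scikit-learn versions; its learning-rate schedule and stopping rule differ from the hand-rolled estimator, so the coefficients will not match exactly):

from sklearn.linear_model import SGDRegressor

# Squared loss + L2 penalty is the same objective family as the script above.
# X_fit already contains the constant column, so fit_intercept=False here.
sgd = SGDRegressor(loss="squared_error", penalty="l2", alpha=1e-4,
                   learning_rate="constant", eta0=0.01, max_iter=1000,
                   fit_intercept=False, random_state=0)
sgd.fit(X_fit, y.ravel())
print(sgd.coef_)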
The lecture explanation gave the stopping condition "stop if there is no improvement 100 times in a row", so the program follows it. When $\eta$ is small, this condition may not be met even after all 1599 samples have been used, so the loop can enter a second pass over the data. The resulting plot (sgd.png) looks like this. At $\eta = 0.03$ you can see the error increasing slightly toward the end. The final coefficients and other outputs are shown below.
eta = 0.03 : iter = 298, loss = 0.29072324272824085, lambda = 0.0031622776601683803, TestErr = 0.47051639691326796
fixed acidity 0.1904239451124434
volatile acidity -0.11242984344193296
citric acid -0.00703125780915424
residual sugar 0.2092352618792849
chlorides -0.044795495356479025
free sulfur dioxide -0.018863685196341816
total sulfur dioxide 0.07447982325062003
density -0.17305138620126106
pH 0.05808006453308803
sulphates 0.13876262568557934
alcohol 0.2947134691111974
intercept 5.6501294014064145
eta = 0.01 : iter = 728, loss = 0.24203354045966255, lambda = 0.00010000000000000002, TestErr = 0.45525344581852156
fixed acidity 0.25152952212309976
volatile acidity -0.03876889927769888
citric acid 0.14059421863669852
residual sugar 0.06793602828251821
chlorides -0.0607861479963043
free sulfur dioxide 0.08441853171277111
total sulfur dioxide -0.09642176480191654
density -0.2345690991118163
pH 0.1396740265674562
sulphates 0.1449843342292861
alcohol 0.19737851967044345
intercept 5.657998427200384
eta = 0.003 : iter = 1758, loss = 0.22475918775097103, lambda = 0.00010000000000000002, TestErr = 0.44693442950748147
fixed acidity 0.2953542653448508
volatile acidity -0.12934364893075953
citric acid 0.04629080083382285
residual sugar 0.013753852832452122
chlorides -0.03688613363045954
free sulfur dioxide 0.045541235818534614
total sulfur dioxide -0.049594638329345575
density -0.17427360277645224
pH 0.13897225246491407
sulphates 0.15425590075925466
alcohol 0.26518804857692096
intercept 5.597149258230254
Because the data is shuffled randomly, the results change with each run.
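If you want reproducible results, one option (my addition, not part of the original script) is to seed NumPy's global random state before fitting, since shuffle() uses np.random.permutation. Reusing the variables defined in the script above:

import numpy as np

np.random.seed(0)  # fix the shuffle order so repeated runs give identical results
myest = MyEstimator(epsilon, eta, l)
myest.fit(X_fit, y)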
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (2)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (3)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (4)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (5)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (6)
https://github.com/legacyworld/sklearn-basic