Last time: University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (15) https://github.com/legacyworld/sklearn-basic
YouTube commentary: 8th lecture (1), around the 27-minute mark. I could not reproduce the result explained in the lecture, probably because my implementation of the steepest descent method is flawed. I also tried ridge regression and other variations, but the result did not change much.
Mathematically, it implements the following:
E(w) = -\frac{1}{N}\sum_{n=1}^{N}\left\{t_n\,\log\hat{t}_n + (1-t_n)\,\log(1-\hat{t}_n)\right\}\\
\frac{\partial E(w)}{\partial w} = X^T(\hat{t}-t)\\
w \leftarrow w - \eta X^T(\hat{t}-t)
In the iris data, $N = 150$, and $w$ is 5-dimensional once the intercept is added. $E(w)$ is divided by $N$ because otherwise the initial cost does not match the one shown in the lecture. Even so, only the step size 0.1 diverged; everything else converged properly. I suspect something is wrong, but I am not sure where.
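As a sanity check on the formulas above, here is a minimal sketch of the steepest descent loop in plain NumPy, separate from the assignment script further below. It assumes X is an N x d design matrix whose first column is all ones and t holds 0/1 labels; eta and n_iter are arbitrary illustrative values.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, t, eta=0.01, n_iter=1000):
    # X: (N, d) design matrix including the intercept column, t: (N,) labels in {0, 1}
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        t_hat = sigmoid(X @ w)                                            # predicted probabilities
        cost = -np.mean(t * np.log(t_hat) + (1 - t) * np.log(1 - t_hat))  # E(w) with the 1/N factor
        w = w - eta * X.T @ (t_hat - t)                                   # w <- w - eta * X^T (t_hat - t)
    return w, cost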
\nabla\nabla E(w) = X^TRX,\quad R = \mathrm{diag}\bigl(\hat{t}_n(1-\hat{t}_n)\bigr)\ (\text{an } N\times N \text{ diagonal matrix})\\
w \leftarrow w-(X^TRX)^{-1}X^T(\hat{t}-t)
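The Newton update can be sketched the same way (again an illustration with the same shape assumptions as the sketch above, not the assignment code; np.linalg.solve is used rather than forming the inverse explicitly):

import numpy as np

def newton_logistic(X, t, n_iter=10):
    # X: (N, d) design matrix including the intercept column, t: (N,) labels in {0, 1}
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        t_hat = 1 / (1 + np.exp(-X @ w))
        R = np.diag(t_hat * (1 - t_hat))                 # N x N diagonal matrix
        H = X.T @ R @ X                                   # Hessian X^T R X, d x d
        w = w - np.linalg.solve(H, X.T @ (t_hat - t))     # w <- w - (X^T R X)^{-1} X^T (t_hat - t)
    return w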
Newton's method comes out roughly right, but the final cost is unfortunately quite different from the lecture's. It is also hard to interpret why only three weights are shown in the lecture's results.
Here is the source code. Since it reuses the source code from Exercise 4.3, the model is wrapped in a BaseEstimator subclass, but that has no particular significance here.
python:Homework_7.3.py
# Exercise 7.3: Comparison of gradient descent and Newton's method in logistic regression
# YouTube commentary: 8th lecture (1), around the 27-minute mark
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.base import BaseEstimator
import statsmodels.api as sm
from sklearn.datasets import load_iris

iris = load_iris()

class MyEstimator(BaseEstimator):
    def __init__(self, ep, eta):
        self.ep = ep
        self.eta = eta
        self.loss = []

    def fit(self, X, y, f):
        m = len(y)
        loss = []
        diff = 10**(10)
        ep = self.ep
        # Number of features (including the intercept)
        dim = X.T.shape[1]
        # Initial value of beta
        beta = np.zeros(dim).reshape(-1, 1)
        eta = self.eta
        while abs(diff) > ep:
            t_hat = self.sigmoid(beta.T, X)
            loss.append(-(1/m)*np.sum(y*np.log(t_hat) + (1-y)*np.log(1-t_hat)))
            # Steepest descent method
            if f == "GD":
                beta = beta - eta*np.dot(X, (t_hat-y).reshape(-1, 1))
            # Newton's method
            else:
                # N x N diagonal matrix
                R = np.diag((t_hat*(1-t_hat))[0])
                # Hessian matrix
                H = np.dot(np.dot(X, R), X.T)
                beta = beta - np.dot(np.linalg.inv(H), np.dot(X, (t_hat-y).reshape(-1, 1)))
            if len(loss) > 1:
                diff = loss[len(loss)-1] - loss[len(loss)-2]
                if diff > 0:
                    break
        self.loss = loss
        self.coef_ = beta
        return self

    def sigmoid(self, w, x):
        return 1/(1+np.exp(-np.dot(w, x)))

# Plots
fig = plt.figure(figsize=(20, 10))
ax = [fig.add_subplot(3, 3, i+1) for i in range(9)]
# Only consider whether the sample is virginica or not
target = 2
X = iris.data
y = iris.target
# Set y to 0 if it is not 2 (not virginica), 1 if it is
y[np.where(np.not_equal(y, target))] = 0
y[np.where(np.equal(y, target))] = 1
scaler = preprocessing.StandardScaler()
X_fit = scaler.fit_transform(X)
X_fit = sm.add_constant(X_fit).T  # Add a column of ones for the intercept, then transpose
epsilon = 10 ** (-8)
# Steepest descent method
eta_list = [0.1, 0.01, 0.008, 0.006, 0.004, 0.003, 0.002, 0.001, 0.0005]
for index, eta in enumerate(eta_list):
    myest = MyEstimator(epsilon, eta)
    myest.fit(X_fit, y, "GD")
    ax[index].plot(myest.loss)
    ax[index].set_title(f"Optimization with Gradient Descent\nStepsize = {eta}\nIterations:{len(myest.loss)}; Initial Cost is:{myest.loss[0]:.3f}; Final Cost is:{myest.loss[-1]:.6f}")
plt.tight_layout()
plt.savefig("7.3GD.png")
# Newton's method
myest.fit(X_fit, y, "newton")
plt.clf()
plt.plot(myest.loss)
plt.title(f"Optimization with Newton Method\nInitial Cost is:{myest.loss[0]:.3f}; Final Cost is:{myest.loss[-1]:.6f}")
plt.savefig("7.3Newton.png")
# Results from sklearn's LogisticRegression
X_fit = scaler.fit_transform(X)
# penalty='none' disables regularization (newer scikit-learn versions spell this penalty=None)
clf = LogisticRegression(penalty='none')
clf.fit(X_fit, y)
print(f"accuracy_score = {metrics.accuracy_score(clf.predict(X_fit), y)}")
print(f"coef = {clf.coef_} intercept = {clf.intercept_}")
In the lecture, step sizes down to 0.003 diverged and the minimum was reached at 0.002, but my result was completely different.
The final cost is an order of magnitude smaller than in the lecture, but the number of iterations is about the same, so it does not seem to be badly wrong.
accuracy_score = 0.9866666666666667
coef = [[-2.03446841 -2.90222851 16.58947002 13.89172352]] intercept = [-20.10133936]
The obtained parameters are as follows. For the steepest descent method, the result shown is the one with the smallest final cost (step size = 0.01).
Steepest descent: (w_0,w_1,w_2,w_3,w_4) = (-18.73438888, -1.97839772, -2.69938233, 15.54339092, 12.96694841)\\
Newton's method: (w_0,w_1,w_2,w_3,w_4) = (-20.1018028, -2.03454941, -2.90225059, 16.59009858, 13.89184339)\\
sklearn: (w_0,w_1,w_2,w_3,w_4) = (-20.10133936, -2.03446841, -2.90222851, 16.58947002, 13.89172352)
Newton's method is certainly fast, but it requires computing an inverse matrix, so as the number of dimensions or the number of samples grows, does it eventually give way to stochastic gradient descent?
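As a rough point of comparison (my own addition, not part of the lecture), scikit-learn's SGDClassifier fits the same model with stochastic gradient descent when loss='log_loss' is chosen (older versions call this loss 'log'). Note that the default L2 regularization (alpha) means the coefficients will not exactly match the unregularized results above.

from sklearn.linear_model import SGDClassifier
from sklearn import preprocessing
from sklearn.datasets import load_iris

iris = load_iris()
X = preprocessing.StandardScaler().fit_transform(iris.data)
y = (iris.target == 2).astype(int)  # 1 if virginica, 0 otherwise, as above

# Stochastic gradient descent on the logistic-regression loss
sgd = SGDClassifier(loss='log_loss', max_iter=1000, tol=1e-8, random_state=0)
sgd.fit(X, y)
print(f"coef = {sgd.coef_} intercept = {sgd.intercept_}")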
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (2)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (3)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (4)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (5)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (6)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (7) Make your own steepest descent method
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (8) Make your own stochastic steepest descent method
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (9)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (10)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (11)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (12)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (13)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (14)
https://github.com/legacyworld/sklearn-basic
https://ocw.tsukuba.ac.jp/course/systeminformation/machine_learning/