I have started learning deep learning. This time, I will simulate **stochastic gradient descent (SGD)** with Jupyter Notebook.
Plain gradient descent computes the gradient from all of the data and then updates the weights, so each update is costly, and once it falls into a local solution it has a hard time getting back out. Stochastic gradient descent (SGD) instead computes the gradient from a randomly sampled subset of the data at each update, so the gradient estimate fluctuates; this fluctuation helps it climb out of local solutions and reach a better one, and each update is also cheaper to compute.
I found it very interesting that **this fluctuation is itself a means of reaching a better solution**, so in this post I simulate stochastic gradient descent (SGD) with Jupyter Notebook.
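Before the simulation, here is a minimal sketch of the difference between the two update rules on a toy least-squares problem. The data, variable names, and loss below are illustrative assumptions and are not part of the simulation that follows; the point is only that plain gradient descent uses all the data for every update, while SGD uses a random subset, so its updates fluctuate.
import numpy as np
# Toy data: y is roughly 2 * x plus noise, and we fit a single weight w
rng = np.random.default_rng(0)
x_toy = rng.normal(size=100)
y_toy = 2.0 * x_toy + rng.normal(scale=0.1, size=100)
def grad(w, xs, ys):
    # Gradient of the mean squared error mean((w*xs - ys)**2) with respect to w
    return np.mean(2.0 * (w * xs - ys) * xs)
w_gd, w_sgd, alpha = 0.0, 0.0, 0.1
for step in range(100):
    # Plain gradient descent: gradient from all the data at every step
    w_gd -= alpha * grad(w_gd, x_toy, y_toy)
    # SGD: gradient from a random subset (10 points), so the update fluctuates
    idx = rng.choice(len(x_toy), size=10, replace=False)
    w_sgd -= alpha * grad(w_sgd, x_toy[idx], y_toy[idx])
print(w_gd, w_sgd)  # both approach 2.0; the SGD trajectory is noisier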
This time, for simplicity, we use a single weight. We take 11 (x, y) points and fit them with a 6th-degree polynomial.
import numpy as np
import matplotlib.pyplot as plt
#Data (for polynomial creation)
x = np.array([-5.0, -4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([ 5.0, 1.5, 2.0, 1.5, 0.0, -3.0, -1.0, 2.0, 3.0, 2.5, 5.0])
#Polynomial creation (6 dimensions)
p = np.poly1d(np.polyfit(x, y, 6))
print(p)
#View data and polynomials
xp = np.linspace(-10, 10, 100)
plt.plot(x, y, '.', xp, p(xp), '-')
plt.xlim(-7, 7)
plt.ylim(-5, 10)
plt.show()
Using the fitted polynomial, we compute y at 100 values of x evenly spaced from -10 to 10. Real observations would be noisy, so we add Gaussian noise with mean 0 and standard deviation 0.2 to each y.
# Create 100 data points from the polynomial (with Gaussian noise, mean 0, std 0.2)
x_add, y_add = [], []
for i in np.linspace(-10, 10, 100):
    x_add.append(i)
    y_add.append(p(i) + np.random.normal(0, 0.2))
#Display the created data
plt.scatter(x_add, y_add, alpha=0.5)
plt.xlim(-7, 7)
plt.ylim(-5, 10)
plt.show()
We have created 100 data points with local solutions around x = -4 and x = 4 and the optimal solution around x = 0.
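As a rough check (a minimal sketch using the p and np already defined above; the exact values depend on the fit), the positions of these solutions can be read off from the critical points of the fitted polynomial, i.e. the roots of its derivative:
# Critical points of the fitted polynomial p = roots of its 5th-degree derivative
d_p = p.deriv()
crit = d_p.r                                 # roots (complex in general)
crit = np.sort(crit[np.isreal(crit)].real)   # keep only the real critical points
print(crit)    # the minima among these are the solutions near x = -4, 0, 4
print(p(crit)) # p values there distinguish the minima from the maxima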
This is the main part of the code. train_test_split randomly samples 10 of the 100 data points. Using only those 10 points, we fit a 6th-degree polynomial, take its derivative with d_y = p.deriv(), evaluate the gradient at the current weight, and update the weight. Each update is drawn as one frame and animated with matplotlib's animation module.
from sklearn.model_selection import train_test_split
from matplotlib import animation, rc

# Settings
rc('animation', html='jshtml')
w = np.array([-2.])  # default initial weight (the call below passes its own)

# Random sampling function (sample 10 of the 100 points)
def random_sampling():
    X_train, X_test, y_train, y_test = train_test_split(x_add, y_add, test_size=0.90)
    _x = X_train
    _y = y_train
    return _x, _y

# Function that draws one frame
def animate(frame, w, alpha):
    _x, _y = random_sampling()
    # Fit a 6th-degree polynomial to the 10 sampled points and take its derivative
    p = np.poly1d(np.polyfit(_x, _y, 6))
    d_y = p.deriv()
    # Redraw the frame: the fitted curve and the current weight position
    plt.clf()
    plt.plot(xp, p(xp), '-', color='green')
    plt.plot(w, p(w), '.', color='red', markersize=20)
    plt.xlim(-7, 7)
    plt.ylim(-5, 10)
    # Gradient at the current weight; updating w in place carries it to the next frame
    grad = d_y(w)
    w -= alpha * grad

# Animation creation function
def gradient_descent(alpha, w):
    fig, ax = plt.subplots()
    if type(w) is list:
        w = np.array(w, dtype=np.float32)
    anim = animation.FuncAnimation(fig, animate, fargs=(w, alpha), frames=100, interval=300)
    return anim
Now, let's run the simulation with learning rate alpha = 0.3 and initial weight w = 3.5.
# Run with learning rate 0.3 and initial weight 3.5
gradient_descent(alpha=0.3, w=np.array([3.5]))
When you run the code, the animation controls are displayed; play it with the ▶ button. Since the process is stochastic it may not escape the local solution every time, but try it a few times and you will see the fluctuation at work. It is also interesting to play with the parameters.
Here is an example of a successful run (learning rate alpha = 0.3, initial weight w = 3.5, loop playback). Thanks to the fluctuation in the gradient calculation, the weight does not get stuck at the local solution around x = 4 but reaches the optimal solution around x = 0.
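If you want to explore further, you can call the function again with other settings; the values below are just illustrative examples, for instance starting near the other local solution with a smaller learning rate.
# Illustrative example: different starting point and learning rate
gradient_descent(alpha=0.1, w=np.array([-4.5]))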