This article is a plain-language summary of the "Learning Techniques" chapter (Chapter 7 in this series) of Deep Learning from Scratch. I managed to follow it myself coming from a humanities background, so I hope you can read it comfortably, and I would be glad if it serves as a reference while you study the book.
import numpy as np

class Momentum:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr                # learning rate
        self.momentum = momentum    # momentum constant
        self.v = None               # velocity

    def update(self, params, grads):
        if self.v is None:  # initialize the velocity of each parameter only on the first call
            self.v = {}
            for key, val in params.items():
                self.v[key] = np.zeros_like(val)  # start each parameter's velocity at zero
        for key in params.keys():
            # compute the velocity at the current position from the gradient
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]  # update the parameters in place
The Momentum method uses the concept of velocity, so it first creates the velocity as an instance variable.
It then computes the velocity at the current point from the gradient and adds it to the current weight parameters to update them.
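As a rough illustration (not from the book), here is a minimal sketch of calling the optimizer on a hypothetical one-parameter dictionary; the key name 'W' and the gradient values are made up for this example.

# Hypothetical toy example: a single weight vector and a fixed gradient.
params = {'W': np.array([1.0, 2.0])}
grads = {'W': np.array([0.5, -0.5])}

optimizer = Momentum(lr=0.1, momentum=0.9)
for step in range(3):
    optimizer.update(params, grads)
    print(step, params['W'], optimizer.v['W'])
# Because the previous velocity is carried over (scaled by 0.9), the step
# size grows while the gradient keeps pointing in the same direction.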
class AdaGrad:  # decays the learning rate separately for each parameter
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None

    def update(self, params, grads):
        if self.h is None:  # initialize the accumulator only on the first call
            self.h = {}
            for key, val in params.items():
                self.h[key] = np.zeros_like(val)
        for key in params.keys():
            # accumulate the sum of squared gradients for each parameter into h
            self.h[key] += grads[key] * grads[key]
            # scale the step by 1/sqrt(h); the 1e-7 avoids division by zero
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
The AdaGrad method needs little extra explanation, since it simply implements the formula described in the previous article:
the accumulated squared gradients gradually shrink the learning coefficient, and the update itself subtracts the gradient step just like SGD.
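To see that decay in action, here is a minimal sketch (my own toy example, not from the book) with a made-up single parameter and a constant gradient.

# Hypothetical toy example: with a constant gradient, the effective step
# shrinks every update because h keeps accumulating squared gradients.
params = {'W': np.array([1.0])}
grads = {'W': np.array([1.0])}

optimizer = AdaGrad(lr=1.0)
for step in range(3):
    optimizer.update(params, grads)
    print(step, params['W'], np.sqrt(optimizer.h['W']))
# Step sizes are roughly 1/sqrt(1), 1/sqrt(2), 1/sqrt(3), ... so the
# learning rate decays per parameter, as described above.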