This is a memo from working through Chapter 6 of Deep Learning from Scratch (ゼロから作るDeep Learning ①), which I felt I managed to implement reasonably well. The Jupyter notebook is also public, so I would appreciate it if you could point out any mistakes. The book has you download the dataset locally, but since sklearn ships learning datasets such as MNIST, I adjusted the code so that a single import from sklearn is enough. [Jupyter notebook published on GitHub](https://github.com/fumitrial8/DeepLearning/blob/master/%E3%82%BB%E3%82%99%E3%83%AD%E3%81%8B%E3%82%89%E4%BD%9C%E3%82%8BDeepLearning%20%E7%AC%AC6%E7%AB%A0.ipynb)
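For reference, here is a minimal sketch of loading MNIST through sklearn. The notebook's actual loading code may differ; `fetch_openml('mnist_784')` and the 60,000/10,000 split below are my assumptions, not code taken from the book or the notebook.

```python
from sklearn.datasets import fetch_openml

# Fetch MNIST (70,000 grayscale images flattened to 784 pixels) from OpenML.
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

# Scale pixel values to [0, 1] and make labels integers.
X = X.astype('float32') / 255.0
y = y.astype(int)

# Conventional 60,000/10,000 train/test split (an assumption, not the book's loader).
x_train, t_train = X[:60000], y[:60000]
x_test, t_test = X[60000:], y[60000:]
```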
SGD (stochastic gradient descent): a method that adjusts each weight in the network by subtracting the gradient of the loss function, scaled by a learning coefficient, from the current weight. Expressed as a formula:
$$W \leftarrow W - \eta \frac{\partial L}{\partial W}$$

(W: weight, η: learning coefficient, ∂L/∂W: gradient of the loss function with respect to W)
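As a minimal sketch, this update can be wrapped in a small class following the params/grads dictionary convention the book uses for its optimizers:

```python
class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr  # learning coefficient (eta)

    def update(self, params, grads):
        # params and grads are dicts keyed like 'W1', 'b1', holding NumPy arrays
        for key in params:
            params[key] -= self.lr * grads[key]  # W <- W - eta * dL/dW
```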
AdaGrad: a method that adjusts each weight while shrinking the effective learning coefficient as learning progresses, by accumulating the squared gradients in a history h and dividing each step by its square root. Expressed as formulas:
$$h \leftarrow h + \frac{\partial L}{\partial W} \odot \frac{\partial L}{\partial W}$$

$$W \leftarrow W - \eta \frac{1}{\sqrt{h}} \frac{\partial L}{\partial W}$$

(h: accumulated sum of squared gradients, added to at every step; ⊙ denotes element-wise multiplication)
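A minimal sketch of AdaGrad in the same params/grads style; the tiny constant added under the square root (1e-7 here, as in the book's implementation) guards against division by zero:

```python
import numpy as np

class AdaGrad:
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # per-parameter history of squared gradients

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params:
            self.h[key] += grads[key] * grads[key]  # h <- h + (dL/dW)^2
            # W <- W - eta / sqrt(h) * dL/dW; 1e-7 avoids division by zero
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
```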
Momentum: a method that adjusts each weight through a velocity term v that accumulates past gradients, so updates grow while successive gradients keep pointing the same way and shrink when they fluctuate (I struggled to find a good plain-language description...). Expressed as formulas:
$$v \leftarrow \alpha v - \eta \frac{\partial L}{\partial W}$$

$$W \leftarrow W + v$$

(v: velocity carried over from previous updates, α: momentum coefficient, usually set to 0.9)
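A minimal sketch of Momentum in the same style, with α exposed as the momentum argument:

```python
import numpy as np

class Momentum:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum  # alpha, typically 0.9
        self.v = None             # per-parameter velocity

    def update(self, params, grads):
        if self.v is None:
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params:
            # v <- alpha * v - eta * dL/dW, then W <- W + v
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]
```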