I tried to carefully understand how learning works in a neural network without using a machine learning library (second half).

Introduction

Learning is a core capability of neural networks (deep learning). I tried to understand, from scratch, the calculations a model performs to improve its predictive accuracy, and I implemented them without using a machine learning library.

In the previous article, I summarized what learning means in neural networks, the loss function needed to improve a model's accuracy, and the concept of differentiation. https://qiita.com/Fumio-eisan/items/c4b5b7da5b5976d09504

This time, I summarize the second half, up to the implementation of a neural network.

This time as well, I referred to O'Reilly's deep learning textbook. It's very easy to understand. https://www.oreilly.co.jp/books/9784873117584/

The outline is as follows.

About gradient descent

In the previous article, we confirmed that the loss function must be minimized in order to optimize the model. We also saw that the derivative of a function is the tool for minimizing it. Now, let's think about optimizing the parameters of the model by actually using this derivative.

By differentiating a function, you can find the direction in which its value decreases. The gradient method moves a fixed distance from the current location in the direction indicated by the gradient, then computes the gradient again at the new location and moves again, repeating the process. **Moving toward the minimum is called gradient descent, and moving toward the maximum is called gradient ascent.**

image.png

The above is the mathematical expression of the gradient method. η determines the size of each update and is called the **learning rate**. It specifies how much the parameters are updated in a single learning step. If the learning rate is too small, it takes a long time to approach the minimum. Conversely, if it is too large, the update may overshoot the minimum. Therefore, you need to find an appropriate value for each model.

I will actually implement it. This is the function I used in the first half, f(x0, x1) = x0^2 + x1^2.

image.png

I would like to find the minimum value of this function.
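
The gradient method needs the gradient of the function at each point. As a reminder of how that can be obtained numerically, here is a minimal sketch using central differences. The step size `h = 1e-4`, the loop structure, and the name `sum_of_squares` are assumptions for illustration and may differ from the author's gradient.py.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    """Approximate the gradient of f at a 1-D array x with central differences."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        original = x[i]
        x[i] = original + h
        f_plus = f(x)             # f evaluated with x[i] shifted by +h
        x[i] = original - h
        f_minus = f(x)            # f evaluated with x[i] shifted by -h
        grad[i] = (f_plus - f_minus) / (2 * h)
        x[i] = original           # restore the original value
    return grad

def sum_of_squares(x):
    # Stand-in for the function above: f(x0, x1) = x0^2 + x1^2
    return x[0]**2 + x[1]**2

grad_at_point = numerical_gradient(sum_of_squares, np.array([3.0, 4.0]))
print(grad_at_point)  # close to the exact gradient (6, 8)
```

The exact gradient of this function is (2·x0, 2·x1), so the numerical result at (3, 4) should be very close to (6, 8).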

nn.ipynb


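import numpy as np

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    # numerical_gradient: numerical differentiation, described in gradient.py
    x = init_x.copy()  # copy so the caller's array is not modified in place

    for i in range(step_num):
        grad = numerical_gradient(f, x)  # gradient at the current point
        x -= lr * grad                   # step against the gradient

    return x

def function_2(x):
    # f(x0, x1) = x0^2 + x1^2
    return x[0]**2 + x[1]**2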

Now, let the initial value be (x0, x1) = (-3, 4) and use the gradient method to find the minimum. The true minimum is at (x0, x1) = (0, 0).

nn.ipynb


init_x = np.array([-3.0,4.0])
gradient_descent(function_2, init_x = init_x, lr =0.1, step_num=100)
array([-6.11110793e-10,  8.14814391e-10])

When the learning rate lr is 0.1, the above result is obtained, and the values are almost (0, 0). In this case, learning can be said to have succeeded.

nn.ipynb


init_x = np.array([-3.0,4.0])
gradient_descent(function_2, init_x = init_x, lr =10, step_num=100)
array([-2.58983747e+13, -1.29524862e+12])

Next is the case where the learning rate is set to 10. The values have diverged, so learning did not go well. This experiment shows that an appropriate learning rate must be set for each model.
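
The two results can be checked against a simple calculation. For f(x0, x1) = x0^2 + x1^2 the exact gradient is (2·x0, 2·x1), so each update multiplies x by (1 - 2·lr): 0.8 for lr = 0.1 and -19 for lr = 10. The sketch below uses this analytic gradient instead of numerical_gradient, an assumption made to keep it self-contained. The function names are illustrative only.

```python
import numpy as np

def analytic_gradient(x):
    # Exact gradient of f(x0, x1) = x0^2 + x1^2
    return 2.0 * x

def run_descent(init_x, lr, step_num):
    x = np.array(init_x, dtype=float)
    for _ in range(step_num):
        x = x - lr * analytic_gradient(x)   # equivalent to x * (1 - 2 * lr)
    return x

small = run_descent([-3.0, 4.0], lr=0.1, step_num=100)
large = run_descent([-3.0, 4.0], lr=10.0, step_num=5)
print(small)  # both components shrink toward 0
print(large)  # magnitudes grow by a factor of 19 at every step
```

With lr = 0.1 the factor 0.8^100 is roughly 2e-10, which agrees with the order of magnitude of the result shown above. With lr = 10 the magnitude grows at every step, so the iteration cannot settle at the minimum.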

Gradient in neural network

Apply the above method of finding gradients to a neural network. In a neural network, the gradient in question is the gradient of the loss function with respect to the weight parameters. Let L be the loss function. The gradient is formed by partially differentiating L with respect to each element of the weight W, so it has the same shape as W.

image.png
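
To make the shape of this gradient concrete, here is a minimal sketch that numerically differentiates a loss with respect to a 2 × 3 weight matrix. The single-layer setup, the input and label values, and all function names are assumptions for illustration, not the author's code.

```python
import numpy as np

def softmax(a):
    a = a - np.max(a)             # shift for numerical stability
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a)

def cross_entropy_error(y, t):
    return -np.sum(t * np.log(y + 1e-7))

def loss(W, x, t):
    return cross_entropy_error(softmax(x @ W), t)

def numerical_gradient_2d(f, W, h=1e-4):
    """Central-difference gradient of f with respect to every element of W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + h
        f_plus = f(W)
        W[idx] = original - h
        f_minus = f(W)
        grad[idx] = (f_plus - f_minus) / (2 * h)
        W[idx] = original         # restore the original weight
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # hypothetical weights: 2 inputs, 3 classes
x = np.array([0.6, 0.9])          # one input sample
t = np.array([0.0, 0.0, 1.0])     # one-hot correct label

dW = numerical_gradient_2d(lambda w: loss(w, x, t), W)
print(dW.shape)  # (2, 3): same shape as W
```

Each element of `dW` indicates how much the loss changes when the corresponding element of `W` is increased slightly.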

Implement a neural network

I would like to implement a neural network that performs the following procedure.

  1. Mini-batch: Randomly extract some data from the training data (a mini-batch). The goal is to minimize the loss function on that mini-batch.

  2. Gradient calculation: Find the gradient of each weight parameter in order to reduce the loss function of the mini-batch.

  3. Parameter update: Update the weight parameters by a small amount in the negative gradient direction.

  4. Repeat: Repeat steps 1 to 3 as many times as you like.
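
The four steps above can be sketched on a toy problem. The example below fits a small linear model to synthetic data rather than the two-layer network on MNIST, and it uses an analytic gradient. The data, model, and hyperparameter values are all assumptions chosen to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data following t = 2*x0 - 3*x1 + 1 (a stand-in for MNIST)
x_train = rng.standard_normal((1000, 2))
true_w = np.array([2.0, -3.0])
true_b = 1.0
t_train = x_train @ true_w + true_b

w = np.zeros(2)
b = 0.0
learning_rate = 0.1
batch_size = 100
iters_num = 500

def mse_loss(x, t, w, b):
    return np.mean((x @ w + b - t) ** 2)

initial_loss = mse_loss(x_train, t_train, w, b)

for _ in range(iters_num):                       # step 4: repeat
    # step 1: draw a random mini-batch
    batch_mask = rng.choice(x_train.shape[0], batch_size)
    x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]

    # step 2: gradient of the mini-batch loss (analytic for this linear model)
    error = x_batch @ w + b - t_batch
    grad_w = 2.0 * x_batch.T @ error / batch_size
    grad_b = 2.0 * np.mean(error)

    # step 3: move a small amount in the negative gradient direction
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse_loss(x_train, t_train, w, b)
print(initial_loss, final_loss)   # the loss should drop sharply
```

The structure of the loop is the same when the model is a neural network. Only the way the loss and its gradient are computed changes.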

Program configuration (module loading, etc.)

Now, I would like to implement a two-layer neural network that can actually learn. The component diagram of the model implemented this time is as follows.

image.png

The parameters are mainly set in nn.ipynb. Since we use the MNIST dataset this time, it is downloaded from the original site of Yann LeCun et al. The calculations performed by the actual neural network are written in two_layer_net.py. The network is as shown in the figure below.

image.png

Since MNIST is originally 28 × 28 pixel image data, the first (input) layer receives 28 × 28 = 784 values. This time, the hidden layer has 100 units, and the final output layer has 10 dimensions so that it can output one of the 10 digit classes. For this calculation, the sigmoid activation function and the softmax function, which computes the final probabilities, are read from a separate file, functions.py. The output obtained there and the correct labels of the teacher data are substituted into the loss function. Then the numerical_gradient method described in gradient.py computes the gradient of the loss with respect to the weight parameters. The update of the weight parameters using the obtained gradient is written in the original nn.ipynb. This series of operations is repeated for the specified number of iterations.
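
The flow of shapes through the network (784 → 100 → 10) can be sketched as a forward pass. This is a minimal sketch, not the contents of two_layer_net.py or functions.py: the initialization scale `weight_init_std`, the `predict` function, and the random input batch are assumptions, and only the parameter keys W1, b1, W2, b2 are taken from the article's code.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    a = a - np.max(a, axis=-1, keepdims=True)   # shift for numerical stability
    exp_a = np.exp(a)
    return exp_a / np.sum(exp_a, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 784, 100, 10
weight_init_std = 0.01   # assumed initialization scale

params = {
    'W1': weight_init_std * rng.standard_normal((input_size, hidden_size)),
    'b1': np.zeros(hidden_size),
    'W2': weight_init_std * rng.standard_normal((hidden_size, output_size)),
    'b2': np.zeros(output_size),
}

def predict(x):
    z1 = sigmoid(x @ params['W1'] + params['b1'])      # shape (batch, 100)
    return softmax(z1 @ params['W2'] + params['b2'])   # shape (batch, 10)

x_batch = rng.random((5, input_size))   # random stand-in for 5 flattened images
y = predict(x_batch)
print(y.shape)  # (5, 10)
```

Each row of `y` is a probability distribution over the 10 digits, which is what gets compared with the correct label in the loss function.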

Even just to compute and train a two-layer neural network, you need to call this many methods. You can see that it is not an amount of calculation that could be done by hand. You will also find that you need to understand the structure of the program, including its classes and methods.

Now let's take a look at the part that updates the weight parameters and biases.

nn.ipynb


    # Update the weight parameters and biases
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]  # the key point: this sign is negative
    
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    

The point is that, for each of the weight parameters and biases W1, b1, W2, and b2, the gradient (grad[key]) is multiplied by the learning rate and then **subtracted**. When the gradient obtained by differentiation is positive, moving in the negative direction moves the parameter toward the minimum. What happens if you reverse this sign and make it positive?

006.png

The horizontal axis is the number of iterations and the vertical axis is the value of the loss function. You can see that the value rises immediately. If you write the sign as a minus, it looks like this.

007.png

You can see that the value goes down properly. Now you can build a two-layer neural network without using an existing machine learning library.
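
The effect of the sign can also be reproduced on a toy problem. The sketch below uses a one-parameter loss (w - 2)^2 with its exact derivative. The loss, the starting point, and the function names are assumptions for illustration and are unrelated to the MNIST network.

```python
def loss_fn(w):
    return (w - 2.0) ** 2         # toy loss with its minimum at w = 2

def grad_fn(w):
    return 2.0 * (w - 2.0)        # exact derivative of the toy loss

def train(sign, w=5.0, learning_rate=0.1, steps=20):
    """Run the update w <- w + sign * learning_rate * grad and record the loss."""
    history = [loss_fn(w)]
    for _ in range(steps):
        w = w + sign * learning_rate * grad_fn(w)
        history.append(loss_fn(w))
    return history

descending = train(sign=-1.0)     # subtract the gradient
ascending = train(sign=+1.0)      # add the gradient (reversed sign)
print(descending[0], descending[-1])   # loss decreases
print(ascending[0], ascending[-1])     # loss increases
```

Subtracting the gradient lowers the loss at every step, while adding it raises the loss, which is the same behavior as in the two plots above.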

At the end

This time, I implemented the calculation and learning of a neural network without using a machine learning library. In training the model, I understood that the loss function and the gradient (= the derivative of the function) are the key ideas. I also needed to combine classes and modules well to perform the calculations, which was a good learning experience for Python itself.

The full program is stored here. https://github.com/Fumio-eisan/nn2layer_20200321
