I tried to understand the learning function of neural networks carefully without using a machine learning library (first half).

Introduction

Learning is the core capability of neural networks (deep learning). I tried to understand, from scratch, the calculations a model performs in order to improve the accuracy of its predictions.

This time as well, I referred to O'Reilly's deep learning textbook. It's very easy to understand. https://www.oreilly.co.jp/books/9784873117584/

The outline is as follows.

What is learning in neural networks

Learning in a model means bringing its predictions closer to the correct answers, that is, raising the accuracy rate. Take image recognition as an example. MNIST is a well-known dataset of handwritten digits, and the task is to tell which digit each image shows.

image.png

Any human can see that this image shows a 5 (the brain has learned to recognize it). Next, let's think about what is needed to create an algorithm that lets a computer recognize this 5. To recognize 5 from the image, we need to find "features" in the image that identify it as a 5. In English, a feature quantity is simply called a feature, and choosing which features to use is called feature selection. For the image of 5, such features might be "a horizontal bar at the top", "a vertical line", and "a circular arc of about 270 degrees that is open on the left". Extracting these features and then learning from them is the algorithm that makes the computer recognize 5.

The function that finds (= extracts) features is called a converter. Well-known converters include SIFT, SURF, and HOG. For more information, see the URL below. The linked material dates from 2011, and these techniques appear to have developed since the 2000s.

https://www.slideshare.net/lawmn/siftsurf

Next, the features are used to convert the image data into a vector, and a machine learning function called a classifier is trained on those vectors. Well-known classifiers include the Support Vector Machine (SVM) and the k-nearest neighbor method (KNN).
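The flow described above (converter, then vector, then classifier) can be sketched in a few lines. This is only a toy illustration: extract_features is a hypothetical hand-designed converter (not SIFT, SURF, or HOG), the 3x3 images are made up, and knn_predict is a minimal k-nearest neighbor written for this example.

```python
import numpy as np

def extract_features(image):
    # Hypothetical hand-designed converter: amount of "ink" in each row,
    # followed by the amount of "ink" in each column.
    return np.concatenate([image.sum(axis=1), image.sum(axis=0)]).astype(float)

def knn_predict(train_x, train_y, query, k=1):
    # Minimal k-nearest neighbor: majority label among the k closest vectors.
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Made-up 3x3 training images: a vertical bar and a horizontal bar.
vertical = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]])
horizontal = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]])
train_x = np.stack([extract_features(vertical), extract_features(horizontal)])
train_y = np.array(["vertical", "horizontal"])

# A vertical bar with one extra pixel of noise.
noisy_vertical = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 1]])
prediction = knn_predict(train_x, train_y, extract_features(noisy_vertical))
print(prediction)
```

The point of the sketch is that a person wrote extract_features; only the classifier step looks at the data.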

Here, a person has to judge and choose an appropriate converter to suit the characteristics of the problem. A neural network, on the other hand, covers the converter as well. In other words, the converter that searches for features is itself something that can be trained.

image.png

The concept is illustrated above. By widening the range that the computer decides on its own, the neural network takes the given data as it is and tries to find the patterns of the problem. In that sense it is an algorithm that feels closer to artificial intelligence.

What is a loss function

Next, I will summarize how to measure the difference between the predicted data and the correct data. We introduce a function called a loss function to indicate how close the prediction is to the correct answer.

Sum of squares error

The most well-known loss function is the sum of squares error. It is expressed by the formula shown below.

$$E = \frac{1}{2} \sum_k (y_k - t_k)^2$$

y_k is the output of the neural network, t_k is the teacher data (correct answer data), and k is the index over the dimensions of the data. From the formula, we can see that the closer the outputs are to the teacher data, the smaller this value. I would like to write it simply in a program. (The function is named mean_squared_error, but as the code shows, it computes half the sum of the squared differences and does not divide by the number of elements.)

nn.ipynb


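import numpy as np

def mean_squared_error(y, t):
    # Half the sum of squared differences between output y and teacher data t
    return 0.5 * np.sum((y - t)**2)

t = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]             # one-hot: the correct answer is "2"
y = [0.1, 0.1, 0.6, 0.1, 0.1, 0, 0, 0, 0, 0]   # highest probability on "2"
y1 = [0.1, 0.1, 0.1, 0.1, 0.6, 0, 0, 0, 0, 0]  # highest probability on "4"
print(mean_squared_error(np.array(y), np.array(t)))   # 0.10000000000000003
print(mean_squared_error(np.array(y1), np.array(t)))  # 0.6000000000000001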

The elements of this array correspond to the digits "0", "1", "2", and so on, in order from the first index. y is the output of the neural network; after conversion by the softmax function, each value represents a probability. Here the network judges the probability that the image is the digit 2 to be 0.6. t is the teacher data, in which only the correct answer, the digit 2, is set to 1. Computing the sum of squares error for y and y1 gives the smaller value for y, meaning y is closer to the teacher data. This is because y puts the highest probability on the element for the digit 2, whereas y1 puts it on the digit 4.
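The text mentions the softmax function without showing it. As a rough sketch, the standard definition looks like the following; the input scores are made-up values, and subtracting the maximum is a common trick to avoid overflow that does not change the result.

```python
import numpy as np

def softmax(a):
    # Subtract the maximum before exponentiating so np.exp cannot overflow;
    # the constant cancels out in the ratio.
    exp_a = np.exp(a - np.max(a))
    return exp_a / np.sum(exp_a)

scores = np.array([0.3, 2.9, 4.0])  # hypothetical raw outputs of a network
probs = softmax(scores)
print(probs)             # each value lies between 0 and 1
print(np.sum(probs))     # the values sum to 1 (up to rounding)
print(np.argmax(probs))  # index of the most probable class
```

Because the outputs are non-negative and sum to 1, they can be read as probabilities, which is how y and y1 are interpreted above.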

Cross entropy error

Another loss function is the cross entropy error.

$$E = -\sum_k t_k \log y_k$$

log is the natural logarithm (base e). t_k is the correct label: it is 1 only at the index of the correct answer and 0 everywhere else. Therefore, this function effectively computes the negative natural logarithm of the output at the correct label. Here is the result of the actual implementation.

nn.ipynb



def cross_entropy_error(y, t):
    delta = 1e-7  # small constant so that np.log never receives 0
    return -np.sum(t * np.log(y + delta))

print(cross_entropy_error(np.array(y), np.array(t)))   # 0.510825457099338
print(cross_entropy_error(np.array(y1), np.array(t)))  # 2.302584092994546

Here, a small value (0.0000001) is added inside the log. This prevents the calculation from breaking down, because log(0) diverges to minus infinity. Looking at the results, the error is about 2.3 when the output for the correct label is small (0.1 in y1), and about 0.5 when that output is large (0.6 in y).
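Two points from this section can be checked directly with the same t and y as above: with a one-hot t, only the term at the correct label contributes to the sum, and without the small constant the zeros in y break the calculation. A minimal sketch, in which the variable names other than t and y are chosen for this example:

```python
import numpy as np

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
y = np.array([0.1, 0.1, 0.6, 0.1, 0.1, 0, 0, 0, 0, 0])
delta = 1e-7

# Full sum over all classes, as in cross_entropy_error.
loss_full = -np.sum(t * np.log(y + delta))

# Only the output at the correct label, picked out with argmax.
correct = np.argmax(t)
loss_single = -np.log(y[correct] + delta)

# Without delta: log(0) is -inf, and 0 * -inf is nan, so the sum becomes nan.
# errstate only silences the NumPy warnings for this demonstration.
with np.errstate(divide="ignore", invalid="ignore"):
    loss_no_delta = -np.sum(t * np.log(y))

print(loss_full, loss_single)  # the two values agree
print(loss_no_delta)           # nan
```

So in this example the constant matters even though the output at the correct label is not zero: the zeros elsewhere in y are enough to turn the whole sum into nan.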

Purpose of setting the error function

By minimizing the value of the loss function, we obtain a model with high prediction accuracy. Therefore, we need to find parameters that make the loss function small. To do this, the parameters are updated using the derivative of the loss function with respect to each parameter as a clue. Differentiation tells you the gradient of the function. The basics of differentiation are omitted here.

image.png

If the value of this gradient is positive, moving the parameter (a in the figure) in the negative direction brings it closer to the minimum. Conversely, if the gradient is negative, moving the parameter in the positive direction brings it closer to the minimum.
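The sign rule above can be tried on a small example. The loss below is a hypothetical one-parameter function with its minimum at a = 2, and the fixed step of 0.5 is an arbitrary choice for illustration; neither comes from the article.

```python
def loss(a):
    # Hypothetical loss with its minimum at a = 2.
    return (a - 2.0) ** 2

def loss_slope(a):
    # Derivative of the hypothetical loss, worked out by hand.
    return 2.0 * (a - 2.0)

def move_toward_minimum(a, step=0.5):
    # Positive slope: move in the negative direction, and vice versa.
    g = loss_slope(a)
    if g > 0:
        return a - step
    if g < 0:
        return a + step
    return a

right = move_toward_minimum(5.0)   # slope is positive here, so a decreases
left = move_toward_minimum(-1.0)   # slope is negative here, so a increases
print(right, loss(5.0), loss(right))
print(left, loss(-1.0), loss(left))
```

In both cases the loss after the move is smaller than before, which is all the sign of the gradient is being used for at this stage.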

Implement the differential of a function

Now, I would like to think about how to differentiate a function. There are two approaches: (1) solving it analytically and (2) solving it discretely (taking a difference). When working by hand you would use (1), but in a program (2) is convenient. This time, we will implement the central difference shown in the figure below, which evaluates the function at x + h and x - h and divides the difference by 2h.

Specifically, I would like to find the value obtained by differentiating this function discretely.

image.png

nn.ipynb


import numpy as np
import matplotlib.pyplot as plt

def numerical_diff(f, x):
    # Central difference: slope between f(x+h) and f(x-h)
    h = 1e-4  # 0.0001
    return (f(x + h) - f(x - h)) / (2 * h)

def function_1(x):
    return 0.01 * x**2 + 0.1 * x

numerical_diff(function_1, 5)
# 0.1999999999990898

image.png

In the attached figure, the curve is the original function, and the straight line is the tangent at x = 5, whose slope is the derivative computed above (about 0.2).
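To see how good the numerical value is, it can be compared with the analytic derivative, which for this function is 0.02x + 0.1. The sketch below also includes a forward difference purely for comparison; forward_diff, central_diff, and analytic_diff are names chosen for this example.

```python
def function_1(x):
    return 0.01 * x**2 + 0.1 * x

def analytic_diff(x):
    # Derivative of function_1 worked out by hand.
    return 0.02 * x + 0.1

def forward_diff(f, x, h=1e-4):
    # One-sided difference, shown only for comparison.
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-4):
    # Same formula as numerical_diff in the article.
    return (f(x + h) - f(x - h)) / (2 * h)

exact = analytic_diff(5)
forward_error = abs(forward_diff(function_1, 5) - exact)
central_error = abs(central_diff(function_1, 5) - exact)
print(forward_error)  # on the order of 1e-6
print(central_error)  # much smaller, limited by floating-point rounding
```

For a quadratic function the central difference is exact in theory, so the remaining error comes only from rounding, whereas the forward difference carries an error proportional to h.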

Implement partial differential

Next, consider performing partial differentiation of the two-variable function shown below.

$$f(x_0, x_1) = x_0^2 + x_1^2$$

If you draw the original function, it will be a 3D graph as shown below.

image.png

nn.ipynb


def function_2(x):
    # x is a NumPy array [x0, x1]
    return x[0]**2 + x[1]**2

Partial differentiation means choosing one variable to differentiate with respect to and treating the other variables as constants. Here we take the partial derivative with respect to x0 and find its value at x0 = 3, x1 = 4.

nn.ipynb


def function_tmp1(x0):
    # x1 is fixed at 4.0, so only x0 varies
    return x0 * x0 + 4.0**2.0

numerical_diff(function_tmp1, 3.0)
# 6.00000000000378
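The same approach gives the partial derivative with respect to x1 at the same point. In this sketch function_tmp2 is a name introduced here by analogy with function_tmp1, and numerical_diff is repeated so that the block runs on its own.

```python
def numerical_diff(f, x):
    # Central difference, as defined earlier in the article.
    h = 1e-4
    return (f(x + h) - f(x - h)) / (2 * h)

def function_tmp2(x1):
    # x0 is fixed at 3.0, so only x1 varies.
    return 3.0**2.0 + x1 * x1

d_x1 = numerical_diff(function_tmp2, 4.0)
print(d_x1)  # close to 8, the analytic value 2 * x1 at x1 = 4
```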

Here we defined a function of a single variable and differentiated it. With this approach, however, the values of all the other variables must be substituted by hand each time. Suppose instead that you want the partial derivatives with respect to x0 and x1 together. This can be implemented as follows:

nn.ipynb


def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)  # same shape as x

    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = tmp_val + h  # evaluate f(x+h) along axis idx
        fxh1 = f(x)

        x[idx] = tmp_val - h  # evaluate f(x-h) along axis idx
        fxh2 = f(x)

        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value

    return grad

numerical_gradient(function_2, np.array([3.0, 4.0]))
# array([6., 8.])
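As a sanity check, the numerical gradient can be compared with the analytic gradient of this function, which is (2 * x0, 2 * x1). The sketch below is a slight variant of the function above: it first copies the input into a float array, an assumption added here so that integer inputs such as [3, 4] are not truncated when h is added. The test points are arbitrary.

```python
import numpy as np

def function_2(x):
    return x[0]**2 + x[1]**2

def numerical_gradient(f, x):
    h = 1e-4
    x = np.array(x, dtype=float)  # float copy; the caller's data is untouched
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

points = [[3, 4], [0.0, 2.0], [-1.5, 0.5]]  # arbitrary test points
max_error = 0.0
for p in points:
    numeric = numerical_gradient(function_2, p)
    analytic = 2 * np.array(p, dtype=float)
    max_error = max(max_error, float(np.max(np.abs(numeric - analytic))))
print(max_error)  # very small: the two gradients agree
```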

As explained earlier, these derivative values give the gradient of the original function. Now consider drawing the gradient as a vector at each point. For convenience, it is drawn below with a minus sign, that is, as the negative gradient.

image.png

You can see that the arrows point toward (x0, x1) = (0, 0), the minimum of this function. In terms of the loss function, moving toward the minimum is what improves the accuracy of the model. It turns out that this differential operation can guide us to the minimum value of the loss function, leading to model optimization!
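To make the idea concrete, here is a rough sketch of what following the arrows could look like: start at (3, 4) and repeatedly take a small step along the negative gradient. The function follow_negative_gradient, the step size of 0.1, and the 100 steps are all assumptions made for this illustration, and numerical_gradient is written in a compact form that does not modify its input.

```python
import numpy as np

def function_2(x):
    return x[0]**2 + x[1]**2

def numerical_gradient(f, x, h=1e-4):
    # Compact central-difference gradient: shift one coordinate at a time.
    grad = np.zeros_like(x)
    for idx in range(x.size):
        step = np.zeros_like(x)
        step[idx] = h
        grad[idx] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

def follow_negative_gradient(f, start, step_size=0.1, steps=100):
    # Hypothetical helper: move against the gradient a fixed number of times.
    x = np.array(start, dtype=float)
    for _ in range(steps):
        x = x - step_size * numerical_gradient(f, x)
    return x

start = np.array([3.0, 4.0])
final = follow_negative_gradient(function_2, start)
print(final)              # very close to (0, 0)
print(function_2(final))  # very close to the minimum value 0
```

For this simple bowl-shaped function every step shrinks both coordinates toward zero; for a real loss function the choice of step size matters much more.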

At the end

This time, I got as far as understanding that differentiation leads to improving the accuracy of the model. Looking into learning, which is the heart of neural networks, deepened my understanding. In the second half of the article, I would like to understand learning more carefully by actually implementing it in a neural network.

The second half is here. https://qiita.com/Fumio-eisan/items/7507d8687ca651ab301d
