Summary Note on Deep Learning -4.2 Loss Function-

About this note

The aim of this note is to work through the parts I could not understand just by reading the book while studying deep learning, and to make them easy to recall when I look back later. I explain the contents of the code as carefully as possible, so I hope it will be helpful.

4.2 Loss function

To improve performance in neural network learning, we need to get close to the optimal weight parameters, and the loss function serves as the clue in that search. What the loss function outputs is a single numerical value, for example

0.6094374124342252
0.4750000000000001

This value is small when the performance is good and large when it is bad. In this example the second value is smaller, so that output can be said to perform better. The output of the loss function is used as a clue to the direction and magnitude of the weight-parameter update.

Types of loss functions

There are various loss functions, but here we will explain the mean squared error and the cross entropy error.

1. Mean Squared Error

The mean squared error is calculated by the following formula.

E = \frac{1}{N}\sum_{i=1}^{N}(y_i -t_i)^2

To explain the formula: the difference between the output data y_i and the correct-answer data t_i is squared and then averaged over the N elements. The squaring is there to make each error term positive. If all we wanted was a positive value, we might wonder whether taking the absolute value instead,

E = \frac{1}{N}\sum_{i=1}^{N}|y_i -t_i|

would be enough. Apparently, though, squaring is easier to handle when computing the derivative: the derivative of the absolute value has to be split into cases, whereas the square differentiates cleanly. Furthermore, differentiating the square brings a factor of 2 out in front, so a 1/2 is sometimes added, giving

E = \frac{1}{2}\cdot\frac{1}{N}\sum_{i=1}^{N}(y_i - t_i)^2

as another common form of the formula.
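As a quick check of why the 1/2 helps (my own derivation, not from the book), differentiating this version with respect to a single output y_i gives

\frac{\partial E}{\partial y_i} = \frac{1}{2}\cdot\frac{1}{N}\cdot 2\,(y_i - t_i) = \frac{1}{N}(y_i - t_i)

so the 2 produced by the square cancels against the 1/2 and the gradient keeps a simple form.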

Example using the mean squared error

This time, let's define the function with N in the formula above set to 1 (that is, without the averaging) and look at the result. Here, y is the output of the Softmax function.

import numpy as np

# Correct-answer data (one-hot label)
t = [0, 0, 1, 0, 0]

# Mean squared error (with N = 1, i.e. the 1/2-scaled sum of squared errors)
def mean_squared_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)

# Pattern 1 (close to the correct data)
y1 = [0.01, 0.02, 0.9, 0.05, 0.02]
# Pattern 2 (far from the correct data)
y2 = [0.5, 0.1, 0.2, 0.2, 0.1]

out1 = mean_squared_error(np.array(y1), np.array(t))
out2 = mean_squared_error(np.array(y2), np.array(t))

Printing each result gives

print(out1) >>> 0.006699999999999998
print(out2) >>> 0.4750000000000001

The error is small when the output is close to the correct data and large when it is far from it. In this case, the mean squared error therefore indicates that the output of pattern 1 fits the teacher data better.
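As a sanity check (my own hand calculation, not from the book), the value for pattern 1 can be reproduced directly from the formula:

E = \frac{1}{2}\left\{(0.01-0)^2+(0.02-0)^2+(0.9-1)^2+(0.05-0)^2+(0.02-0)^2\right\} = \frac{1}{2}\times 0.0134 = 0.0067

which matches the printed result up to floating-point rounding.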

2. Cross Entropy Error

The cross entropy error is calculated by the following formula.

E = -\sum_{k} t_k \log_{e} y_k

The difference from the mean squared error is that the output data and the correct-answer data are multiplied together. To see why this is useful: the correct-answer data is a one-hot representation, so only the correct label is 1 and all the other entries are 0. Applying this to the formula above, the value of E reduces to

**-log_e y_k for the correct label k only**, and 0 for every other term.

In other words, the cross entropy error is determined entirely by the output for the correct label. If the output value corresponding to the correct label is **small**, E becomes large, indicating a large error.
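A minimal sketch of this reduction (my own check, using the pattern 1 values that appear below; not code from the book):

import numpy as np

t = np.array([0, 0, 1, 0, 0])                  # one-hot label, correct class is index 2
y = np.array([0.01, 0.02, 0.9, 0.05, 0.02])

full = -np.sum(t * np.log(y))   # full formula over all classes
reduced = -np.log(y[2])         # only the correct-label term survives

print(full, reduced)  # both are about 0.10536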

Example using cross entropy error

We will define the function in the same way as for the mean squared error, but before that, let me explain the delta that appears in the code.

As you can see from the graph of y = log x, as x → 0 the value of y goes to **negative infinity**. So if the output value corresponding to the correct label is exactly **0**, the cross entropy error can no longer be expressed as a finite number, and the calculation cannot proceed any further.

To avoid this, a tiny value delta (1e-7, i.e. 10^-7, in the code) is added so that the argument of the log never becomes 0.

import numpy as np

# Correct-answer data (one-hot label)
t = [0, 0, 1, 0, 0]

# Cross entropy error
def cross_entropy_error(y, t):
    # Tiny value delta (be careful not to insert a space when writing 1e-7!)
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))

# Pattern 1 (close to the correct data)
y1 = [0.01, 0.02, 0.9, 0.05, 0.02]
# Pattern 2 (far from the correct data)
y2 = [0.5, 0.1, 0.2, 0.2, 0.1]

out1 = cross_entropy_error(np.array(y1), np.array(t))
out2 = cross_entropy_error(np.array(y2), np.array(t))

Printing each result gives

print(out1) >>> 0.1053604045467214
print(out2) >>> 1.6094374124342252

As with the mean squared error, the closer the output is to the correct data, the smaller the value.
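As a quick check of why delta is needed (my own experiment, not from the book), suppose the output for the correct label were exactly 0:

import numpy as np

t = np.array([0, 0, 1, 0, 0])
y_bad = np.array([0.6, 0.3, 0.0, 0.05, 0.05])   # hypothetical output: 0 at the correct label

# Without delta: log(0) is -inf, so the result is inf (NumPy also emits a warning)
print(-np.sum(t * np.log(y_bad)))
# With delta: the log argument never reaches 0, so we get a large but finite value
print(-np.sum(t * np.log(y_bad + 1e-7)))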

Summary

- The loss function is an important index for updating the parameters (weights and biases).
- The smaller the output of the loss function, the closer the parameters are to the optimum.


Reference book

[Deep Learning from scratch: Theory and implementation of deep learning learned with Python (Japanese)](https://www.amazon.co.jp/dp/4873117585)
