I'm not sure, but I feel like I understand Deep Learning (I tried Deep Learning from scratch)

Introduction

During the Golden Week holidays, I suddenly decided to study and implement "Deep Learning from Scratch: The Theory and Implementation of Deep Learning Learned with Python", so I made a note of my impressions and my own understanding. For anyone about to pick up this book, I hope this helps you gauge what to expect and how much learning cost is involved. That said, since this is mainly a memo for me to look back on later, some parts may be hard to follow. I think detailed mathematical formulas are best followed in the book itself, so I am writing this aiming only for an outline-level understanding, along the lines of "Deep Learning is this kind of mechanism".

By working through this book and extending the attached sample code a little, I was able to create a system that recognizes the digit in an image of a number handwritten in "Paint". (The source code is not released because of possible copyright issues with the book's samples.)

A rough overview of Deep Learning

Deep Learning combines two functions: "identification" and "learning".

"Identification" means input value , neural network function , output value There is font>, input value and known neural network function to output value < Ask for/font>. (Example: Read an image that you do not know what the number is and recognize it as 5)

"Learning" is <font color = "" from test data ( input value ) for which the correct answer ( output value ) is known. Find the "Green"> neural network function . (Example: Load images of various handwritten numbers 5 and update neural network function so that the recognition result is 5)

Note that the __activation function__ is the key concept in the "identification" function (Chapters 2 and 3), while the __loss function__, __gradient descent__, and __error backpropagation__ are the key concepts in the "learning" function (Chapters 4 and 5). On top of these, there are various techniques for improving learning accuracy (Chapters 6, 7, and 8).

About "identification" function (Chapter 2 and 3)

As mentioned above, "identification" means input value and known neural network function to <font color = "Red". > Find the output value . As an example, the flow of reading an image whose number is unknown and recognizing it as 5 is like this (see also the figure below).

  1. Cut the image to be read (whose digit is not yet known) into a 20x20 mesh (input value).
  2. Apply a known neural network function (so-called trained data) that makes judgments such as "if this part of the mesh is painted black, the digit is likely to be XX". (How this function is found is described later under "Learning".)
  3. Calculate the probability that the loaded image is each of the digits 0-9 (output value). (A minimal code sketch of this flow follows the figure below.)
[Figure: identification processing flow]
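To make this flow concrete, here is a minimal sketch of such a forward pass in NumPy. The shapes (a 20x20 image flattened to 400 inputs, a single weight matrix `W` and bias `b`, 10 outputs) and the random weights are my own assumptions for illustration; the book's actual network has multiple layers and properly trained weights.

```python
import numpy as np

def softmax(a):
    a = a - np.max(a)  # shift for numerical stability
    e = np.exp(a)
    return e / np.sum(e)

# Hypothetical "trained" parameters; in the book these come from learning.
W = np.random.randn(400, 10) * 0.01  # 20x20 pixels -> 10 digit classes
b = np.zeros(10)

x = np.random.rand(400)        # the unknown handwritten image, flattened
y = softmax(x @ W + b)         # probabilities for the digits 0-9
print("predicted digit:", np.argmax(y), "probability:", y.max())
```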

What is important here is that the image is not flatly determined to be "5!"; rather, it is judged as "the probability of being 5 is 70%, so it is probably 5", and the output value is a 1x10 matrix. In addition, the output matrix Y is produced through an __activation function__, designed so that "each element is a positive value" and "the elements sum to 1". Typical examples are the sigmoid function and the softmax function (it is softmax that gives the sum-to-1 property).
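As a quick check of those properties, here are the two functions with their standard definitions (not the book's exact code): sigmoid squashes each element into (0, 1) independently, while softmax additionally makes the elements sum to 1.

```python
import numpy as np

x = np.array([2.0, 1.0, 0.1])

sig = 1.0 / (1.0 + np.exp(-x))   # sigmoid: each element in (0, 1)
soft = np.exp(x - np.max(x))
soft = soft / np.sum(soft)       # softmax: positive, sums to 1

print(sig, sig.sum())    # elements in (0, 1), but the sum is not 1
print(soft, soft.sum())  # e.g. [0.66 0.24 0.10], sum == 1.0
```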

About "learning" function (Chapter 4 and 5)

As mentioned above, "learning" means input value from test data ( input value ) for which the correct answer ( output value ) is known. Find the font color = "Green"> neural network function . This neural network function W is calculated using __loss function __, __ gradient descent __, and __ error backpropagation __.

[Figure: Deep Learning flow (learning)]

__Digression: nostalgic high school mathematics__ In the function y = f(x), if you know pairs of the input value x and the output value y, you can determine the function f(x).

```math
f(x) = ax^2 + bx + c
```

> For example, in the case of this quadratic f(x), if you have three (x, y) pairs, you can solve for a, b, and c, and thus obtain f(x). "Learning" can be thought of as the matrix version of this, which is a little harder. (I miss high school mathematics :relaxed:)
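As a concrete version of this digression, here is a minimal sketch that recovers a, b, c from three (x, y) pairs by solving the resulting linear system. The sample points are made-up values of my own, generated from f(x) = 2x^2 + 3x + 1.

```python
import numpy as np

# Three known (x, y) pairs, generated from f(x) = 2x^2 + 3x + 1
xs = np.array([0.0, 1.0, 2.0])
ys = 2 * xs**2 + 3 * xs + 1

# f(x) = a*x^2 + b*x + c gives one linear equation per pair:
# [x^2  x  1] @ [a, b, c] = y
A = np.vstack([xs**2, xs, np.ones_like(xs)]).T
a, b, c = np.linalg.solve(A, ys)
print(a, b, c)  # 2.0 3.0 1.0
```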


__Loss function__
Regarding the figure "Deep Learning flow (learning)" above, suppose:

 --Using the neural network function A, the probability of the number "5" was 0.7.
 --Using the neural network function B, the probability of the number "5" was 0.85.

In such a case, the loss function L is an index for evaluating __how much better__ one neural network function is than another.
There are various loss functions L, such as the cross entropy error and the sum of squares error; the formula for the sum of squares error is this ↓.

```math
L = \sum (X_{\text{output value}} - X_{\text{correct answer value}})^2
```
Here, for example, when the probabilities (output values) for the digits 2 to 5 when reading an image of a number are as in the figure below, the loss functions L of neural networks A and B come out to 0.34 and 0.14 respectively, so we can see that neural network function B is the better one.

[Figure: loss function]
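Here is a minimal sketch of that comparison. The probability vectors for networks A and B are made-up values of my own (the exact numbers in the article's figure are not reproduced here), assuming the correct answer is the digit "5":

```python
import numpy as np

def sum_squared_error(y, t):
    # L = sum((output value - correct answer value)^2)
    return np.sum((y - t) ** 2)

# Correct answer: the image is a "5" (one-hot over the digits 0-9)
t = np.zeros(10)
t[5] = 1.0

# Made-up output probabilities from two hypothetical networks
y_a = np.array([0.0, 0.05, 0.1, 0.05, 0.05, 0.70, 0.0, 0.05, 0.0, 0.0])
y_b = np.array([0.0, 0.00, 0.05, 0.0, 0.05, 0.85, 0.0, 0.05, 0.0, 0.0])

print(sum_squared_error(y_a, t))  # ~0.11: larger loss -> worse network
print(sum_squared_error(y_b, t))  # ~0.03: smaller loss -> better network
```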

Also, as you would expect, the closer the loss function L is to 0, the more accurate the identification. The important part of "learning" is to find the neural network function W by reducing the value of the loss function L (__gradient descent__) and recalculating the function W.

__Gradient descent__ A method of finding the minimum value by evaluating f(x) at a point advanced a fixed distance Δx in the downhill gradient direction from the current point, then advancing another Δx in the gradient direction from there, and repeating.

In the two-dimensional case (figure below, left), if you start from an arbitrary point (x, f(x)), the gradient method eventually settles at the minimum value (the point where the gradient becomes 0 is the minimum value). The same applies when the loss function L of the sum of squares error is three-dimensional (figure below, right): the minimum value is reached by repeating the operation of finding the gradient at the current point and advancing by Δx in the gradient direction.

[Figure: gradient descent]
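A minimal sketch of gradient descent in one dimension, using a toy function of my own, f(x) = (x - 3)^2, and a numerical gradient (not the book's exact code):

```python
def f(x):
    return (x - 3.0) ** 2  # toy function with its minimum at x = 3

def numerical_grad(f, x, h=1e-4):
    # central difference approximation of the gradient
    return (f(x + h) - f(x - h)) / (2 * h)

x = 10.0   # arbitrary starting point
lr = 0.1   # step size (learning rate), i.e. the "fixed distance" above
for _ in range(100):
    x -= lr * numerical_grad(f, x)  # step against the gradient
print(x)   # converges to ~3.0, where the gradient is 0
```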

Even when the parameters form an n-dimensional matrix, the gradient toward the minimum value can be obtained by partial differentiation. So the network can "learn" by shifting by Δx in the gradient direction and recalculating the neural network function W from the input matrix X and the output matrix Y.
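Here is a minimal sketch of the same idea applied to a weight matrix W, computing a numerical gradient element by element. The input, target, and network (a single linear layer) are made-up assumptions of mine; real implementations use the error backpropagation described next, because this element-wise approach is far too slow.

```python
import numpy as np

def loss(W, x, t):
    y = x @ W                    # toy one-layer linear "network"
    return np.sum((y - t) ** 2)  # sum of squares error

def numerical_grad(f, W, h=1e-4):
    # partial derivative of the loss with respect to each element of W
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        orig = W[idx]
        W[idx] = orig + h; fp = f(W)
        W[idx] = orig - h; fm = f(W)
        grad[idx] = (fp - fm) / (2 * h)
        W[idx] = orig
    return grad

x = np.array([1.0, 2.0])        # made-up input
t = np.array([0.0, 1.0, 0.0])   # made-up correct output
W = np.random.randn(2, 3)

for _ in range(200):
    W -= 0.05 * numerical_grad(lambda W_: loss(W_, x, t), W)
print(loss(W, x, t))  # approaches 0 as W "learns"
```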

__Error backpropagation__ Omitted. Roughly, it is a method that makes the gradient easy to calculate, once you have a grasp of partial differentiation and the properties of matrices.

Techniques for improving learning accuracy (Chapters 6, 7, and 8)

Omitted. Even differential equations for damped oscillations make an appearance; the world beyond this point is a deep one...

Summary

It felt like every stage boss I had ever fought in my mathematical life gathered and attacked at once. What do I mean? Namely:

--Partial differentiation (Math IIIC)
--Matrix products and transposes of n-dimensional matrices (university mathematics)
--Differential equations for damped oscillations (graduate school entrance exams)

It was a parade of everything I struggled with during university entrance exams and at university, so I had to brush up on these topics before getting into Deep Learning itself. So even if you are strong at programming, if you are a bit removed from matrix calculations and partial differentiation, the learning cost seems a little high.

Thank you for reading this far. :bow:
