Start studying: Saturday, December 7th
Teaching materials, etc.:
・ Miyuki Oshige, "Details! Python 3 Introductory Note" (Sotec, 2017): read 12/7 (Sat) - 12/19 (Thu)
・ Progate Python course (5 courses in total): finished 12/19 (Thu) - 12/21 (Sat)
・ Andreas C. Müller, Sarah Guido, "Introduction to Machine Learning with Python" (Japanese edition, O'Reilly Japan, 2017): 12/21 (Sat) - 12/23 (Mon)
・ Kaggle "Real or Not? NLP with Disaster Tweets": submission and tuning 12/28 (Sat) - 1/3 (Fri)
・ Wes McKinney, "Python for Data Analysis" (Japanese edition, O'Reilly Japan, 2018): read 1/4 (Sat) - 1/13 (Mon)
・ **Yasuki Saito, "Deep Learning from Scratch" (O'Reilly Japan, 2016): 1/15 (Wed) ~**
Finished reading through Chapter 5, "Backpropagation" (up to p. 164).
・ Backpropagation (the error backpropagation method) is a way to efficiently compute the gradients of the weight parameters, which express how important each element is. The numerical differentiation used in Chapter 4 is simple to implement but slow; backpropagation computes the same gradients much faster, although some parts of it are more involved. The book explains it using "computational graphs".
- Forward propagation: the computation proceeds from left to right through the graph. Backward propagation: the computation proceeds from right to left.
- In a computational graph, each node can produce its output by considering only its local computation, i.e. the small part of the problem that directly concerns it. Each local computation is simple, but propagating the results yields the result of the complex computation that makes up the whole.
- In forward propagation the local results are passed from left to right along the arrows, while in backward propagation the results of the "local derivatives" are passed from right to left. These derivatives eventually reach the first elements of the graph (e.g. the price or the quantity in the book's apple-shopping example), and **each value arriving there indicates how strongly that input influences the final price**.
- Chain rule: a property of composite functions with respect to differentiation: **"when a function is written as a composite function, its derivative can be expressed as the product of the derivatives of the functions that make up the composite."**
\frac{\partial z}{\partial x} = \frac{\partial z}{\partial t}\frac{\partial t}{\partial x}
Backpropagation of the derivatives proceeds by this chain-rule principle: at each node, you simply multiply the incoming gradient by the local derivative with respect to that node's input signal. If the chain of input signals continues from right to left through h and then y, it looks like this:
\frac{\partial z}{\partial t}\frac{\partial t}{\partial x}\frac{\partial x}{\partial h}\frac{\partial h}{\partial y} = \frac{\partial z}{\partial y}
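As a quick sanity check (my own snippet, not from the book), the chain rule can be verified numerically for the composite function z = t², t = x + y, where it gives ∂z/∂x = 2t · 1 = 2(x + y):

```python
# Verify the chain rule numerically for z = t**2 with t = x + y.
def z_of(x, y):
    t = x + y
    return t ** 2

x, y, h = 3.0, 4.0, 1e-4

# Analytic result from the chain rule: dz/dx = (dz/dt)(dt/dx) = 2*t * 1
analytic = 2 * (x + y)

# Central-difference numerical derivative with respect to x
numerical = (z_of(x + h, y) - z_of(x - h, y)) / (2 * h)

print(analytic, numerical)  # both are close to 14.0
```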
- Addition node: consider z = x + y. Both partial derivatives, ∂z/∂x and ∂z/∂y, are equal to 1, so backpropagation through an addition node simply sends the same value on to the next node (the one upstream in the forward direction); see the sketch below.
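A minimal add-node layer sketch, following the forward/backward interface used throughout the book (details of the book's own code may differ):

```python
class AddLayer:
    """Addition node: z = x + y."""
    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        # dz/dx = dz/dy = 1, so the upstream gradient passes through unchanged.
        dx = dout * 1
        dy = dout * 1
        return dx, dy
```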
- Multiplication node: consider z = x * y. The partial derivative with respect to x is y, and the partial derivative with respect to y is x. Backpropagation through a multiplication node therefore multiplies the incoming gradient by the **swapped input value** before sending it on: the gradient flowing back along x is multiplied by y, and the gradient flowing back along y is multiplied by x, as in the sketch below.
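A corresponding multiplication-node sketch, plus a usage example in the spirit of the book's apple-shopping illustration (the concrete numbers here are my own, purely illustrative):

```python
class MulLayer:
    """Multiplication node: z = x * y."""
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x          # remember the inputs for the backward pass
        self.y = y
        return x * y

    def backward(self, dout):
        dx = dout * self.y  # gradient w.r.t. x uses the *other* input, y
        dy = dout * self.x  # gradient w.r.t. y uses the *other* input, x
        return dx, dy


# Example: total = price * quantity (illustrative numbers)
price, quantity = 100, 2
layer = MulLayer()
total = layer.forward(price, quantity)    # 200
dprice, dquantity = layer.backward(1.0)   # start backpropagation with dout = 1
print(total, dprice, dquantity)           # 200, 2.0 (= quantity), 100.0 (= price)
```

The gradients 2.0 and 100.0 show how strongly the unit price and the quantity each influence the final total, which is exactly the "magnitude of influence" interpretation above.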
- Applying this idea to the activation functions (ReLU, sigmoid) gives the mechanism of an actual neural network.
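For the sigmoid, the backward rule follows from y = 1 / (1 + e^(-x)), whose derivative can be written as y(1 - y); a minimal layer sketch along those lines (not necessarily identical to the book's code):

```python
import numpy as np

class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))   # store y for the backward pass
        return self.out

    def backward(self, dout):
        # dL/dx = dL/dy * y * (1 - y)
        return dout * self.out * (1.0 - self.out)
```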
- ReLU acts like a "switch" in a circuit. If the input did not meet the criterion (i.e. it was not positive) in the forward pass, the node sends "0" on during backpropagation, so the propagation of the gradient stops there; see the sketch below.
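A ReLU layer sketch using the usual mask-based approach (inputs are assumed to be NumPy arrays): positions where the forward input was ≤ 0 have their gradient zeroed on the way back, i.e. the switch stays off.

```python
import numpy as np

class Relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)    # remember where the switch is "off"
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        dout = dout.copy()
        dout[self.mask] = 0     # no gradient flows where the input was <= 0
        return dout
```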
・ Affine transformation: the matrix product computed in the forward pass of a neural network. The weighted sum of the neurons can be written as **Y = np.dot(X, W) + B**, and the layer that propagates this weighted sum performs an affine transformation (the Affine layer); a sketch follows.
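A minimal batch Affine layer sketch, assuming X has shape (N, in) and W has shape (in, out): the backward pass uses dX = dout · Wᵀ, dW = Xᵀ · dout, and dB = the column-wise sum of dout.

```python
import numpy as np

class Affine:
    def __init__(self, W, B):
        self.W = W
        self.B = B
        self.X = None
        self.dW = None
        self.dB = None

    def forward(self, X):
        self.X = X
        return np.dot(X, self.W) + self.B   # Y = XW + B

    def backward(self, dout):
        dX = np.dot(dout, self.W.T)         # gradient passed to the previous layer
        self.dW = np.dot(self.X.T, dout)    # gradient of the weights
        self.dB = np.sum(dout, axis=0)      # gradient of the bias (summed over the batch)
        return dX
```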
- The final processing, in the output layer, uses the softmax function or the identity function. As covered last time, **classification uses the softmax function (+ cross-entropy error), while regression uses the identity function (+ sum-of-squares error)**.
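For classification, softmax combined with cross-entropy error has the convenient backward result (y - t) / batch_size. A sketch assuming one-hot teacher labels t (helper functions are my own minimal versions):

```python
import numpy as np

def softmax(x):
    x = x - np.max(x, axis=1, keepdims=True)    # subtract the row max for numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)

def cross_entropy_error(y, t):
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]

class SoftmaxWithLoss:
    def __init__(self):
        self.y = None   # softmax output
        self.t = None   # one-hot teacher labels

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        return cross_entropy_error(self.y, t)

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        return dout * (self.y - self.t) / batch_size   # remarkably simple gradient
```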
- Gradient check: compare the gradient obtained by numerical differentiation with the gradient obtained by backpropagation, by **taking the difference between the backpropagation values and the numerical-differentiation values**. If the implementation is correct, the error is extremely small, close to zero; a sketch follows.
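A self-contained gradient-check sketch on a toy problem of my own (not the book's two-layer network): compare a central-difference numerical gradient with the analytic gradient and report the mean absolute difference, which should come out close to zero.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    """Central-difference gradient of f at x (x is modified in place and restored)."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + h
        fxh1 = f(x)
        x[idx] = orig - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = orig
        it.iternext()
    return grad

# Toy check: loss = sum((X @ W)**2)  ->  analytic gradient dW = 2 * X.T @ (X @ W)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))

loss = lambda W_: np.sum((X @ W_) ** 2)
grad_numerical = numerical_gradient(loss, W.copy())
grad_analytic = 2 * X.T @ (X @ W)

print(np.mean(np.abs(grad_numerical - grad_analytic)))   # very small value, close to zero
```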