Python study notes for machine learning with Chainer, Chapter 13: Neural network training (Chainer tutorial completed)

What

This article summarizes what I noticed and looked up while learning machine learning with Chainer. This time, I study neural network training.

It is written based on my own understanding, so it may contain mistakes. I will correct any that are found, so please bear with me.

Content

Neural network training

Put simply, training means improving the accuracy of the model so that it becomes smarter and more useful to the user.

Objective function

Looking a little deeper, training a neural network means optimizing an objective function. The following two typical objective functions are introduced.

**Mean squared error**, often used in regression problems
**Cross entropy**, often used in classification problems

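Mean squared error is the average of the squared differences between the predicted values and the target values, so minimizing it pulls each prediction toward a single correct value. Cross entropy, as I understand it, compares the predicted class probabilities with the correct labels, so minimizing it raises the probability the model assigns to the correct class.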
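To make the difference concrete, here is a minimal NumPy sketch of the two objective functions. The function names and toy numbers are my own assumptions for illustration; they are not Chainer's API, and the cross entropy here assumes the model already outputs probabilities that sum to 1 for each sample.

```python
import numpy as np

def mean_squared_error(y_pred, y_true):
    """Average of the squared differences between predictions and targets."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the correct class.

    probs:  (n_samples, n_classes) predicted probabilities (assumed to sum to 1 per row)
    labels: (n_samples,) integer index of the correct class
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Pick out the probability predicted for the correct class of each sample
    picked = probs[np.arange(len(labels)), labels]
    # eps avoids log(0) when a probability is exactly zero
    return -np.mean(np.log(picked + eps))

# Regression: errors are 1 and -2, so the mean of the squares is (1 + 4) / 2
mse = mean_squared_error([2.0, 4.0], [1.0, 6.0])

# Classification: the correct class is 0 in both cases
confident = cross_entropy([[0.9, 0.1]], [0])
unsure = cross_entropy([[0.6, 0.4]], [0])

print(mse, confident, unsure)
```

In this sketch the more confident correct prediction gets the smaller cross entropy, which matches the idea that the loss rewards putting high probability on the right class.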

Objective function optimization

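Gradient descent method: As the name suggests, a method that updates the parameters using the gradient of the objective function, moving them a small step in the direction that decreases it.

Mini-batch learning method: The dataset is split into several small groups (mini-batches). For each mini-batch, the objective function is averaged over the samples in that group, and the parameters are updated using the gradient of that average (I'm not sure about the details).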
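The two ideas above can be sketched together in a few lines of NumPy. This is only an illustration under my own assumptions: a toy linear model `y = w * x + b`, mean squared error as the objective function, and hand-picked values for `learning_rate` and `batch_size`. It is not how Chainer implements its optimizers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): y = 3x + 1 plus a little noise
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.05, size=200)

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.1      # step size of each update
batch_size = 20          # number of samples per mini-batch

for epoch in range(100):
    # Shuffle the data, then walk through it one mini-batch at a time
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = x[idx], y[idx]

        # Gradient of the mean squared error averaged over this mini-batch
        error = w * xb + b - yb
        grad_w = 2.0 * np.mean(error * xb)
        grad_b = 2.0 * np.mean(error)

        # Gradient descent: step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(w, b)  # should end up close to the true values 3 and 1
```

Each update only looks at 20 samples, so it is cheap, and averaging over the mini-batch keeps the update less noisy than using a single sample.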

Activation function

If the gradient of the activation function is small, the gradients passed back to the earlier layers also become small, so the parameters of those layers are barely updated. This is called the vanishing gradient problem. Are there any restrictions on the activation function (it must not diverge, it must converge, and so on)? Since the output is treated as a probability, should it be 1 or less? The ReLU function is introduced as a remedy, but who came up with it, and how? I will update this when I know the details.
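As a rough check of why the gradient shrinks, the sketch below multiplies the derivative of the activation function across 10 layers. This is a deliberate simplification of my own: it ignores the weight matrices and uses the same input value at every layer, so it only shows the contribution of the activation function itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid; its maximum is 0.25, reached at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return (z > 0).astype(float)

z = np.array([0.5, 2.0, 5.0])   # example pre-activation values (all positive)
n_layers = 10

# Backpropagation multiplies one such factor per layer
sigmoid_chain = sigmoid_grad(z) ** n_layers
relu_chain = relu_grad(z) ** n_layers

print(sigmoid_chain)  # every entry is at most 0.25 ** 10
print(relu_chain)     # stays at 1 for positive inputs
```

Because the sigmoid derivative never exceeds 0.25, the product over many layers becomes tiny, whereas the ReLU derivative stays at 1 for positive inputs. For negative inputs the ReLU derivative is 0, so ReLU does not remove the problem in every case.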

I wonder whether deep learning became possible because the vanishing gradient problem was solved.

Comment

For the time being, I have an overview of machine learning. Next, I would like to write a concrete program.

That's why I bought this book. To be honest, I was too inexperienced to know what to buy, but when I looked at the table of contents, two things stood out: **it uses the libraries I learned in the Chainer tutorial, so it seems suited to practical work**, and **it introduces web application development, which may help with making a program publicly available**. So I decided to buy it.

So, by studying this book, my goals are: **STEP.1 Practice machine learning hands-on** and **STEP.2 Master Python to the level of releasing an application**.

I will do my best toward the next goal.