Deep Learning 2: Tuning of Deep Learning

Aidemy 2020/10/1

Introduction

Hello, this is Yope! I'm a liberal arts student, but I was interested in the possibilities of AI, so I went to the AI-specialized school "Aidemy" to study. I want to share the knowledge I gained there, so I'm summarizing it on Qiita. I'm very happy that so many people read my previous summary article. Thank you! This is my second post on deep learning. Nice to meet you.

What to learn this time
・The hyperparameters of deep learning

Deep learning hyperparameters

Types of hyperparameters for deep learning

・Hyperparameters of add
　・__Dropout rate__: Dropout(rate=)
　・__Number of hidden layer units__: Dense()
　・__Activation function__: Activation()
・Hyperparameters of compile
　・__Loss function__: loss
　・__Optimization function__: optimizer
　・__Learning rate__: optimizers.SGD(lr=)
・Hyperparameters of fit
　・__Batch size__: batch_size
　・__Number of epochs__: epochs
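
Before going through each one, here is a minimal sketch of where these hyperparameters appear in a Keras model. The layer sizes, input shape, and concrete values are illustrative assumptions, not tuned settings:

```python
# Minimal sketch: where each hyperparameter of add / compile / fit goes.
# All concrete values here are illustrative assumptions, not tuned settings.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras import optimizers

model = Sequential()
model.add(Dense(128, input_dim=784))    # number of hidden layer units
model.add(Activation('relu'))           # activation function
model.add(Dropout(rate=0.5))            # dropout rate
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',    # loss function
              optimizer=optimizers.SGD(lr=0.1),   # optimizer + learning rate
              metrics=['accuracy'])               # (learning_rate= in newer Keras)

# batch size and number of epochs (X_train / y_train are placeholders)
# model.fit(X_train, y_train, batch_size=32, epochs=10)
```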

Number of hidden layers and prediction accuracy (dropout)

・The number of hidden layers and the number of units per layer can be chosen freely, but care is needed: if they are too large, training slows down and overfitting occurs more easily.
・If __Dropout(rate=fraction of units to drop)__ is specified as a hyperparameter, training proceeds while randomly dropping that fraction of units (neurons). Dropout prevents the model from depending on specific neurons, which suppresses overfitting and improves accuracy.
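
As a concrete illustration (not something you write yourself, since Keras applies this masking internally during training), here is roughly what dropping units at rate=0.5 looks like in NumPy:

```python
# Inverted dropout by hand, assuming rate=0.5: each unit survives with
# probability 1 - rate, and survivors are rescaled so the expected
# activation stays the same at training time.
import numpy as np

rng = np.random.default_rng(0)
activations = np.array([0.2, 0.9, 0.4, 0.7, 0.1, 0.8])
rate = 0.5

mask = rng.random(activations.shape) >= rate   # True = keep this unit
dropped = activations * mask / (1.0 - rate)
print(dropped)   # roughly half the units are zeroed on each pass
```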

Activation function

・Activation (the activation function) is a function applied to the __output of a fully connected layer (the firing of a neuron)__. Without an activation function, the model can only separate data with a straight line, so data that is not linearly separable cannot be classified. Conversely, with an activation function, even data that cannot be separated linearly can be classified reliably, provided the model is trained properly.

・Typical activation functions include the __sigmoid function (sigmoid)__, which maps any input value into the range 0 to 1, and the __ReLU function (relu)__, which outputs 0 when the input value is less than 0 and outputs the input value unchanged when it is 0 or more.
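
Written out in NumPy, the two functions are just a few lines:

```python
import numpy as np

def sigmoid(x):
    # maps any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # 0 for negative inputs, the input itself otherwise
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # approx. [0.119, 0.5, 0.881]
print(relu(x))      # [0., 0., 2.]
```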

Loss function

・The function that measures the difference between the output data and the teacher data is called the __loss function (loss)__.
・Loss functions used in machine learning include __(mean) squared error__ and __cross entropy error__. Details follow below.
・(Review) In deep learning, the weights of each layer are updated so as to minimize this loss function (the error backpropagation method).

Mean squared error

・A loss function that squares the difference between each output value and the corresponding teacher value and averages the results.
・Because mean squared error is suited to evaluating continuous values, it is __mainly applied to regression models__.
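
For example, computed by hand for four regression outputs:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # teacher data
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # model output

mse = np.mean((y_true - y_pred) ** 2)
print(mse)   # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
```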

Cross entropy error (categorical_crossentropy)

・A loss function that exploits the fact that the output for the correct label should be 1: the loss is 0 when the output for the correct label is exactly 1 and grows as that output falls toward 0, so a value closer to 0 means a smaller error.
・Cross entropy error is __mainly applied to classification models__ (categorical_crossentropy handles multi-class one-hot labels; for binary classification Keras provides the related binary_crossentropy).
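
A hand-computed example with a one-hot label; only the correct class contributes to the loss, and the small epsilon is a common guard against log(0):

```python
import numpy as np

t = np.array([0, 0, 1, 0])           # one-hot teacher data
y = np.array([0.1, 0.1, 0.7, 0.1])   # softmax output of the model

loss = -np.sum(t * np.log(y + 1e-7))
print(loss)   # -log(0.7) ≈ 0.357; approaches 0 as the correct output nears 1
```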

Optimization function

・As mentioned above, the weights are updated during training so that the loss function is minimized. The __optimizer__ determines how that update is performed, for example how strongly the learning rate, the number of epochs, and past weight updates are reflected. What is set in the optimizer is an optimizers object, which configures the learning rate described below.
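
Continuing the model sketch above, the optimizer is passed when compiling; SGD matches the text, and other optimizers such as Adam are drop-in replacements:

```python
from tensorflow.keras import optimizers

# the model here is the Sequential model from the earlier sketch
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=0.01),   # learning_rate= in newer Keras
              metrics=['accuracy'])
```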

Learning rate

・The learning rate (lr) is a hyperparameter that determines how much the weights of each layer are changed in a single update.
・If the learning rate is too low, updates barely make progress; if it is too high, the values scatter back and forth and updates are wasted, and in some cases training never converges. It therefore needs to be set to an appropriate value.
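
The effect is easy to see in a toy one-dimensional gradient descent on f(w) = w² (gradient 2w); this is only a toy, not a real network:

```python
# Gradient descent on f(w) = w**2, starting from w = 1.0.
def descend(lr, steps=5, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w   # update rule: w <- w - lr * f'(w)
    return w

print(descend(lr=0.01))   # ~0.904 : too low, updates barely proceed
print(descend(lr=0.1))    # ~0.328 : converging steadily toward 0
print(descend(lr=1.1))    # ~-2.49 : too high, oscillates and diverges
```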

Batch size

・The batch size is the number of data samples passed to the __model at one time__. When several samples are passed at once, the weights are updated using the average of each sample's loss and loss-function gradient, so __the influence of outlier data is reduced__, and because the per-sample computations can run in parallel, __computation time is shortened__.
・On the other hand, when many samples are passed at once, large weight updates become difficult, and the model may fall into a local solution that is optimal for only part of the data.
・For this reason, when the data contain many irregular samples, the batch size is often increased to dilute their influence; when they contain few, it is often decreased to avoid local solutions.
・Setting the batch size to 1, i.e. feeding the data one sample at a time, is called online learning; setting batch_size to the total number of samples is called __batch learning__; and setting it to a value in between is called mini-batch learning.
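
In Keras all three regimes are just values of batch_size in fit. The sketch below assumes a hypothetical dataset of 60,000 samples (X_train and y_train are placeholders):

```python
# one of these would be chosen, not all three in sequence
model.fit(X_train, y_train, batch_size=1)       # online learning
model.fit(X_train, y_train, batch_size=60000)   # batch learning (all data at once)
model.fit(X_train, y_train, batch_size=32)      # mini-batch learning
```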

Iterative learning (number of epochs)

・In deep learning, training generally iterates over the same training data several times to improve accuracy. The number of passes is called the number of __epochs__. Accuracy stops improving after a certain number of epochs, and training longer than necessary causes overfitting, so the number of epochs must also be set to an appropriate value.
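
As a sketch going slightly beyond the course text, a common way to pick the number of epochs in practice is to set it generously and stop when validation loss stops improving, using Keras's EarlyStopping callback:

```python
from tensorflow.keras.callbacks import EarlyStopping

# stop when validation loss has not improved for 3 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3)

history = model.fit(X_train, y_train,           # placeholders as above
                    epochs=50, batch_size=32,
                    validation_split=0.2,
                    callbacks=[early_stop])
```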

Summary

・Among the deep learning hyperparameters, those set in add are __Dropout(rate=)__ for the dropout ratio, __Dense()__ for the number of hidden layer units, and __Activation()__ for the activation function.
・The dropout rate and the number of hidden units are both related to overfitting, and without an activation function the data cannot be classified at all, so appropriate values must be set.
・Those set in compile are loss for the loss function and optimizer for the optimization function.
・For the loss function, __cross entropy error (categorical_crossentropy)__ is used in classification models. The optimization function governs how the weights are updated, and __optimizers.SGD(lr=)__ sets the learning rate.
・The learning rate is the size of the weight change made in a single update; if it is not set appropriately, training is wasted or progress slows down.
・Those set in fit are batch_size for the batch size and epochs for the number of epochs. The batch size is the number of samples fed to the model at one time, and the number of epochs is the number of training passes. Appropriate values vary from model to model.

That's all for this time. Thank you for reading to the end.
