[Deep Learning from scratch] Initial value of neural network weight using sigmoid function

Introduction

This article is an easy-to-understand write-up of what I learned from **Deep Learning from scratch, Chapter 7, Learning Techniques**. Even though I come from a humanities background, I was able to understand it, so I hope you will find it easy to read. I would also be very happy if you could refer to it when studying this book.

About the initial weight values of a neural network

Until now, the initial weights of the neural network were simply generated as random numbers, but with that approach the success or failure of learning varies widely.

The initial weight values and the learning of a neural network are very closely related: if the initial values are appropriate, the learning results will be good, and if they are inappropriate, the results will be poor.
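
The effect is easy to see with a small experiment (a minimal sketch added here for illustration, with arbitrary layer sizes and layer count, not the book's code): pushing random data through a few sigmoid layers with a large weight scale makes the activations saturate near 0 and 1, where the sigmoid's gradient is almost zero, while the smaller scale introduced below keeps them spread out.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.random.randn(1000, 100)  # random input data: 1000 samples, 100 features
node_num = 100                  # number of nodes per hidden layer (arbitrary)

for scale in (1.0, np.sqrt(1.0 / node_num)):  # large scale vs. the Xavier scale introduced below
    a = x
    for _ in range(5):  # pass the data through 5 sigmoid layers
        w = np.random.randn(node_num, node_num) * scale
        a = sigmoid(np.dot(a, w))
    saturated = np.mean((a < 0.01) | (a > 0.99))  # fraction of nearly saturated activations
    print(f"scale={scale:.2f}  saturated fraction={saturated:.2f}")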

Therefore, this time I would like to implement a way of setting appropriate initial weight values for a neural network that uses the sigmoid function.

Initial value of Xavier

The most suitable initial weight value for a neural network that uses the sigmoid function is the Xavier initial value.

scale = np.sqrt(1.0 / all_size_list[idx - 1])  # square root of (1 / number of nodes in the previous layer)
scale * np.random.randn(all_size_list[idx - 1], all_size_list[idx])  # scale the standard-normal random weights

The Xavier initial value is created by taking the square root of 1 divided by the number of nodes in the previous layer and multiplying standard-normal random numbers by that scale.
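
For example, if the previous layer has 100 nodes (a hypothetical size, just for illustration), the scale becomes sqrt(1 / 100) = 0.1, so the weights are drawn from a normal distribution with a standard deviation of 0.1:

import numpy as np

prev_nodes, next_nodes = 100, 50          # hypothetical layer sizes
scale = np.sqrt(1.0 / prev_nodes)         # Xavier scale: sqrt(1 / 100) = 0.1
W = scale * np.random.randn(prev_nodes, next_nodes)  # weight matrix for this layer
print(scale, W.std())                     # the standard deviation of W is close to 0.1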

Below is a sample neural network class that supports both the He and the Xavier initial values.

# Neural network that applies the weight initialization above and implements weight decay.
# Assumption: the layer classes (Affine, Relu, Sigmoid, SoftmaxWithLoss) and the
# numerical-gradient helper slopeing_grad are defined elsewhere (e.g. as in the book).
import numpy as np
from collections import OrderedDict


class MultiLayerNet:
    def __init__(self, input_size, hidden_size_list, output_size,
                 activation='relu', weight_init_std='relu', weight_decay_lambda=0):  # The larger weight_decay_lambda is, the stronger the weight decay
        self.input_size = input_size                    # Number of neurons in the input layer
        self.output_size = output_size                  # Number of neurons in the output layer
        self.hidden_size_list = hidden_size_list        # Number of neurons in each hidden layer
        self.hidden_layer_num = len(hidden_size_list)   # Number of hidden layers
        self.weight_decay_lambda = weight_decay_lambda  # Weight decay strength
        self.params = {}                                # Dictionary that holds the parameters
        
        #Weight initialization
        self.__init_weight(weight_init_std)
        
        #Layer creation
        activation_layer = {'sigmoid': Sigmoid, 'relu': Relu}
        self.layers = OrderedDict()  # Ordered dictionary that holds the layers
        for idx in range(1, self.hidden_layer_num+1):  # Repeat for each hidden layer
            self.layers['Affine' + str(idx)] = Affine(self.params['W' + str(idx)],
                                                      self.params['b' + str(idx)])
            self.layers['Activation_function' + str(idx)] = activation_layer[activation]()  # Add the selected activation layer (Relu or Sigmoid)

        idx = self.hidden_layer_num + 1  # Affine layer just before the output layer
        self.layers['Affine' + str(idx)] = Affine(self.params['W' + str(idx)],
                                                  self.params['b' + str(idx)])

        self.last_layer = SoftmaxWithLoss()#Layer from output layer to loss function
        
    def __init_weight(self, weight_init_std):  # Method that initializes the weights and biases
        all_size_list = [self.input_size] + self.hidden_size_list + [self.output_size]  # Number of neurons in every layer
        for idx in range(1, len(all_size_list)):
            scale = weight_init_std  # Scale factor multiplied into the random weights
            if str(weight_init_std).lower() in ('relu', 'he'):  # He initial value when using the ReLU function
                scale = np.sqrt(2.0 / all_size_list[idx - 1])  # Recommended initial value when using ReLU
            elif str(weight_init_std).lower() in ('sigmoid', 'xavier'):  # Xavier initial value when using the sigmoid function
                scale = np.sqrt(1.0 / all_size_list[idx - 1])  # Recommended initial value when using sigmoid

            self.params['W' + str(idx)] = scale * np.random.randn(all_size_list[idx-1], all_size_list[idx])  # Weight initialization
            self.params['b' + str(idx)] = np.zeros(all_size_list[idx])  # Bias initialization
            
    def predict(self, x):#Forward propagation processing of neural network
        for layer in self.layers.values():
            x = layer.forward(x)

        return x
    
    def loss(self, x, t):#Forward propagation processing from neural network to loss function + Weight decay processing
        y = self.predict(x)

        weight_decay = 0
        for idx in range(1, self.hidden_layer_num + 2):  # Weight decay: add 0.5 * lambda * sum(W**2) of each layer's weights to the loss
            W = self.params['W' + str(idx)]
            weight_decay += 0.5 * self.weight_decay_lambda * np.sum(W ** 2)

        return self.last_layer.forward(y, t) + weight_decay

    def accuracy(self, x, t):#Calculate the correct answer rate
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1 : t = np.argmax(t, axis=1)

        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy

    def numerical_gradient(self, x, t):#Numerical differentiation
        loss_W = lambda W: self.loss(x, t)

        grads = {}
        for idx in range(1, self.hidden_layer_num+2):
            grads['W' + str(idx)] = slopeing_grad(loss_W, self.params['W' + str(idx)])
            grads['b' + str(idx)] = slopeing_grad(loss_W, self.params['b' + str(idx)])

        return grads

    def gradient(self, x, t):#Error back propagation method
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # Collect the gradients
        grads = {}
        for idx in range(1, self.hidden_layer_num+2):  # The weight decay term is also added to the weight gradients
            grads['W' + str(idx)] = self.layers['Affine' + str(idx)].dW + self.weight_decay_lambda * self.layers['Affine' + str(idx)].W
            grads['b' + str(idx)] = self.layers['Affine' + str(idx)].db

        return grads
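
As a quick usage sketch (random dummy data is used here instead of a real dataset such as MNIST, and the layer sizes, batch size, learning rate, and weight_decay_lambda below are arbitrary), passing 'sigmoid' or 'xavier' as weight_init_std selects the Xavier initial value:

network = MultiLayerNet(input_size=784, hidden_size_list=[100, 100, 100],
                        output_size=10, activation='sigmoid',
                        weight_init_std='sigmoid', weight_decay_lambda=0.1)

x = np.random.rand(128, 784)                    # dummy mini-batch of inputs
t = np.eye(10)[np.random.randint(0, 10, 128)]   # dummy one-hot labels

grads = network.gradient(x, t)                  # gradients by backpropagation
for key in grads:
    network.params[key] -= 0.01 * grads[key]    # one SGD-style update step
print(network.loss(x, t))                       # loss including the weight decay term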
