(python) Basics of the Deep Learning Library Chainer

Hello! It's cool!!

This article is for beginners to deep learning.

This time, I will explain the basics of Chainer, a deep learning library for Python. I will cover how to build a fully connected neural network, along with activation functions, loss functions, and optimization functions.

The environment is Python 3.6.7 (64-bit), and the only library used is Chainer.

The code in this article is only meant to show what deep learning code looks like, so I recommend building your own from scratch while using it as a reference.

Building a neural network (NN) model

Below is the code to build an NN using Chainer.

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.training import extensions

class MyChain(chainer.Chain):
    def __init__(self, n_input, n_node, n_output):
        #Initialize the weights with HeNormal (Gaussian distribution whose standard deviation is multiplied by scale)
        w = chainer.initializers.HeNormal(scale=1.0)
        super(MyChain, self).__init__()
        #Build a 4-layer NN
        with self.init_scope():
            self.l1 = L.Linear(n_input, n_node, initialW=w)
            self.l2 = L.Linear(n_node, n_node, initialW=w)
            self.l3 = L.Linear(n_node, n_node, initialW=w)
            self.l4 = L.Linear(n_node, n_output, initialW=w)
            
    def __call__(self, x):
        #Use relu function for activation function
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        h = F.relu(self.l3(h))
        return self.l4(h)


def create_model():
    #Build an NN with 10 dimensions of input, 200 nodes, and 10 dimensions of output
    model = L.Classifier(MyChain(10, 200, 10), lossfun=F.softmax_cross_entropy)

    #Use Adam as the optimization function, with alpha (learning rate) set to 0.025 and eps set to 1e-3
    optimizer = chainer.optimizers.Adam(alpha=0.025, eps=1e-3)
    optimizer.setup(model)

    return model, optimizer

The above is the code to create the NN model.
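
To make the comments in the model code more concrete, here is a minimal pure-Python sketch of what He-normal initialization and one fully connected layer followed by relu compute. The helper names (he_normal, linear, relu_vec) are my own and are not part of Chainer; the sketch assumes the standard deviation is scale * sqrt(2 / n_in) and that biases start at zero, and it ignores mini-batches and GPU support.

```python
import math
import random

def he_normal(n_out, n_in, scale=1.0, rng=random):
    """Return an n_out x n_in weight matrix drawn from a Gaussian with
    mean 0 and standard deviation scale * sqrt(2 / n_in)."""
    std = scale * math.sqrt(2.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

def linear(x, W, b):
    """Fully connected layer: y[j] = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def relu_vec(v):
    """Apply relu to every element of a vector."""
    return [max(0.0, a) for a in v]

rng = random.Random(0)                 # fixed seed so the run is repeatable
W1 = he_normal(200, 10, rng=rng)       # same sizes as the first layer: 10 inputs, 200 nodes
b1 = [0.0] * 200                       # assumption: biases start at zero
x = [rng.uniform(-1.0, 1.0) for _ in range(10)]  # dummy 10-dimensional input
h = relu_vec(linear(x, W1, b1))        # output of the first hidden layer
print(len(h))
```

Stacking four such layers, with relu after the first three, gives the same structure as MyChain.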

Activation function

The activation function increases the expressiveness of the NN model by introducing nonlinearity between layers. In other words, it lets the model handle more complex recognition problems.

In addition to relu, activation functions include tanh, sigmoid, and swish. You can find others in the official Chainer reference linked below (see the section called Activation functions). https://docs.chainer.org/en/stable/reference/functions.html
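
As a rough illustration of the four activation functions named above, here is a minimal sketch that applies each of them to a single number. These are the textbook definitions written in plain Python, not Chainer's implementations; in particular, the fixed beta=1.0 in swish is an assumption made for this example.

```python
import math

def relu(x):
    """Pass positive values through and clip negative values to 0."""
    return max(0.0, x)

def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash x into the range (-1, 1)."""
    return math.tanh(x)

def swish(x, beta=1.0):
    """x * sigmoid(beta * x); beta is fixed here for simplicity."""
    return x * sigmoid(beta * x)

# Compare the functions on a few sample inputs.
for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), sigmoid(x), tanh(x), swish(x))
```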

Loss function

The loss function calculates the error between the model's output and the correct answer. You typically optimize your NN to reduce the loss. The loss function is also called the objective function.

I used softmax_cross_entropy as the loss function (lossfun), but other loss functions are listed in the Loss functions section of the link above.
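
To show what a softmax cross entropy loss measures, here is a minimal sketch for a single sample in plain Python. It follows the textbook definition (the negative log of the softmax probability assigned to the correct class); the function names and the example values are made up for illustration, and a library version would typically also average over a mini-batch.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, label):
    """Negative log-probability of the correct class."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.5, -1.0]                # made-up network outputs for 3 classes
print(softmax_cross_entropy(logits, 0))  # correct class has the highest score: small loss
print(softmax_cross_entropy(logits, 2))  # correct class has the lowest score: large loss
```

The loss shrinks as the score of the correct class grows relative to the others, which is why reducing it pushes the NN toward correct classifications.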

Optimization function

The optimization function (optimizer) determines how the parameters of the NN are updated.

In addition to Adam, optimizers include SGD, RMSprop, and AdaGrad. You can find other optimizers in the official Chainer reference linked below. https://docs.chainer.org/en/stable/reference/optimizers.html (The parameters of the optimization function strongly affect how well training goes, so it is worth trying various combinations to find good values.)
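
To give a feel for what "how to update" means, here is a minimal sketch of a single update step for plain SGD and for Adam, applied to one scalar parameter. The Adam step follows the commonly published textbook form; Chainer's own implementation may differ in details. The toy function f(w) = (w - 3)^2, the helper names, and the beta1/beta2 values are assumptions for this example, while alpha and eps match the values used in create_model.

```python
import math

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: move against the gradient by a fixed learning rate."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, alpha=0.025, beta1=0.9, beta2=0.999, eps=1e-3):
    """One Adam update (textbook form). m and v are running averages of the
    gradient and the squared gradient; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)       # bias correction for the early steps
    v_hat = v / (1.0 - beta2 ** t)
    w = w - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def grad_f(w):
    """Gradient of the toy function f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)

# SGD: repeat the update 100 times starting from w = 0.
w_sgd = 0.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd, grad_f(w_sgd), lr=0.1)

# Adam: a single step from w = 0; the first step has a size of roughly alpha.
w_adam, m, v = adam_step(0.0, grad_f(0.0), m=0.0, v=0.0, t=1)
print(w_sgd, w_adam)
```

SGD scales its step by the raw gradient, whereas Adam normalizes the step using the running averages, which is one reason its alpha and eps need tuning separately.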

Training the NN

Below is the code to run the training. Copy and paste it into the same .py file as the NN model code above.

#Take train data and test data as arguments
def learn(train, test):
    #Hyperparameters
    epoch = 8
    batchsize = 256
    
    #NN model creation
    model, optimizer = create_model()

    #Definition of iterator
    train_iter = chainer.iterators.SerialIterator(train, batchsize) #For learning
    test_iter = chainer.iterators.SerialIterator(test, batchsize, repeat=False) #For evaluation

    #Updater registration
    updater = training.StandardUpdater(train_iter, optimizer)

    #Trainer registration
    trainer = training.Trainer(updater, (epoch, 'epoch'))
            
    #Display and save learning status
    trainer.extend(extensions.LogReport()) #Record the log
    trainer.extend(extensions.Evaluator(test_iter, model)) #Evaluate on the test data
    trainer.extend(extensions.PrintReport(['epoch', 'main/loss', 'validation/main/loss',
                                            'main/accuracy', 'validation/main/accuracy', 'elapsed_time'] )) #Display of calculation status
            
    #Start learning
    trainer.run()

    #Save the trained model (to use the line below, uncomment it and define episode)
    #chainer.serializers.save_npz("result/Agent" + str(episode) + ".model", model)

The learn function defined here takes training data and test data as arguments. (Create the training data and test data according to what you want to train.) If you train over multiple episodes, call this function once per episode.
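
As a rough sketch of the data format, the example below builds dummy training and test data with NumPy. It assumes each sample is a pair of a float32 input vector with 10 elements and an int32 class label from 0 to 9, matching the 10-dimensional input and output of the model above. The function name make_dummy_dataset is hypothetical, and because the labels are random, a model trained on this data will not learn anything meaningful; it only shows the shape of the arguments.

```python
import numpy as np

def make_dummy_dataset(n_samples, n_input=10, n_class=10, seed=0):
    """Return a list of (input vector, label) pairs filled with random values."""
    rng = np.random.RandomState(seed)
    x = rng.rand(n_samples, n_input).astype(np.float32)           # inputs as float32
    t = rng.randint(0, n_class, size=n_samples).astype(np.int32)  # labels as int32
    return list(zip(x, t))

train = make_dummy_dataset(1000)         # dummy training data
test = make_dummy_dataset(200, seed=1)   # dummy test data
print(len(train), train[0][0].shape, train[0][1])

# In the file that defines learn(), training would then be started with:
# learn(train, test)
```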

Finally, I will explain the number of epochs and batch size.

Number of epochs

The number of epochs determines how many times the same training data is used for training. Usually the model cannot learn much in a single pass, so you train on the data several times. However, setting it too large causes overfitting, so adjust it while trying various values.
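
One simple way to judge the number of epochs is to look for the epoch where the validation loss stops improving. The sketch below assumes you have collected the validation/main/loss value of each epoch into a list; the helper name best_epoch is hypothetical and the numbers are made up purely for illustration.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# Made-up validation losses for 8 epochs (not real results).
# The loss falls at first, then rises again, which suggests overfitting.
val_losses = [0.92, 0.61, 0.48, 0.41, 0.39, 0.40, 0.44, 0.51]
print(best_epoch(val_losses))
```

If the best epoch is well before the last one, the number of epochs is probably larger than it needs to be.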

Batch size

The batch size determines how many samples are taken from the training data for each update. Normally, the larger the dataset, the larger the batch size. There is also a value called the number of iterations: for a given amount of training data, once either the batch size or the number of iterations per epoch is fixed, the other is determined automatically.
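
The relationship between batch size and number of iterations can be sketched as follows. The dataset size of 10000 is a hypothetical value; batchsize and epoch are the values used in the learn function. The total is only approximate, because the exact count depends on how the iterator handles the final partial batch of each epoch.

```python
import math

def iterations_per_epoch(n_samples, batchsize):
    """Number of updates needed to see every sample once."""
    return math.ceil(n_samples / batchsize)

n_samples = 10000   # hypothetical size of the training data
batchsize = 256     # same value as in learn()
epoch = 8           # same value as in learn()

per_epoch = iterations_per_epoch(n_samples, batchsize)
approx_total = per_epoch * epoch   # rough upper estimate of the total number of updates
print(per_epoch, approx_total)
```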

Summary

That is all for this article. I only gave a brief overview, so if you want more detail, please refer to other sites and papers.

I hope this article will be a good entry point for anyone looking to study deep learning with Chainer.