Neural network starting with Chainer

Chainer is a library for implementing neural networks developed by Preferred Networks. Its features are as follows (from the homepage).

Personally, I would like to mention one more feature: it is easy to install. Many deep learning frameworks are troublesome to install, but Chainer has few dependencies and used to be easy to install. Since 1.5.0 it uses Cython, however, which has made installation a bit more of a hassle. Please refer to the following for the installation method.

In addition, because Chainer's notation is intuitive and simple, as described above, it can cover a wide range, from simple networks to more complex, so-called deep learning models. Other deep learning libraries are overkill when the network is not deep, while simple libraries (such as PyBrain) struggle with deep networks, so I think this range is also a big advantage.

This time, I will explain how to use this attractive library, but to handle Chainer, knowledge of (relatively deep) neural networks is indispensable. As a result, people often get stuck because their knowledge on the neural network side is insufficient (it happened to me).

Therefore, I would like to briefly explain the mechanism of the neural network first, and then explain how to implement it with Chainer in a later stage.

How neural networks work

Structure

The structure of the neural network is as follows (as an aside, drawing the lines between nodes is a hassle every time).

image

Propagation

Let's take a closer look at how a value travels from the input to the output. The figure below shows how the inputs reach the first node of the hidden layer.

image

You can see that four inputs are transmitted. The inputs are not passed on as they are; each one is weighted. Neural networks mimic the structure of neurons in the brain, so think of the inputs (stimuli) as being weakened or strengthened as they propagate. Expressed mathematically, an input $x$ is multiplied by a weight $a$ to give $ax$.

image

Now the node has received the input $ax$, but it does not simply pass this value on to the next layer. The brain is said to have a mechanism whereby an input does not propagate to the next layer unless it exceeds a certain threshold. Imitating this, the node converts the input it received into an output for the next layer. Expressed mathematically, if $h$ is the function that converts the input into the output for the next layer, the output value is $h(ax)$. This function $h$ is called the activation function.
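
As a minimal sketch of the two steps just described, the plain-Python snippet below weights each input, sums the results, and passes the sum through an activation function. The sigmoid is chosen as $h$ only because it appears later in this article. The function name `node_output` and all input and weight values are made up for illustration, and summing the weighted inputs is the usual convention for a node that receives several inputs.

```python
import math


def sigmoid(z):
    """Activation function h: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights):
    """Output of one hidden node: h(a1*x1 + a2*x2 + ...)."""
    weighted_sum = sum(a * x for a, x in zip(weights, inputs))
    return sigmoid(weighted_sum)


# Four inputs, as in the figure; the numbers are illustrative only.
x = [1.0, 0.5, -1.0, 2.0]
a = [0.2, -0.4, 0.1, 0.3]

print(node_output(x, a))
```

Changing a weight changes how strongly the corresponding input influences the node, which is the "weakened or strengthened" effect described above.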

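In summary, there are two important factors for value propagation in neural networks: the weights applied to the inputs, and the activation function that converts the weighted input into the output for the next layer.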

In short, the neural network simply weights the input it receives and outputs it. Therefore, a single layer neural network is almost synonymous with linear regression or logistic regression.

With that in mind, it becomes clear what the manipulation of the number of nodes and the number of layers means.

When working with neural networks, it is tempting to tweak the number of nodes and layers haphazardly, but it is also important to plot the data properly and work out an appropriate number of nodes and layers.

Learning

To train a neural network, we use a technique called backpropagation. The error is the difference between the value output by the neural network and the actual value. As the name implies, backpropagation propagates this error backwards from the output layer and adjusts the weights of each layer.
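
To illustrate the idea of propagating the error backwards, here is a small sketch under simplifying assumptions: a toy network with one input, one hidden node, and one output, $y = a_2 \cdot h(a_1 x)$, trained with a squared error and plain gradient descent. The function name `train_step`, the learning rate, and all numbers are hypothetical; this is not how Chainer implements backpropagation internally.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(a1, a2, x, t, lr=0.5):
    """One backpropagation step for the toy network y = a2 * sigmoid(a1 * x)."""
    # Forward pass: input -> hidden -> output.
    h = sigmoid(a1 * x)
    y = a2 * h
    loss = 0.5 * (y - t) ** 2

    # Backward pass: start from the error at the output ...
    grad_y = y - t                        # d(loss)/dy
    grad_a2 = grad_y * h                  # d(loss)/d(a2)
    # ... and propagate it back through the hidden node.
    grad_h = grad_y * a2                  # d(loss)/dh
    grad_a1 = grad_h * h * (1.0 - h) * x  # sigmoid'(z) = h * (1 - h)

    # Move each weight a little against its gradient.
    return a1 - lr * grad_a1, a2 - lr * grad_a2, loss


a1, a2 = 0.1, 0.1  # illustrative starting weights
losses = []
for _ in range(200):
    a1, a2, loss = train_step(a1, a2, x=1.0, t=0.8)
    losses.append(loss)

print(losses[0], losses[-1])
```

The gradient for the first weight reuses the gradient computed at the output, which is why the calculation proceeds from the output layer back towards the input.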

image

Backpropagation is not covered in detail here because many other explanations are available, but the following two points are important.

In addition, there are several methods for using the above training data to perform the above operation of "calculating the error and updating the weight".

One epoch is one cycle in which all of the training data has been used for updates. Training usually runs for several epochs. Simply repeating the same pass is not very effective, however, so the training data is shuffled at each epoch and, with mini-batches, the position each mini-batch is taken from is shifted or sampled at random.
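
The epoch and mini-batch loop can be sketched in plain Python as follows. This only shows how the indices are shuffled and sliced; the helper name `iterate_minibatches`, the dataset size, and the batch size are assumptions made for the illustration.

```python
import random


def iterate_minibatches(n_samples, batchsize, rng):
    """Yield index lists that cover a freshly shuffled dataset once (one epoch)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)  # new order every epoch
    for start in range(0, n_samples, batchsize):
        yield indices[start:start + batchsize]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
n_samples, batchsize, n_epoch = 10, 4, 3

for epoch in range(1, n_epoch + 1):
    batches = list(iterate_minibatches(n_samples, batchsize, rng))
    # Each batch would be used for one "calculate the error, update the weights" step.
    print('epoch', epoch, batches)
```

Every sample is used exactly once per epoch, and the last mini-batch is simply smaller when the dataset size is not a multiple of the batch size.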

The epoch is an important unit in neural network training: it is the point at which you check the progress of learning and readjust parameters.

Implementation by Chainer

Here is a summary of the contents of the explanation of neural networks.

Now, let's look at the implementation in Chainer and the above points.

Structure

In Chainer, a neural network is built from Chain (FunctionSet up to 1.4). The following defines the 4-3-2 neural network used in the explanation so far.

from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L


class MyChain(Chain):
    
    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(4, 3),
            l2=L.Linear(3, 2)
        )
    
    def __call__(self, x):
        h = F.sigmoid(self.l1(x))
        o = self.l2(h)
        return o

If you wondered why there are two links when there is only one hidden layer, that is a sharp observation. Please refer to the figure below for l1 and l2 above.

image

Counting the propagation steps between layers in this way, there are two of them (input to hidden, and hidden to output), which is why there are two links. L.Linear holds the weights for one propagation step and is responsible for applying them to its input.
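
To make the role of the two links concrete, here is a library-free sketch of what a fully connected layer computes: each output is a weighted sum of all inputs plus a bias. The class `TinyLinear` is hypothetical and only mimics the idea behind `L.Linear(4, 3)` and `L.Linear(3, 2)`. It is not Chainer's implementation, and the constant weights are placeholders rather than Chainer's initial values.

```python
class TinyLinear:
    """Hypothetical stand-in for a fully connected layer (not Chainer code)."""

    def __init__(self, in_size, out_size, weight=0.1):
        # One row of weights per output node, one weight per input.
        self.W = [[weight] * in_size for _ in range(out_size)]
        self.b = [0.0] * out_size

    def __call__(self, x):
        # Each output node: weighted sum of all inputs, plus its bias.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.W, self.b)]


l1 = TinyLinear(4, 3)  # input layer (4 nodes) -> hidden layer (3 nodes)
l2 = TinyLinear(3, 2)  # hidden layer (3 nodes) -> output layer (2 nodes)

x = [1.0, 2.0, 3.0, 4.0]
h = l1(x)  # activation function omitted to keep the focus on the weights
o = l2(h)
print(len(h), len(o))
```

The weights belong to the connections between layers rather than to the layers themselves, so a network with three layers of nodes needs two such objects.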

Propagation

Propagation processing is implemented in __call__ of Chain class as described above.

    def __call__(self, x):
        h = F.sigmoid(self.l1(x))
        o = self.l2(h)
        return o

Here, the input x is weighted (self.l1(x)), passed through the sigmoid function, which is often used as an activation function, and handed to the next layer (h = F.sigmoid(self.l1(x))). The final output is not passed on to another layer, so no activation function is applied (o = self.l2(h)).

Learning

When training, you first need to calculate the error between the predicted value and the actual value. You can simply implement this as a function (commonly named lossfun in Chainer), but for classification problems it's easier to use Classifier.

from chainer.functions.loss.mean_squared_error import mean_squared_error

model = L.Classifier(MyChain(), lossfun=mean_squared_error)

In fact, Classifier is also a Link, that is, a function with parameters, and in its __call__ it calculates the error between the value output by MyChain and the teacher data (the function used for the calculation can of course be specified; it is mean_squared_error in the above).

In 1.5, the fact that Links can be connected to each other is a big deal, and it makes models much more reusable. In the example above, too, you can see that the main model and the process that uses it to calculate the error are written separately.

After calculating the error, the model is optimized to minimize it (the backpropagation described above). This is the role of the optimizer, and the training part of the MNIST example looks like this.

# Setup optimizer
optimizer = optimizers.Adam()
optimizer.setup(model)

...(Omission)...

# Learning loop
for epoch in six.moves.range(1, n_epoch + 1):
    print('epoch', epoch)

    # training
    perm = np.random.permutation(N)
    sum_accuracy = 0
    sum_loss = 0
    for i in six.moves.range(0, N, batchsize):
        x = chainer.Variable(xp.asarray(x_train[perm[i:i + batchsize]]))
        t = chainer.Variable(xp.asarray(y_train[perm[i:i + batchsize]]))

        # Pass the loss function (Classifier defines it) and its arguments
        optimizer.update(model, x, t)

There are three basic steps:

At the core is the update, optimizer.update. From 1.5, if you pass lossfun as an argument, the error calculation and backpropagation by that lossfun are performed automatically. Of course, as before, you can also initialize the gradients with model.zerograds(), calculate and propagate the error yourself (loss.backward), and then call optimizer.update.

As you can see, Chainer is designed so that once you have defined your model, you can easily optimize it (Chainer's own documentation calls its approach Define-by-Run).

The trained model can easily be saved and restored with a serializer (the optimizer can be saved as well).

serializers.save_hdf5('my.model', model)
serializers.load_hdf5('my.model', model)

After that, here are some tips for actually implementing it.

The first place you are likely to get stuck is type errors. I cannot say whether Chainer begins and ends with types, but it certainly begins with them, so pay close attention to types when you use it.
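
As a hedged illustration of the kind of type issue meant here: NumPy creates float64 arrays from Python floats by default, while Chainer's links generally expect float32 inputs, and classification losses typically expect int32 labels. The sketch below uses only NumPy and shows the explicit conversion. The array contents are made up, and the exact requirements depend on the functions you use, so treat these dtypes as assumptions to check against the error message you actually get.

```python
import numpy as np

# Python floats become float64 by default, a common source of type errors.
x_raw = np.array([[5.1, 3.5, 1.4, 0.2],
                  [6.2, 2.9, 4.3, 1.3]])
t_raw = np.array([0, 1])

# Convert explicitly before wrapping the data in chainer.Variable.
x = x_raw.astype(np.float32)  # input features
t = t_raw.astype(np.int32)    # class labels

print(x_raw.dtype, '->', x.dtype)
print(t_raw.dtype, '->', t.dtype)
print(x.shape, t.shape)
```

Checking dtype and shape right before the data enters the model tends to catch most of these errors early.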
