In the previous article, "Introduction to Deep Learning (1) - Understanding and Using Chainer -", I summarized how to use Chainer. Following the reference is enough to learn the API, but when studying machine learning I like to set myself a simple regression problem and consider that I have really understood a tool only once I can build a prediction model with it.
So this time we generate data from the nonlinear sin function and fit it with a nonlinear regression model built in Chainer. Once you can work through all of these steps, you should be able to move on to more advanced topics such as image classification.
・ OS: Mac OS X El Capitan (10.11.5)
・ Python 2.7.12: Anaconda 4.1.1 (x86_64)
・ Chainer 1.12.0
As shown in the image below, the goal is to build a nonlinear regression model that captures the sin function almost perfectly.
MyChain.py

```python
# -*- coding: utf-8 -*-
from chainer import Chain
import chainer.links as L
import chainer.functions as F


class MyChain(Chain):

    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(1, 100),
            l2=L.Linear(100, 30),
            l3=L.Linear(30, 1)
        )

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)
```
example.py

```python
# -*- coding: utf-8 -*-
# Numerical computation
import math
import random
import numpy as np
import matplotlib.pyplot as plt

# Chainer
from chainer import Chain, Variable
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

from MyChain import MyChain

# Fix the random seed
random.seed(1)

# Generate sample data
# The sine function is the true function
x, y = [], []
for i in np.linspace(-3, 3, 100):
    x.append([i])
    y.append([math.sin(i)])  # true function

# Re-declare as Chainer Variables
x = Variable(np.array(x, dtype=np.float32))
y = Variable(np.array(y, dtype=np.float32))

# Declare the NN model
model = MyChain()

# Loss function
# Mean squared error (MSE) is used as the loss
def forward(x, y, model):
    t = model.predict(x)
    loss = F.mean_squared_error(t, y)
    return loss

# Chainer optimizer
# Adam is used as the optimization algorithm
optimizer = optimizers.Adam()
# Pass the model parameters to the optimizer
optimizer.setup(model)

# Repeatedly update the parameters
for i in range(0, 1000):
    loss = forward(x, y, model)
    print(loss.data)  # show the current MSE
    optimizer.update(forward, x, y, model)

# Plot
t = model.predict(x)
plt.plot(x.data, y.data)
plt.scatter(x.data, t.data)
plt.grid(which='major', color='gray', linestyle='-')
plt.ylim(-1.5, 1.5)
plt.xlim(-4, 4)
plt.show()
```
First, generate the teacher data used to build this nonlinear regression model. Here we use the sin function, with one input and one output.
```python
# Generate sample data
# The sine function is the true function
x, y = [], []
for i in np.linspace(-3, 3, 100):
    x.append([i])
    y.append([math.sin(i)])  # true function

# Re-declare as Chainer Variables
x = Variable(np.array(x, dtype=np.float32))
y = Variable(np.array(y, dtype=np.float32))
```
Next, build the deep learning model with Chainer. This time I used a four-layer structure: an input layer, two hidden layers, and an output layer. The number of nodes was chosen somewhat arbitrarily (as I wrote in the previous article, this comes down to experience and intuition), so feel free to edit those values if you are interested. The reason there are two hidden layers is that with only one the model did not capture the shape of the function well, so I added another.
MyChain.py

```python
# -*- coding: utf-8 -*-
from chainer import Chain
import chainer.links as L
import chainer.functions as F


class MyChain(Chain):

    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(1, 100),
            l2=L.Linear(100, 30),
            l3=L.Linear(30, 1)
        )

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)
```
The point is that relu is used as the activation function. Until fairly recently the sigmoid function was the standard choice, but when training parameters with backpropagation the gradients shrink layer by layer toward the input side (the so-called vanishing gradient problem), so it seems that relu is now often used to avoid this. I only understand this area intuitively, so I need to study it a bit more. There are various other articles explaining activation functions, so please check them out. Reference: [Machine learning] I will explain while trying the deep learning framework Chainer.
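For comparison, here is a minimal sketch of the same network with sigmoid activations instead of relu. The class name MySigmoidChain is just a name I made up, and the layer sizes are the same assumptions as in MyChain above; swapping it in for MyChain lets you see for yourself how much harder the sigmoid version is to train.

```python
# -*- coding: utf-8 -*-
# Hedged sketch: the same architecture as MyChain, but with sigmoid activations
from chainer import Chain
import chainer.links as L
import chainer.functions as F


class MySigmoidChain(Chain):

    def __init__(self):
        super(MySigmoidChain, self).__init__(
            l1=L.Linear(1, 100),
            l2=L.Linear(100, 30),
            l3=L.Linear(30, 1)
        )

    def predict(self, x):
        h1 = F.sigmoid(self.l1(x))  # sigmoid saturates, so gradients shrink layer by layer
        h2 = F.sigmoid(self.l2(h1))
        return self.l3(h2)
```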
Before training, let's look at what the model predicts when it has not learned anything at all. Looking at the intermediate stages, not just the final result, should give you a better feel for what is going on.
```python
# Declare the NN model
model = MyChain()

# Plot
t = model.predict(x)
plt.plot(x.data, y.data)
plt.scatter(x.data, t.data)
plt.grid(which='major', color='gray', linestyle='-')
plt.ylim(-1.5, 1.5)
plt.xlim(-4, 4)
plt.show()
```
In the untrained state, you can see that the model does not capture the characteristics of the true function at all.
To train the parameters, first define the loss function. This time we will use the mean squared error (MSE) as the loss function.
```math
{\rm MSE} = \dfrac{1}{N} \sum_{n=1}^{N} \left( \hat{y}_{n} - y_{n} \right)^{2}
```
```python
# Loss function
# Mean squared error (MSE) is used as the loss
def forward(x, y, model):
    t = model.predict(x)
    loss = F.mean_squared_error(t, y)
    return loss
```
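As a quick sanity check, the MSE formula above can be compared against F.mean_squared_error by hand. The small arrays below are made-up values used only for this check.

```python
# Hedged sketch: verify F.mean_squared_error against the MSE formula
import numpy as np
from chainer import Variable
import chainer.functions as F

y_hat = Variable(np.array([[0.1], [0.4], [0.9]], dtype=np.float32))   # predictions (made up)
y_true = Variable(np.array([[0.0], [0.5], [1.0]], dtype=np.float32))  # targets (made up)

mse_chainer = F.mean_squared_error(y_hat, y_true).data
mse_manual = np.mean((y_hat.data - y_true.data) ** 2)
print(mse_chainer, mse_manual)  # both should be about 0.01
```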
By defining this loss function, Chainer can automatically compute the gradients that the optimizer needs.
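To make the "automatic gradient" part concrete, here is a minimal sketch with made-up numbers: calling backward() on a scalar loss fills in the .grad of the inputs, and the same mechanism fills in the gradients of every layer of MyChain.

```python
# Hedged sketch: Chainer's automatic differentiation on a tiny example
import numpy as np
from chainer import Variable
import chainer.functions as F

a = Variable(np.array([[2.0]], dtype=np.float32))
b = Variable(np.array([[3.0]], dtype=np.float32))
loss = F.mean_squared_error(a, b)  # (2 - 3)^2 = 1.0
loss.backward()                    # backpropagation from the scalar loss
print(loss.data)  # 1.0
print(a.grad)     # d(loss)/da = 2 * (a - b) = -2.0
```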
```python
# Chainer optimizer
# Adam is used as the optimization algorithm
optimizer = optimizers.Adam()
# Pass the model parameters to the optimizer
optimizer.setup(model)

# Update the parameters once
optimizer.update(forward, x, y, model)
```
This is the end of the basic flow. By repeating the `optimizer.update()` call above several times, the parameters converge to good values.
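For reference, my understanding is that a single `optimizer.update(forward, x, y, model)` call roughly corresponds to the steps sketched below. This is a hedged outline of the flow, not Chainer's actual source.

```python
# Hedged sketch: roughly what one optimizer.update(forward, x, y, model) call does
loss = forward(x, y, model)  # forward pass: compute the current loss
model.zerograds()            # clear gradients left over from the previous step
loss.backward()              # backpropagation: compute gradients of the loss
optimizer.update()           # apply one Adam step using those gradients
```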
In this example the same teacher data are reused on every iteration, but normally training proceeds in mini-batches: some samples are drawn from the population and used as teacher data for one update, and a different batch of samples is drawn for the next cycle (a minimal sketch of this mini-batch style appears after the results below).
```python
# Repeatedly update the parameters
for i in range(0, 1000):
    loss = forward(x, y, model)
    print(loss.data)  # show the current MSE
    optimizer.update(forward, x, y, model)
```
You can see that the squared error gets smaller as training is repeated. After training, the model approximates the function very smoothly.
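Finally, here is the mini-batch sketch mentioned earlier. The batch size of 20 and the number of epochs are arbitrary assumptions; each iteration samples a random subset of the full data and trains on it.

```python
# Hedged sketch: mini-batch training (batch size and epoch count are assumptions)
batchsize = 20
x_data = x.data  # the full arrays generated earlier
y_data = y.data
for epoch in range(200):
    perm = np.random.permutation(len(x_data))   # shuffle the sample indices
    for i in range(0, len(x_data), batchsize):
        idx = perm[i:i + batchsize]              # indices of this batch
        x_batch = Variable(x_data[idx])
        y_batch = Variable(y_data[idx])
        optimizer.update(forward, x_batch, y_batch, model)
```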