In the previous article, "Introduction to Deep Learning (1) - Understanding and Using Chainer -", I summarized how to use Chainer. Following the reference is enough to learn the API, but when studying machine learning I like to set myself a simple regression problem and consider that I have really understood a tool only once I can build a prediction model with it.
So this time we generate data from the nonlinear sin function and fit it with a nonlinear regression model built in Chainer. Once you can work through all of these steps, you should be able to move on to more advanced topics such as image classification.
・ OS: Mac OS X El Capitan (10.11.5)
・ Python 2.7.12: Anaconda 4.1.1 (x86_64)
・ Chainer 1.12.0
As shown in the image below, the goal is to build a nonlinear regression model that captures the sin function almost perfectly.
MyChain.py

```python
# -*- coding: utf-8 -*-
from chainer import Chain
import chainer.links as L
import chainer.functions as F


class MyChain(Chain):

    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(1, 100),
            l2=L.Linear(100, 30),
            l3=L.Linear(30, 1)
        )

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)
```
example.py

```python
# -*- coding: utf-8 -*-
# Numerical computation
import math
import random
import numpy as np
import matplotlib.pyplot as plt

# Chainer
from chainer import Chain, Variable
import chainer.functions as F
import chainer.links as L
from chainer import optimizers

from MyChain import MyChain

# Fix the random seed
random.seed(1)

# Generate sample data
# The sine function is the true function
x, y = [], []
for i in np.linspace(-3, 3, 100):
    x.append([i])
    y.append([math.sin(i)])  # true function

# Re-declare as Chainer Variables
x = Variable(np.array(x, dtype=np.float32))
y = Variable(np.array(y, dtype=np.float32))

# Declare the NN model
model = MyChain()

# Loss function
# Mean squared error (MSE) is used as the loss
def forward(x, y, model):
    t = model.predict(x)
    loss = F.mean_squared_error(t, y)
    return loss

# Chainer optimizer
# Adam is used as the optimization algorithm
optimizer = optimizers.Adam()
# Pass the model parameters to the optimizer
optimizer.setup(model)

# Repeatedly update the parameters
for i in range(0, 1000):
    loss = forward(x, y, model)
    print(loss.data)  # show the current MSE
    optimizer.update(forward, x, y, model)

# Plot
t = model.predict(x)
plt.plot(x.data, y.data)
plt.scatter(x.data, t.data)
plt.grid(which='major', color='gray', linestyle='-')
plt.ylim(-1.5, 1.5)
plt.xlim(-4, 4)
plt.show()
```
First, generate the teacher data used to build this nonlinear regression model. Here we use the sin function, with one input and one output.
```python
# Generate sample data
# The sine function is the true function
x, y = [], []
for i in np.linspace(-3, 3, 100):
    x.append([i])
    y.append([math.sin(i)])  # true function

# Re-declare as Chainer Variables
x = Variable(np.array(x, dtype=np.float32))
y = Variable(np.array(y, dtype=np.float32))
```
Next, build the deep learning model with Chainer. This time I used a four-layer structure: an input layer, two hidden layers, and an output layer. The number of nodes was chosen somewhat arbitrarily (as I wrote in the previous article, this comes down to experience and intuition), so feel free to edit those values if you are interested. The reason there are two hidden layers is that with only one the model did not capture the shape of the function well, so I added another.
MyChain.py

```python
# -*- coding: utf-8 -*-
from chainer import Chain
import chainer.links as L
import chainer.functions as F


class MyChain(Chain):

    def __init__(self):
        super(MyChain, self).__init__(
            l1=L.Linear(1, 100),
            l2=L.Linear(100, 30),
            l3=L.Linear(30, 1)
        )

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)
```
The point is that relu is used as the activation function. Until fairly recently the sigmoid function was the standard choice, but when training parameters with backpropagation the gradients shrink layer by layer toward the input side (the so-called vanishing gradient problem), so it seems that relu is now often used to avoid this. I only understand this area intuitively, so I need to study it a bit more. There are various other articles explaining activation functions, so please check them out. Reference: [Machine learning] I will explain while trying the deep learning framework Chainer.
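For comparison, here is a minimal sketch of the same network with sigmoid activations instead of relu. The class name MySigmoidChain is just a name I made up, and the layer sizes are the same assumptions as in MyChain above; swapping it in for MyChain lets you see for yourself how much harder the sigmoid version is to train.

```python
# -*- coding: utf-8 -*-
# Hedged sketch: the same architecture as MyChain, but with sigmoid activations
from chainer import Chain
import chainer.links as L
import chainer.functions as F


class MySigmoidChain(Chain):

    def __init__(self):
        super(MySigmoidChain, self).__init__(
            l1=L.Linear(1, 100),
            l2=L.Linear(100, 30),
            l3=L.Linear(30, 1)
        )

    def predict(self, x):
        h1 = F.sigmoid(self.l1(x))  # sigmoid saturates, so gradients shrink layer by layer
        h2 = F.sigmoid(self.l2(h1))
        return self.l3(h2)
```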
Before training, let's look at what the model predicts when it has not learned anything at all. Looking at the intermediate stages, not just the final result, should give you a better feel for what is going on.
```python
# Declare the NN model
model = MyChain()

# Plot
t = model.predict(x)
plt.plot(x.data, y.data)
plt.scatter(x.data, t.data)
plt.grid(which='major', color='gray', linestyle='-')
plt.ylim(-1.5, 1.5)
plt.xlim(-4, 4)
plt.show()
```
In the untrained state, you can see that the model does not capture the characteristics of the true function at all.
To train the parameters, first define the loss function. This time we will use the mean squared error (MSE) as the loss function.
```math
{\rm MSE} = \dfrac{1}{N} \sum_{n=1}^{N} \left( \hat{y}_{n} - y_{n} \right)^{2}
```
```python
# Loss function
# Mean squared error (MSE) is used as the loss
def forward(x, y, model):
    t = model.predict(x)
    loss = F.mean_squared_error(t, y)
    return loss
```
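As a quick sanity check, the MSE formula above can be compared against F.mean_squared_error by hand. The small arrays below are made-up values used only for this check.

```python
# Hedged sketch: verify F.mean_squared_error against the MSE formula
import numpy as np
from chainer import Variable
import chainer.functions as F

y_hat = Variable(np.array([[0.1], [0.4], [0.9]], dtype=np.float32))   # predictions (made up)
y_true = Variable(np.array([[0.0], [0.5], [1.0]], dtype=np.float32))  # targets (made up)

mse_chainer = F.mean_squared_error(y_hat, y_true).data
mse_manual = np.mean((y_hat.data - y_true.data) ** 2)
print(mse_chainer, mse_manual)  # both should be about 0.01
```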
By defining this loss function, Chainer can automatically compute the gradients that the optimizer needs.
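To make the "automatic gradient" part concrete, here is a minimal sketch with made-up numbers: calling backward() on a scalar loss fills in the .grad of the inputs, and the same mechanism fills in the gradients of every layer of MyChain.

```python
# Hedged sketch: Chainer's automatic differentiation on a tiny example
import numpy as np
from chainer import Variable
import chainer.functions as F

a = Variable(np.array([[2.0]], dtype=np.float32))
b = Variable(np.array([[3.0]], dtype=np.float32))
loss = F.mean_squared_error(a, b)  # (2 - 3)^2 = 1.0
loss.backward()                    # backpropagation from the scalar loss
print(loss.data)  # 1.0
print(a.grad)     # d(loss)/da = 2 * (a - b) = -2.0
```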
```python
# Chainer optimizer
# Adam is used as the optimization algorithm
optimizer = optimizers.Adam()
# Pass the model parameters to the optimizer
optimizer.setup(model)

# Update the parameters once
optimizer.update(forward, x, y, model)
```
This is the end of the basic flow. By repeating the `optimizer.update()` call above several times, the parameters converge to good values.
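For reference, my understanding is that a single `optimizer.update(forward, x, y, model)` call roughly corresponds to the steps sketched below. This is a hedged outline of the flow, not Chainer's actual source.

```python
# Hedged sketch: roughly what one optimizer.update(forward, x, y, model) call does
loss = forward(x, y, model)  # forward pass: compute the current loss
model.zerograds()            # clear gradients left over from the previous step
loss.backward()              # backpropagation: compute gradients of the loss
optimizer.update()           # apply one Adam step using those gradients
```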
In this example the same teacher data are reused on every iteration, but normally training proceeds in mini-batches: some samples are drawn from the population and used as teacher data for one update, and a different batch of samples is drawn for the next cycle (a minimal sketch of this mini-batch style appears after the results below).
```python
# Repeatedly update the parameters
for i in range(0, 1000):
    loss = forward(x, y, model)
    print(loss.data)  # show the current MSE
    optimizer.update(forward, x, y, model)
```
You can see that the squared error gets smaller as training is repeated. After training, the model approximates the function very smoothly.
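Finally, here is the mini-batch sketch mentioned earlier. The batch size of 20 and the number of epochs are arbitrary assumptions; each iteration samples a random subset of the full data and trains on it.

```python
# Hedged sketch: mini-batch training (batch size and epoch count are assumptions)
batchsize = 20
x_data = x.data  # the full arrays generated earlier
y_data = y.data
for epoch in range(200):
    perm = np.random.permutation(len(x_data))   # shuffle the sample indices
    for i in range(0, len(x_data), batchsize):
        idx = perm[i:i + batchsize]              # indices of this batch
        x_batch = Variable(x_data[idx])
        y_batch = Variable(y_data[idx])
        optimizer.update(forward, x_batch, y_batch, model)
```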