As a deep learning beginner, I tried learning the sin function with Chainer. After reading introductory material on deep learning I felt I understood it a little, but sitting down to write this article made me realize how shallow that understanding was. Learning the sin function has already been done by yuukiclass and many others, but there is no harm in trying it myself.
Learn sin(theta) from angles theta between 0 and 2π
[training data]
This is the mini-batch training part of the implementation. The code will look familiar from the MNIST sample (with some changes, such as the range). Mini-batch training seems to be the standard approach.
Excerpt: mini-batch training
perm = np.random.permutation(N)
sum_loss = 0
for i in range(0, N, batchsize):
    x_batch = x_train[perm[i:i + batchsize]]
    y_batch = y_train[perm[i:i + batchsize]]
    model.zerograds()
    loss = model(x_batch, y_batch)
    sum_loss += loss.data * batchsize  # loss.data is the batch-averaged MSE; scale it back up so sum_loss / N gives the epoch average
    loss.backward()
    optimizer.update()
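To make the slicing concrete, here is a minimal, self-contained toy (not part of the original script; N and batchsize are shrunk for readability) showing how the permutation splits the data so that every sample is visited exactly once per epoch, in random order:

import numpy as np

N, batchsize = 6, 2
x = np.arange(N) * 10            # stand-in for x_train
perm = np.random.permutation(N)  # fresh shuffle each epoch
for i in range(0, N, batchsize):
    print(x[perm[i:i + batchsize]])  # each element appears exactly once per epoch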
The number of data points is chosen so that the angles used at test time differ from those used during training: the range 0 to 2π is divided into 1,000 points for training and into 900 points for testing.
Excerpt: training data & result check
# Training data
N = 1000
x_train, y_train = get_dataset(N)

# Test data
N_test = 900
x_test, y_test = get_dataset(N_test)

'''
(omitted)
'''

# Test
loss = model(x_test, y_test)
test_losses.append(loss.data)
Mini-batch size: 10
Epochs (n_epoch): 500
Number of hidden layers: 2
Number of hidden-layer units (n_units): 100
Activation function: ReLU (relu)
Dropout: none (0%)
Optimizer: Adam
Loss function: mean squared error (mean_squared_error)
All of these parameters were chosen fairly arbitrarily. With N = 1000 and a mini-batch size of 10, each epoch performs 100 parameter updates, so 500 epochs amounts to 50,000 updates in total.
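Dropout is listed as unused here; purely as a sketch of where it would go, the predict method in the code below could be modified as follows (my assumption: the Chainer v1 API F.dropout(x, ratio, train); this variant is not part of the article's code):

# Hypothetical variant of MyChain.predict with 50% dropout on the hidden layers.
# Assumes Chainer v1's F.dropout(x, ratio, train); not used in this article.
def predict(self, x, train=True):
    h1 = F.dropout(F.relu(self.l1(x)), ratio=0.5, train=train)
    h2 = F.dropout(F.relu(self.l2(h1)), ratio=0.5, train=train)
    return self.l3(h2)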
The complete code
# -*- coding: utf-8 -*-

# Import everything we might need, for the time being
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
import time
from matplotlib import pyplot as plt

# Data
def get_dataset(N):
    x = np.linspace(0, 2 * np.pi, N)
    y = np.sin(x)
    return x, y

# Neural network
class MyChain(Chain):
    def __init__(self, n_units=10):
        super(MyChain, self).__init__(
            l1=L.Linear(1, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, 1))

    def __call__(self, x_data, y_data):
        x = Variable(x_data.astype(np.float32).reshape(len(x_data), 1))  # convert to Variable object
        y = Variable(y_data.astype(np.float32).reshape(len(y_data), 1))  # convert to Variable object
        return F.mean_squared_error(self.predict(x), y)

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        h3 = self.l3(h2)
        return h3

    def get_predata(self, x):
        return self.predict(Variable(x.astype(np.float32).reshape(len(x), 1))).data

# main
if __name__ == "__main__":
    # Training data
    N = 1000
    x_train, y_train = get_dataset(N)

    # Test data
    N_test = 900
    x_test, y_test = get_dataset(N_test)

    # Learning parameters
    batchsize = 10
    n_epoch = 500
    n_units = 100

    # Model setup
    model = MyChain(n_units)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Training loop
    train_losses = []
    test_losses = []
    print "start..."
    start_time = time.time()
    for epoch in range(1, n_epoch + 1):
        # Training
        perm = np.random.permutation(N)
        sum_loss = 0
        for i in range(0, N, batchsize):
            x_batch = x_train[perm[i:i + batchsize]]
            y_batch = y_train[perm[i:i + batchsize]]
            model.zerograds()
            loss = model(x_batch, y_batch)
            sum_loss += loss.data * batchsize
            loss.backward()
            optimizer.update()
        average_loss = sum_loss / N
        train_losses.append(average_loss)

        # Test
        loss = model(x_test, y_test)
        test_losses.append(loss.data)

        # Print learning progress
        if epoch % 10 == 0:
            print "epoch: {}/{} train loss: {} test loss: {}".format(epoch, n_epoch, average_loss, loss.data)

        # Plot intermediate results
        if epoch in [10, 500]:
            theta = np.linspace(0, 2 * np.pi, N_test)
            sin = np.sin(theta)
            test = model.get_predata(theta)
            plt.plot(theta, sin, label="sin")
            plt.plot(theta, test, label="test")
            plt.legend()
            plt.grid(True)
            plt.xlim(0, 2 * np.pi)
            plt.ylim(-1.2, 1.2)
            plt.title("sin")
            plt.xlabel("theta")
            plt.ylabel("amp")
            plt.savefig("fig/fig_sin_epoch{}.png".format(epoch))  # assumes the fig folder exists
            plt.clf()

    print "end"
    interval = int(time.time() - start_time)
    print "Execution time: {}sec".format(interval)

    # Plot the loss curves
    plt.plot(train_losses, label="train_loss")
    plt.plot(test_losses, label="test_loss")
    plt.yscale('log')
    plt.legend()
    plt.grid(True)
    plt.title("loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.savefig("fig/fig_loss.png")  # assumes the fig folder exists
    plt.clf()
The error tends to decrease as the number of epochs (training iterations) increases, and there is no significant difference between the training and test errors. The test error comes out slightly better than the training error, I think because the two are computed differently: the training loss is averaged over mini-batches processed while the weights were still being updated, whereas the test loss is computed once at the end of the epoch with the updated weights.
At epoch 10 the output can hardly be called a sin function, but by epoch 500 it has become quite close to one.
[Figure: prediction vs. sin at epoch 10]
[Figure: prediction vs. sin at epoch 500]
For the time being, I was able to train the sin function with Chainer.
However, for some reason, the larger the angle, the larger the error. I had thought that randomizing the order of the angles during training would even out the error across angles, but apparently that is not the case. I don't understand why yet.
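To see this per-angle behaviour directly, a small diagnostic can be appended after training. A minimal sketch, assuming model and N_test from the script above (the file name fig_abs_error.png is my own choice):

# Sketch: plot the absolute error per angle after training.
# Assumes `model` and `N_test` from the script above.
theta = np.linspace(0, 2 * np.pi, N_test)
abs_err = np.abs(np.sin(theta) - model.get_predata(theta).flatten())
plt.plot(theta, abs_err, label="abs error")
plt.legend()
plt.grid(True)
plt.xlabel("theta")
plt.ylabel("|sin(theta) - prediction|")
plt.savefig("fig/fig_abs_error.png")  # hypothetical file name
plt.clf()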
Related posts:
I tried to approximate the sin function using chainer (re-challenge)
Chainer and deep learning learned by function approximation
Regression forward propagation neural network with chainer