While writing the previous article, I happened to get a better result than expected, so I would like to introduce it here.
The result when the mini-batch training data was shuffled randomly came out better than in the previous article. When I investigated the cause, I found that what I had meant to be a comparison of "fixed vs. random" had actually become a comparison of "fixed vs. fixed followed by random". (The previous article compared a fixed order against a random order.)
In this article I compare the usual approach, where the mini-batch training data is randomly shuffled every epoch, with a hybrid approach that trains with a fixed order first and then switches to random shuffling. As usual, the training target is the sin function.
[training data]
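To make the two orderings concrete, here is a minimal sketch (using the same N and batchsize as the full script below) of how the mini-batch indices are drawn in each case:

import numpy as np

N, batchsize = 1000, 10

# Fixed order: every epoch walks through the samples 0..N-1 in the same sequence
fixed_indices = np.arange(N)

# Random order: a fresh permutation is drawn at the start of each epoch
random_indices = np.random.permutation(N)

for i in range(0, N, batchsize):
    fixed_batch = fixed_indices[i:i + batchsize]    # always [0..9], [10..19], ...
    random_batch = random_indices[i:i + batchsize]  # a different subset every epoch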
I stumbled onto this because I forgot to reset the model's trained parameters before the random-order training run. I did not know how to reset them, and changing the model name and so on seemed tedious, so I simply wrote the same setup code twice. (If anyone knows how to reset a model, please let me know.)
Create model twice
# Build the model
model = MyChain(n_units)
optimizer = optimizers.Adam()
optimizer.setup(model)

'''
(omitted: the normal random-only training loop)
'''

# Build the model again (I don't know how to reset the trained parameters, so recreate it)
model = MyChain(n_units)
optimizer = optimizers.Adam()
optimizer.setup(model)
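One possible way to avoid rebuilding the model would be to snapshot the untrained parameters right after construction and load them back before the second run. This is only a sketch, assuming a Chainer version that provides serializers.save_npz/load_npz and reusing MyChain and n_units from the script; the file name init_model.npz is just an example:

from chainer import optimizers, serializers

model = MyChain(n_units)
serializers.save_npz("init_model.npz", model)   # snapshot of the untrained parameters
optimizer = optimizers.Adam()
optimizer.setup(model)

# ... first training run (random only) ...

serializers.load_npz("init_model.npz", model)   # restore the untrained parameters
optimizer = optimizers.Adam()                   # the optimizer state also has to start fresh
optimizer.setup(model)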
The parameters are chosen somewhat arbitrarily and are not tuned.
In the hybrid case, training runs for 250 epochs in fixed order and then 250 epochs with random shuffling.
The code is quick and dirty, but it works, so that is good enough for now.
Full code
# -*- coding: utf-8 -*-

# Import everything that might be needed, for now
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
import time
from matplotlib import pyplot as plt

# Training data: N points of the sin curve on [0, 2*pi]
def get_dataset(N):
    x = np.linspace(0, 2 * np.pi, N)
    y = np.sin(x)
    return x, y

# Neural network: 1 -> n_units -> n_units -> 1
class MyChain(Chain):
    def __init__(self, n_units=10):
        super(MyChain, self).__init__(
            l1=L.Linear(1, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, 1))

    def __call__(self, x_data, y_data):
        x = Variable(x_data.astype(np.float32).reshape(len(x_data), 1))  # convert to a Variable object
        y = Variable(y_data.astype(np.float32).reshape(len(y_data), 1))  # convert to a Variable object
        return F.mean_squared_error(self.predict(x), y)

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        h3 = self.l3(h2)
        return h3

    def get_predata(self, x):
        return self.predict(Variable(x.astype(np.float32).reshape(len(x), 1))).data

# main
if __name__ == "__main__":
    # Training data
    N = 1000
    x_train, y_train = get_dataset(N)

    # Training parameters
    batchsize = 10
    n_epoch = 500
    n_units = 100

    # Build the model
    model = MyChain(n_units)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Training loop (normal: random shuffle only)
    print "start..."
    normal_losses = []
    start_time = time.time()
    for epoch in range(1, n_epoch + 1):
        # training
        perm = np.random.permutation(N)
        sum_loss = 0
        for i in range(0, N, batchsize):
            x_batch = x_train[perm[i:i + batchsize]]
            y_batch = y_train[perm[i:i + batchsize]]
            model.zerograds()
            loss = model(x_batch, y_batch)
            sum_loss += loss.data * batchsize
            loss.backward()
            optimizer.update()
        average_loss = sum_loss / N
        normal_losses.append(average_loss)

        # Print the training progress
        if epoch % 10 == 0:
            print "(normal) epoch: {}/{} normal loss: {}".format(epoch, n_epoch, average_loss)

    interval = int(time.time() - start_time)
    print "Execution time (normal): {}sec".format(interval)

    # Build the model again (I don't know how to reset the trained parameters, so recreate it)
    model = MyChain(n_units)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Training loop (hybrid)
    # Mini-batch order: fixed for the first half, random for the second half
    hybrid_losses = []
    for order in ["fixed", "random"]:
        start_time = time.time()
        for epoch in range(1, n_epoch / 2 + 1):
            # training
            perm = np.random.permutation(N)
            sum_loss = 0
            for i in range(0, N, batchsize):
                if order == "fixed":  # fixed training order
                    x_batch = x_train[i:i + batchsize]
                    y_batch = y_train[i:i + batchsize]
                elif order == "random":  # random training order
                    x_batch = x_train[perm[i:i + batchsize]]
                    y_batch = y_train[perm[i:i + batchsize]]
                model.zerograds()
                loss = model(x_batch, y_batch)
                sum_loss += loss.data * batchsize
                loss.backward()
                optimizer.update()
            average_loss = sum_loss / N
            hybrid_losses.append(average_loss)

            # Print the training progress
            if epoch % 10 == 0:
                print "(hybrid) epoch: {}/{} {} loss: {}".format(epoch, n_epoch / 2, order, average_loss)

        interval = int(time.time() - start_time)
        print "Execution time (hybrid {}): {}sec".format(order, interval)

    print "end"

    # Plot the loss curves
    plt.plot(normal_losses, label="normal_loss")
    plt.plot(hybrid_losses, label="hybrid_loss")
    plt.yscale('log')
    plt.legend()
    plt.grid(True)
    plt.title("loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.show()
Compared with the general method (normal), the fixed-then-random hybrid method (hybrid) reaches a loss that is roughly an order of magnitude lower. The sharp drop in hybrid_loss near the middle of the horizontal axis is where training switches from fixed to random order.
For the same total number of epochs, it seems better to spend fewer epochs on the fixed phase and more on the random phase.
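To experiment with that balance, the 50/50 split in the script above could be parameterized. This is only a rough sketch that reuses n_epoch, N, batchsize, x_train, and y_train from the full script; fixed_ratio is a name introduced here for illustration:

fixed_ratio = 0.3   # hypothetical: fraction of epochs trained in fixed order
n_fixed = int(n_epoch * fixed_ratio)
schedule = ["fixed"] * n_fixed + ["random"] * (n_epoch - n_fixed)

for epoch, order in enumerate(schedule, 1):
    perm = np.random.permutation(N)
    for i in range(0, N, batchsize):
        if order == "fixed":
            x_batch, y_batch = x_train[i:i + batchsize], y_train[i:i + batchsize]
        else:
            x_batch, y_batch = x_train[perm[i:i + batchsize]], y_train[perm[i:i + batchsize]]
        # ... same zerograds / loss / backward / update step as in the full script ...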
I do not know the theoretical reason, but for this training target (the sin function on [0, 2π]), making the mini-batch ordering a hybrid of fixed and random reduced the error by roughly an order of magnitude compared with the usual random-only method.
I wondered whether this was the infamous overfitting, so I tested it on inputs that were not used for training, but the result was just as good as during training.
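Such a check amounts to predicting at points that do not lie on the training grid. A minimal sketch of that kind of test, reusing the trained model from the script above (x_test and the number of test points are introduced here just for illustration):

# Predict at random points in [0, 2*pi] that were not part of the training grid
x_test = np.random.uniform(0, 2 * np.pi, 200)
y_test = np.sin(x_test)

y_pred = model.get_predata(x_test)
test_mse = np.mean((y_pred.flatten() - y_test) ** 2)
print "test MSE: {}".format(test_mse)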
It made me feel that accumulating small ideas like this one is what leads to building highly accurate neural networks.