Checking the difference made by the presence or absence of random shuffling during mini-batch training with Chainer

Introduction

In deep learning it is apparently standard practice to randomly shuffle the mini-batch training data at every epoch, so I checked what difference it actually makes.

Environment

What I checked

For each epoch, I compare the case where the order of the mini-batch training data is fixed with the case where it is shuffled at random. The target is the same sin function as in "I tried learning the sin function with chainer".

[training data]
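
For reference, the training data is one period of the sin curve; a minimal snippet to reproduce the plot above, using the same data generation as get_dataset in the full code below:


import numpy as np
from matplotlib import pyplot as plt

# one period of the sin curve, sampled on a regular grid
N = 1000
x = np.linspace(0, 2 * np.pi, N)
y = np.sin(x)

plt.plot(x, y)
plt.title("training data")
plt.show()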

Implementation

Fixed or random

The mini-batch training data is switched between a fixed order and a random order. The code for the fixed case uses the same slicing technique that is often used at test time.



perm = np.random.permutation(N)
sum_loss = 0
for i in range(0, N, batchsize):
    if order == "fixed": #The order of learning is fixed
        x_batch = x_train[i:i + batchsize]
        y_batch = y_train[i:i + batchsize]
    elif order == "random": #Random order of learning
        x_batch = x_train[perm[i:i + batchsize]]
        y_batch = y_train[perm[i:i + batchsize]]
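
Because x_train is generated with np.linspace, a fixed slice of 10 consecutive elements covers only a tiny interval of the sin curve, whereas a permuted slice draws points from the whole range; this is presumably one reason the fixed order learns poorly here. A quick standalone check, using the same N and batchsize as in the full code below:


import numpy as np

N, batchsize = 1000, 10
x_train = np.linspace(0, 2 * np.pi, N)
perm = np.random.permutation(N)

fixed_batch = x_train[0:batchsize]           # 10 neighbouring points near x = 0
random_batch = x_train[perm[0:batchsize]]    # 10 points scattered over [0, 2*pi]
print(fixed_batch.max() - fixed_batch.min())      # tiny interval (about 0.06)
print(random_batch.max() - random_batch.min())    # typically spans most of [0, 2*pi]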

Learning parameters

As in that example, the parameters are rough values chosen without any particular tuning.
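
For reference, these are the values used in the full script below:


N = 1000          # number of training samples
batchsize = 10    # mini-batch size
n_epoch = 500     # number of epochs
n_units = 100     # units per hidden layer (the optimizer is Adam)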

Whole code



# -*- coding: utf-8 -*-

#For now, import everything indiscriminately
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
import time
from matplotlib import pyplot as plt

#data
def get_dataset(N):
    x = np.linspace(0, 2 * np.pi, N)
    y = np.sin(x)
    return x, y

#neural network
class MyChain(Chain):
    def __init__(self, n_units=10):
        super(MyChain, self).__init__(
             l1=L.Linear(1, n_units),
             l2=L.Linear(n_units, n_units),
             l3=L.Linear(n_units, 1))

    def __call__(self, x_data, y_data):
        x = Variable(x_data.astype(np.float32).reshape(len(x_data),1)) #Convert to Variable object
        y = Variable(y_data.astype(np.float32).reshape(len(y_data),1)) #Convert to Variable object
        return F.mean_squared_error(self.predict(x), y)

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        h3 = self.l3(h2)
        return h3

    def get_predata(self, x):
        return self.predict(Variable(x.astype(np.float32).reshape(len(x),1))).data

# main
if __name__ == "__main__":

    #Training data
    N = 1000
    x_train, y_train = get_dataset(N)

    #Learning parameters
    batchsize = 10
    n_epoch = 500
    n_units = 100

    #Learning loop
    fixed_losses = []
    random_losses = []
    print "start..."
    for order in ["fixed", "random"]:
        #Modeling
        model = MyChain(n_units)
        optimizer = optimizers.Adam()
        optimizer.setup(model)

        start_time = time.time()
        for epoch in range(1, n_epoch + 1):

            # training
            perm = np.random.permutation(N)
            sum_loss = 0
            for i in range(0, N, batchsize):
                if order == "fixed": #The order of learning is fixed
                    x_batch = x_train[i:i + batchsize]
                    y_batch = y_train[i:i + batchsize]
                elif order == "random": #Random order of learning
                    x_batch = x_train[perm[i:i + batchsize]]
                    y_batch = y_train[perm[i:i + batchsize]]

                model.zerograds()
                loss = model(x_batch,y_batch)
                sum_loss += loss.data * batchsize
                loss.backward()
                optimizer.update()

            average_loss = sum_loss / N
            if order == "fixed":
                fixed_losses.append(average_loss)
            elif order == "random":
                random_losses.append(average_loss)

            #Output learning process
            if epoch % 10 == 0:
                print "({}) epoch: {}/{} loss: {}".format(order, epoch, n_epoch, average_loss)

        interval = int(time.time() - start_time)
        print "Execution time({}): {}sec".format(order, interval)

    print "end"

    #Graphing the error
    plt.plot(fixed_losses, label = "fixed_loss")
    plt.plot(random_losses, label = "random_loss")
    plt.yscale('log')
    plt.legend()
    plt.grid(True)
    plt.title("loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.show()

Execution result

Error

There was roughly a tenfold difference in error between the fixed and the random order. As its reputation suggests, random shuffling is the better choice.

[Figure: sin_diff_order.png — loss per epoch, fixed vs. random mini-batch order]

Shuffling the entire dataset

I also checked the case where the entire training dataset itself is generated at random (random x values instead of a regular grid). With the whole dataset randomized, the fixed-order result was comparable to the random-order result; if anything, the loss fluctuated less from epoch to epoch with the fixed order. Presumably, with x drawn at random, even consecutive slices already contain points spread over the whole range, so fixed-order batches are no longer homogeneous.

Randomizing the entire dataset


def get_dataset(N):
    x = 2 * np.pi * np.random.random(N)
    y = np.sin(x)
    return x, y

[Figure: sin_diff_order2.png — loss per epoch with the entire dataset randomized]

Summary

When training on the sin function, randomly shuffling the mini-batch training data every epoch reduced the error by roughly a factor of 10 compared with a fixed order. When the entire training dataset was randomized, however, both orders gave good results.

This may only hold under specific conditions such as the sin function, but shuffling the entire dataset once and then keeping the mini-batch order fixed is also worth trying, as sketched below.
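
A minimal sketch of that variant, assuming the same network and update steps as in the full code above: shuffle the whole dataset once with a single permutation before the epoch loop, then slice it in a fixed order.


import numpy as np

N, batchsize, n_epoch = 1000, 10, 500
x_train = np.linspace(0, 2 * np.pi, N)
y_train = np.sin(x_train)

# shuffle the whole dataset once before training
perm = np.random.permutation(N)
x_train, y_train = x_train[perm], y_train[perm]

# then iterate over the mini-batches in a fixed order every epoch
for epoch in range(1, n_epoch + 1):
    for i in range(0, N, batchsize):
        x_batch = x_train[i:i + batchsize]
        y_batch = y_train[i:i + batchsize]
        # ...same zerograds / loss / backward / update steps as in the full code above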

References

Neural network starting with Chainer

Basics and Practice of Deep Learning Implementation
