In deep learning it is common practice to randomly shuffle the mini-batch training data at every epoch, so I checked what effect this actually has.
For each epoch, I compare the case where the order of the mini-batch training data is fixed with the case where it is shuffled. The training task is the same sin function as in my earlier post, "I tried learning the sin function with chainer".
[Figure: training data]
The order of the mini-batch training data is switched between fixed and random. The fixed version uses the same slicing technique that is often used at test time.
Fixed or random
perm = np.random.permutation(N)
sum_loss = 0
for i in range(0, N, batchsize):
    if order == "fixed":     # The order of learning is fixed
        x_batch = x_train[i:i + batchsize]
        y_batch = y_train[i:i + batchsize]
    elif order == "random":  # Random order of learning
        x_batch = x_train[perm[i:i + batchsize]]
        y_batch = y_train[perm[i:i + batchsize]]
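As an aside, in later Chainer versions the same per-epoch shuffling can be delegated to SerialIterator, which reshuffles the data at the start of every epoch when shuffle=True. A minimal sketch, assuming the TupleDataset wrapping and concat_examples call as one way to get the batches back as arrays (not part of the original script):

from chainer.datasets import TupleDataset
from chainer.dataset import concat_examples
from chainer import iterators

# Wrap the arrays so the iterator can index x and y together.
train = TupleDataset(x_train.astype(np.float32), y_train.astype(np.float32))

# shuffle=True reshuffles the order once per epoch; shuffle=False keeps it fixed.
train_iter = iterators.SerialIterator(train, batch_size=batchsize,
                                      repeat=True, shuffle=True)

while train_iter.epoch < n_epoch:
    batch = train_iter.next()                  # list of (x, y) tuples
    x_batch, y_batch = concat_examples(batch)  # stack into numpy arrays
    # ...forward / backward / update as in the loop above...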
All other parameters are the same as in the previous example.
Full code
# -*- coding: utf-8 -*-
# Import everything for now
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
import time
from matplotlib import pyplot as plt

# Data
def get_dataset(N):
    x = np.linspace(0, 2 * np.pi, N)
    y = np.sin(x)
    return x, y

# Neural network
class MyChain(Chain):
    def __init__(self, n_units=10):
        super(MyChain, self).__init__(
            l1=L.Linear(1, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, 1))

    def __call__(self, x_data, y_data):
        x = Variable(x_data.astype(np.float32).reshape(len(x_data), 1))  # convert to Variable object
        y = Variable(y_data.astype(np.float32).reshape(len(y_data), 1))  # convert to Variable object
        return F.mean_squared_error(self.predict(x), y)

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        h3 = self.l3(h2)
        return h3

    def get_predata(self, x):
        return self.predict(Variable(x.astype(np.float32).reshape(len(x), 1))).data

# main
if __name__ == "__main__":
    # Training data
    N = 1000
    x_train, y_train = get_dataset(N)

    # Training parameters
    batchsize = 10
    n_epoch = 500
    n_units = 100

    # Training loop
    fixed_losses = []
    random_losses = []
    print "start..."
    for order in ["fixed", "random"]:
        # Set up the model
        model = MyChain(n_units)
        optimizer = optimizers.Adam()
        optimizer.setup(model)

        start_time = time.time()
        for epoch in range(1, n_epoch + 1):
            # training
            perm = np.random.permutation(N)
            sum_loss = 0
            for i in range(0, N, batchsize):
                if order == "fixed":     # The order of learning is fixed
                    x_batch = x_train[i:i + batchsize]
                    y_batch = y_train[i:i + batchsize]
                elif order == "random":  # Random order of learning
                    x_batch = x_train[perm[i:i + batchsize]]
                    y_batch = y_train[perm[i:i + batchsize]]

                model.zerograds()
                loss = model(x_batch, y_batch)
                sum_loss += loss.data * batchsize
                loss.backward()
                optimizer.update()

            average_loss = sum_loss / N
            if order == "fixed":
                fixed_losses.append(average_loss)
            elif order == "random":
                random_losses.append(average_loss)

            # Print training progress
            if epoch % 10 == 0:
                print "({}) epoch: {}/{} loss: {}".format(order, epoch, n_epoch, average_loss)

        interval = int(time.time() - start_time)
        print "Execution time({}): {}sec".format(order, interval)

    print "end"

    # Plot the loss
    plt.plot(fixed_losses, label="fixed_loss")
    plt.plot(random_losses, label="random_loss")
    plt.yscale('log')
    plt.legend()
    plt.grid(True)
    plt.title("loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.show()
The error differed by roughly a factor of 10 between the fixed and the random ordering. As commonly recommended, shuffling the mini-batches each epoch is the better choice.
I also tried shuffling the entire training data set when it is generated. In that case, the fixed ordering gave a result comparable to the random ordering; if anything, the epoch-to-epoch variation was smaller with the fixed ordering.
Shuffling the entire data set
def get_dataset(N):
    x = 2 * np.pi * np.random.random(N)
    y = np.sin(x)
    return x, y
For the sin function, randomly shuffling the mini-batch training data each epoch reduced the error by roughly a factor of 10 compared with keeping the order fixed. When the entire training data set was randomized, however, both orderings gave good results.
This may only hold under specific conditions such as the sin function, but shuffling the entire data set once and then keeping the mini-batch order fixed is also worth trying, as in the sketch below.
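A rough illustration of that variant, not taken from the original experiment: keep get_dataset as the evenly spaced linspace version, apply a single permutation to the whole data set before training, and then reuse the fixed slicing from the loop above.

# Assumed variant: shuffle the whole data set once, then keep the batch order fixed.
x_train, y_train = get_dataset(N)      # evenly spaced points, as in the original get_dataset
shuffle = np.random.permutation(N)     # one global permutation, applied once
x_train, y_train = x_train[shuffle], y_train[shuffle]

for i in range(0, N, batchsize):
    # Fixed slicing now walks over pre-shuffled data.
    x_batch = x_train[i:i + batchsize]
    y_batch = y_train[i:i + batchsize]
    # ...forward / backward / update as before...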