While writing the previous article, I happened to get a better result than expected, so I would like to introduce it here.
The result when the mini-batch training data was shuffled randomly came out better than in the previous article. When I investigated the cause, I found that what I had meant to be a comparison of "fixed vs. random" had actually become a comparison of "fixed vs. fixed followed by random". (The previous article compared a fixed order against a random order.)
In this article I compare the usual approach, where the mini-batch training data is randomly shuffled every epoch, with a hybrid approach that trains with a fixed order first and then switches to random shuffling. As usual, the training target is the sin function.
[training data]
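To make the two orderings concrete, here is a minimal sketch (using the same N and batchsize as the full script below) of how the mini-batch indices are drawn in each case:

import numpy as np

N, batchsize = 1000, 10

# Fixed order: every epoch walks through the samples 0..N-1 in the same sequence
fixed_indices = np.arange(N)

# Random order: a fresh permutation is drawn at the start of each epoch
random_indices = np.random.permutation(N)

for i in range(0, N, batchsize):
    fixed_batch = fixed_indices[i:i + batchsize]    # always [0..9], [10..19], ...
    random_batch = random_indices[i:i + batchsize]  # a different subset every epoch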
I stumbled onto this because I forgot to reset the model's trained parameters before the random-order training run. I did not know how to reset them, and changing the model name and so on seemed tedious, so I simply wrote the same setup code twice. (If anyone knows how to reset a model, please let me know.)
Create model twice
# Build the model
model = MyChain(n_units)
optimizer = optimizers.Adam()
optimizer.setup(model)

'''
(omitted: the normal random-only training loop)
'''

# Build the model again (I don't know how to reset the trained parameters, so recreate it)
model = MyChain(n_units)
optimizer = optimizers.Adam()
optimizer.setup(model)
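One possible way to avoid rebuilding the model would be to snapshot the untrained parameters right after construction and load them back before the second run. This is only a sketch, assuming a Chainer version that provides serializers.save_npz/load_npz and reusing MyChain and n_units from the script; the file name init_model.npz is just an example:

from chainer import optimizers, serializers

model = MyChain(n_units)
serializers.save_npz("init_model.npz", model)   # snapshot of the untrained parameters
optimizer = optimizers.Adam()
optimizer.setup(model)

# ... first training run (random only) ...

serializers.load_npz("init_model.npz", model)   # restore the untrained parameters
optimizer = optimizers.Adam()                   # the optimizer state also has to start fresh
optimizer.setup(model)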
The parameters are chosen somewhat arbitrarily and are not tuned.
In the hybrid case, training runs for 250 epochs in fixed order and then 250 epochs with random shuffling.
The code is quick and dirty, but it works, so that is good enough for now.
Full code
# -*- coding: utf-8 -*-

# Import everything that might be needed, for now
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
import time
from matplotlib import pyplot as plt

# Training data: N points of the sin curve on [0, 2*pi]
def get_dataset(N):
    x = np.linspace(0, 2 * np.pi, N)
    y = np.sin(x)
    return x, y

# Neural network: 1 -> n_units -> n_units -> 1
class MyChain(Chain):
    def __init__(self, n_units=10):
        super(MyChain, self).__init__(
            l1=L.Linear(1, n_units),
            l2=L.Linear(n_units, n_units),
            l3=L.Linear(n_units, 1))

    def __call__(self, x_data, y_data):
        x = Variable(x_data.astype(np.float32).reshape(len(x_data), 1))  # convert to a Variable object
        y = Variable(y_data.astype(np.float32).reshape(len(y_data), 1))  # convert to a Variable object
        return F.mean_squared_error(self.predict(x), y)

    def predict(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        h3 = self.l3(h2)
        return h3

    def get_predata(self, x):
        return self.predict(Variable(x.astype(np.float32).reshape(len(x), 1))).data

# main
if __name__ == "__main__":
    # Training data
    N = 1000
    x_train, y_train = get_dataset(N)

    # Training parameters
    batchsize = 10
    n_epoch = 500
    n_units = 100

    # Build the model
    model = MyChain(n_units)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Training loop (normal: random shuffle only)
    print "start..."
    normal_losses = []
    start_time = time.time()
    for epoch in range(1, n_epoch + 1):
        # training
        perm = np.random.permutation(N)
        sum_loss = 0
        for i in range(0, N, batchsize):
            x_batch = x_train[perm[i:i + batchsize]]
            y_batch = y_train[perm[i:i + batchsize]]
            model.zerograds()
            loss = model(x_batch, y_batch)
            sum_loss += loss.data * batchsize
            loss.backward()
            optimizer.update()
        average_loss = sum_loss / N
        normal_losses.append(average_loss)

        # Print the training progress
        if epoch % 10 == 0:
            print "(normal) epoch: {}/{} normal loss: {}".format(epoch, n_epoch, average_loss)

    interval = int(time.time() - start_time)
    print "Execution time (normal): {}sec".format(interval)

    # Build the model again (I don't know how to reset the trained parameters, so recreate it)
    model = MyChain(n_units)
    optimizer = optimizers.Adam()
    optimizer.setup(model)

    # Training loop (hybrid)
    # Mini-batch order: fixed for the first half, random for the second half
    hybrid_losses = []
    for order in ["fixed", "random"]:
        start_time = time.time()
        for epoch in range(1, n_epoch / 2 + 1):
            # training
            perm = np.random.permutation(N)
            sum_loss = 0
            for i in range(0, N, batchsize):
                if order == "fixed":  # fixed training order
                    x_batch = x_train[i:i + batchsize]
                    y_batch = y_train[i:i + batchsize]
                elif order == "random":  # random training order
                    x_batch = x_train[perm[i:i + batchsize]]
                    y_batch = y_train[perm[i:i + batchsize]]
                model.zerograds()
                loss = model(x_batch, y_batch)
                sum_loss += loss.data * batchsize
                loss.backward()
                optimizer.update()
            average_loss = sum_loss / N
            hybrid_losses.append(average_loss)

            # Print the training progress
            if epoch % 10 == 0:
                print "(hybrid) epoch: {}/{} {} loss: {}".format(epoch, n_epoch / 2, order, average_loss)

        interval = int(time.time() - start_time)
        print "Execution time (hybrid {}): {}sec".format(order, interval)

    print "end"

    # Plot the loss curves
    plt.plot(normal_losses, label="normal_loss")
    plt.plot(hybrid_losses, label="hybrid_loss")
    plt.yscale('log')
    plt.legend()
    plt.grid(True)
    plt.title("loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.show()
Compared with the general method (normal), the fixed-then-random hybrid method (hybrid) reaches a loss that is roughly an order of magnitude lower. The sharp drop in hybrid_loss near the middle of the horizontal axis is where training switches from fixed to random order.
For the same total number of epochs, it seems better to spend fewer epochs on the fixed phase and more on the random phase.
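To experiment with that balance, the 50/50 split in the script above could be parameterized. This is only a rough sketch that reuses n_epoch, N, batchsize, x_train, and y_train from the full script; fixed_ratio is a name introduced here for illustration:

fixed_ratio = 0.3   # hypothetical: fraction of epochs trained in fixed order
n_fixed = int(n_epoch * fixed_ratio)
schedule = ["fixed"] * n_fixed + ["random"] * (n_epoch - n_fixed)

for epoch, order in enumerate(schedule, 1):
    perm = np.random.permutation(N)
    for i in range(0, N, batchsize):
        if order == "fixed":
            x_batch, y_batch = x_train[i:i + batchsize], y_train[i:i + batchsize]
        else:
            x_batch, y_batch = x_train[perm[i:i + batchsize]], y_train[perm[i:i + batchsize]]
        # ... same zerograds / loss / backward / update step as in the full script ...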
I do not know the theoretical reason, but for this training target (the sin function on [0, 2π]), making the mini-batch ordering a hybrid of fixed and random reduced the error by roughly an order of magnitude compared with the usual random-only method.
I wondered whether this was the infamous overfitting, so I tested it on inputs that were not used for training, but the result was just as good as during training.
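Such a check amounts to predicting at points that do not lie on the training grid. A minimal sketch of that kind of test, reusing the trained model from the script above (x_test and the number of test points are introduced here just for illustration):

# Predict at random points in [0, 2*pi] that were not part of the training grid
x_test = np.random.uniform(0, 2 * np.pi, 200)
y_test = np.sin(x_test)

y_pred = model.get_predata(x_test)
test_mse = np.mean((y_pred.flatten() - y_test) ** 2)
print "test MSE: {}".format(test_mse)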
It made me feel that accumulating small ideas like this one is what leads to building highly accurate neural networks.