Someone has already implemented Zundokokiyoshi using an LSTM, but I felt that a problem at this level should be learnable with just a simple RNN, so I implemented it with Chainer to deepen my understanding of RNNs.
According to section 7.5 of the book "Deep Learning (Machine Learning Professional Series)" (ISBN-13: 978-4061529021), a plain RNN can remember roughly the past 10 time steps. Since all we need to remember is the pattern in which "Zun" appears 4 times followed by one "Doko", this should be well within that range.
I aimed for a structure as simple as possible. The input consists of two values ($x_0, x_1$), encoded as follows:

Zun → $x_0 = 0, x_1 = 1$
Doko → $x_0 = 1, x_1 = 0$

The output is defined as follows, so the task becomes a simple two-class classification problem:

Kiyoshi not established → $y_0 = 1, y_1 = 0$
Kiyoshi established → $y_0 = 0, y_1 = 1$
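As a quick illustration of this encoding, here is a minimal sketch (the helper name encode is my own and does not appear in the article's code):

import numpy as np

def encode(token):
    # "Zun" -> [0, 1], "Doko" -> [1, 0], following the definition above
    return np.array([0, 1] if token == 'Zun' else [1, 0], dtype=np.float32)

print(encode('Zun'))   # [ 0.  1.]
print(encode('Doko'))  # [ 1.  0.]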
The model definition code is as follows.
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
class RNN(chainer.Chain):
    def __init__(self):
        super(RNN, self).__init__(
            w1 = L.Linear(2, 10),   # input -> hidden
            h1 = L.Linear(10, 10),  # hidden -> hidden (recurrent connection)
            o = L.Linear(10, 2)     # hidden -> output
        )

    def reset_state(self):
        # initial hidden state: zeros, for a mini-batch of size 1
        self.last_z = chainer.Variable(np.zeros((1, 10), dtype=np.float32))

    def __call__(self, x):
        # combine the current input with the previous hidden state
        z = F.relu(self.w1(x) + self.h1(self.last_z))
        self.last_z = z
        y = F.relu(self.o(z))
        return y
rnn = RNN()
rnn.reset_state()
model = L.Classifier(rnn)

optimizer = optimizers.Adam()  # use Adam
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer.GradientClipping(10.0))  # cap the gradient norm
Assuming a fixed mini-batch size of 1, the hidden state (last_z) is initialized to zero. Each time the hidden layer is computed, its value is kept in self.last_z, and the result is also fed to the output layer.
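As a sanity check, a single forward step with the model above looks roughly like this (a minimal sketch; the variable names here are mine):

rnn.reset_state()
x = chainer.Variable(np.asarray([[0, 1]], dtype=np.float32))  # one "Zun", mini-batch of size 1
y = rnn(x)            # updates rnn.last_z as a side effect
print(y.data.shape)   # (1, 2): scores for "not Kiyoshi" / "Kiyoshi"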
Next, generate moderately random sequence data and train on it.
ans_zundoko = [0, 0, 0, 0, 1]  # the correct answer sequence (Zun x 4, then Doko)
src_x_ary = [0, 0, 0, 1]       # sampling array: 0 ("Zun") appears more often than 1 ("Doko")

def zd_gen():  # generator for the training data
    x_ary = [0, 0, 0, 0, 1]
    y_ary = [0, 0, 0, 0, 1]
    while True:
        x = x_ary.pop(0)
        y = y_ary.pop(0)
        x = [0, 1] if x == 0 else [1, 0]
        yield x, y
        new_x = src_x_ary[np.random.randint(0, 4)]  # 0 for indices 0-2, 1 for index 3
        x_ary.append(new_x)
        y_ary.append(1 if x_ary == ans_zundoko else 0)  # 1 only when x_ary equals [0, 0, 0, 0, 1]
bprop_len = 40        # BPTT truncation length
iter = 300 * 100 * 2  # number of training iterations
loss = 0
i = 0
for xx, yy in zd_gen():
    x = chainer.Variable(np.asarray([xx], dtype=np.float32))
    t = chainer.Variable(np.asarray([yy], dtype=np.int32))
    loss += model(x, t)
    i += 1
    if i % bprop_len == 0:
        model.zerograds()
        loss.backward()
        loss.unchain_backward()  # truncate the backprop history
        optimizer.update()
        print("iter %d, loss %f, x %d, y %d" % (i, loss.data, xx[0], yy))
        loss = 0
    if i > iter:
        break
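As a side note, the generator itself can be sanity-checked independently of training with a snippet like the following (my addition, not from the original code); y should become 1 only right after the pattern Zun, Zun, Zun, Zun, Doko appears:

gen = zd_gen()
for _ in range(10):
    xx, yy = next(gen)
    print(xx, yy)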
Training takes time, and it may or may not succeed depending on the initial values. It will not work properly unless the loss eventually falls below about 0.1.
If you change the optimizer to SGD or change bprop_len, the results change. The values used here are simply the ones that happened to work well in my environment.
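For example, the switch to SGD mentioned above would look something like this (a sketch; the learning rate is a guess, not a tuned value):

optimizer = optimizers.SGD(lr=0.01)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer.GradientClipping(10.0))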
Now evaluate the trained model. The input sequence could be generated randomly, but for simplicity I prepared static evaluation data.
# Zun Zun Zun Zun Doko Doko Doko Zun
x_data = [[0,1], [0,1], [0,1], [0,1], [1,0], [1,0], [1,0], [0,1]]

rnn.reset_state()
for xx in x_data:
    print('Zun' if xx[1] == 1 else 'Doko')
    x = chainer.Variable(np.asarray([xx], dtype=np.float32))
    y = model.predictor(x)
    z = F.softmax(y, use_cudnn=False)
    if z.data[0].argmax() == 1:  # Kiyoshi is established when the index of the larger value is 1
        print('Kiyoshi')
For reference, here is the output from a run that went well.
iter 59520, loss 0.037670, x 1, y 0
iter 59560, loss 0.051628, x 0, y 0
iter 59600, loss 0.037519, x 0, y 0
iter 59640, loss 0.041894, x 0, y 0
iter 59680, loss 0.059143, x 0, y 0
iter 59720, loss 0.062305, x 0, y 0
iter 59760, loss 0.055293, x 0, y 0
iter 59800, loss 0.060964, x 1, y 1
iter 59840, loss 0.057446, x 1, y 0
iter 59880, loss 0.034730, x 1, y 0
iter 59920, loss 0.054435, x 0, y 0
iter 59960, loss 0.039648, x 0, y 0
iter 60000, loss 0.036578, x 0, y 0
Zun
Zun
Zun
Zun
Doko
Kiyoshi
Doko
Doko
Zun
I feel I have finally started to understand RNNs, which I had not fully digested before. At first the number of units in the hidden layer was too small and the BPTT truncation length was too short, so it did not work as expected, but after various adjustments it finally started working. There are many examples out there that use LSTMs and word embeddings, but I wanted to try a more minimal problem, and I am happy that it finally worked.
That said, the fact that success still depends partly on luck remains a problem.