An RNN (Recurrent Neural Network) is also implemented in TensorFlow, and although there is an official tutorial, its example deals with a language model, which is rather involved and, I felt, hard for beginners to follow.
This time, I will try out the RNN implementation in TensorFlow using a simpler problem that is not a language model.
Note that the TensorFlow version has since been upgraded and parts of this post no longer work as-is; please see here (TensorFlow RNN-related imports and BasicLSTMCell-related errors (v0.11r~), /items/dd24f176023b65e78f84) and adapt accordingly.
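For reference, the main change is that the RNN modules moved into the core namespace. A hedged sketch of the two import styles (the exact paths depend on your version, so check the docs for yours):

```python
# TensorFlow <= 0.10 (the style used in this post)
from tensorflow.models.rnn import rnn, rnn_cell

# TensorFlow >= 0.11 (the equivalent, to the best of my knowledge)
# cell = tf.nn.rnn_cell.BasicLSTMCell(num_units, forget_bias=1.0)
# outputs, state = tf.nn.rnn(cell, inputs, initial_state=istate)
```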
The blog Peter's note is very helpful on the simple RNN model and how to implement it, so if you are new to RNNs I recommend reading it first. The RNN diagram quoted from that site looks like this:
In other words, the data x from the input-layer unit is multiplied by the weight W_x and fed into the hidden-layer unit s; the output of s is fed back, multiplied by the weight W_rec, and becomes the input to s at the next step. Looking at the unrolled form on the right of the figure, the initial hidden state s_0 has its state updated step by step, being multiplied by W_rec as the steps progress, with x entering at each step; the state s_n at the final step is then passed to the output-layer unit y.
That is the overall flow.
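In formula form, using the notation from the figure, the recurrence is simply

$$ s_t = x_t W_x + s_{t-1} W_{rec}, \qquad y = s_n $$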
The RNN model in Peter's note is a network that takes a sequence of values X_k (each 0. or 1.) as input and outputs their total.
For example, given the input
X = [0. 1. 1. 0. 1. 0. 0. 1. 0. 0.]
the total of this list is
Y = 4.
and the model is trained to output that value. Of course, simply adding the numbers would give the answer immediately, but this time we obtain it by training an RNN.
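Concretely, the target is nothing more than the sum of the sequence:

```python
import numpy as np

X = np.array([0., 1., 1., 0., 1., 0., 0., 1., 0., 0.])
Y = np.sum(X)  # 4.0 -- the value the trained RNN should reproduce
```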
LSTM
However, TensorFlow's RNN tutorial uses a method called LSTM (Long Short-Term Memory) rather than a plain RNN. As the figure above shows, in an ordinary RNN the unrolled network grows in proportion to the number of steps. This causes problems: the computation and memory needed for backpropagation grow accordingly, and the propagated error can explode, making the computation unstable.
With an LSTM, on the other hand, the simple hidden layer is replaced by an LSTM unit, which makes it possible to adjust how much of the unit's core value (the memory cell value) is retained at the next time step and how much it influences that step.
One LSTM unit is as shown in the figure above.
--Memory cell (Cell): represents the past state, denoted C_t.
--Input gate: receives the output h^{l-1}_t of the (l-1)-th hidden layer at time t and the output h^l_{t-1} of the l-th hidden layer at time t-1.
--Input modulation gate: adjusts the value that is added to the memory cell.
--Forget gate: adjusts how much of the memory cell's value is retained at the next time step.
--Output gate: adjusts how much the memory cell's value affects the next layer.
It is composed of the elements above.
For a detailed explanation of LSTMs, Christopher Olah's blog post (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) is excellent; I learned a lot from it.
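For reference, the standard LSTM update equations corresponding to the gates above (in the notation of Olah's post; this is textbook material, not code from this article) are:

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) &&\text{(forget gate)}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) &&\text{(input gate)}\\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) &&\text{(input modulation)}\\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t &&\text{(cell update)}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) &&\text{(output gate)}\\
h_t &= o_t * \tanh(C_t) &&\text{(new hidden state)}
\end{aligned}
$$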
Strictly speaking, this example does not need an LSTM, since every past input contributes equally to the current step. Nevertheless, I will tackle the summation example above with the BasicLSTMCell that TensorFlow provides out of the box. The code is at https://github.com/yukiB/rnntest.
First, the data is created as follows:
import numpy as np

def create_data(num_of_samples, sequence_len):
    # each row is a random sequence of 0s and 1s
    X = np.zeros((num_of_samples, sequence_len))
    for row_idx in range(num_of_samples):
        X[row_idx, :] = np.around(np.random.rand(sequence_len)).astype(int)
    # the target for each sequence is its total
    t = np.sum(X, axis=1)
    return X, t
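As a quick sanity check (my own snippet, not from the repo), the shapes come out as:

```python
X, t = create_data(100, 10)
print(X.shape)  # (100, 10) -- 100 random 0/1 sequences of length 10
print(t.shape)  # (100,)    -- each target is the sum of one sequence
```

Note that the placeholders defined later expect inputs of shape (batch, sequence length, 1) and targets of shape (batch, 1), so the batching helpers reshape this data accordingly (sketched further below).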
The LSTM layer is designed as follows:

def inference(input_ph, istate_ph):
    with tf.name_scope("inference") as scope:
        weight1_var = tf.Variable(tf.truncated_normal([num_of_input_nodes, num_of_hidden_nodes], stddev=0.1), name="weight1")
        weight2_var = tf.Variable(tf.truncated_normal([num_of_hidden_nodes, num_of_output_nodes], stddev=0.1), name="weight2")
        bias1_var = tf.Variable(tf.truncated_normal([num_of_hidden_nodes], stddev=0.1), name="bias1")
        bias2_var = tf.Variable(tf.truncated_normal([num_of_output_nodes], stddev=0.1), name="bias2")

        # reorder to (steps, batch, features), project every step into the
        # hidden dimension, then split into the per-step list expected by
        # the old (v0.x) rnn.rnn API
        in1 = tf.transpose(input_ph, [1, 0, 2])
        in2 = tf.reshape(in1, [-1, num_of_input_nodes])
        in3 = tf.matmul(in2, weight1_var) + bias1_var
        in4 = tf.split(0, length_of_sequences, in3)

        cell = rnn_cell.BasicLSTMCell(num_of_hidden_nodes, forget_bias=forget_bias)
        rnn_output, states_op = rnn.rnn(cell, in4, initial_state=istate_ph)
        output_op = tf.matmul(rnn_output[-1], weight2_var) + bias2_var
        # the state and per-step inputs are returned as well, to match how
        # the training script below unpacks the result
        return output_op, states_op, in4
Here,
in3 = tf.matmul(in2, weight1_var) + bias1_var
is the affine transformation that feeds each input step into the LSTM unit (the gate and cell-update computations themselves happen inside BasicLSTMCell). Next,
output_op = tf.matmul(rnn_output[-1], weight2_var) + bias2_var
takes the output of the LSTM at the final step and applies the output weight and bias to obtain the final output.
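Tracing the tensor shapes through that preamble may help (my annotation, not part of the repo):

```python
# input_ph        : (batch, length_of_sequences, num_of_input_nodes)
# in1 (transpose) : (length_of_sequences, batch, num_of_input_nodes)
# in2 (reshape)   : (length_of_sequences * batch, num_of_input_nodes)
# in3 (affine)    : (length_of_sequences * batch, num_of_hidden_nodes)
# in4 (split)     : list of length_of_sequences tensors, each (batch, num_of_hidden_nodes)
# rnn_output[-1]  : final-step hidden state, (batch, num_of_hidden_nodes)
# output_op       : (batch, num_of_output_nodes)
```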
Regarding the cost, since the output is a continuous value this time, I used the MSE (mean squared error), and the design passes the unit's output through as-is, without an activation function.
def loss(output_op, supervisor_ph):
    with tf.name_scope("loss") as scope:
        square_error = tf.reduce_mean(tf.square(output_op - supervisor_ph))
        loss_op = square_error
        tf.scalar_summary("loss", loss_op)
        return loss_op
To evaluate accuracy, I generated 100 pairs of lists and correct answers, and computed the fraction of predictions whose difference from the correct answer was less than 0.05.
def calc_accuracy(output_op, prints=False):
    inputs, ts = make_prediction(num_of_prediction_epochs)
    pred_dict = {
        input_ph: inputs,
        supervisor_ph: ts,
        istate_ph: np.zeros((num_of_prediction_epochs, num_of_hidden_nodes * 2)),
    }
    output = sess.run([output_op], feed_dict=pred_dict)

    def print_result(p, q):
        print("output: %f, correct: %d" % (p, q))
    if prints:
        [print_result(p, q) for p, q in zip(output[0], ts)]

    # count predictions within 0.05 of the correct answer
    opt = abs(output - ts)[0]
    total = sum([1 if x[0] < 0.05 else 0 for x in opt])
    print("accuracy %f" % (total / float(len(ts))))
    return output
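make_prediction comes from the repo; a minimal sketch consistent with the placeholder shapes above (my assumption, not the repo's exact code) would be:

```python
def make_prediction(num_samples):
    # generate fresh evaluation data and reshape it to the placeholder shapes
    X, t = create_data(num_samples, length_of_sequences)
    inputs = X.reshape(num_samples, length_of_sequences, num_of_input_nodes)
    ts = t.reshape(num_samples, num_of_output_nodes)
    return inputs, ts
```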
Once everything above is defined, we specify the optimizer and start the computation. (The imports and global hyperparameters the script relies on are sketched below.)
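The script assumes a handful of globals defined at the top of the file. The import paths are the v0.x ones; the hyperparameter values here are example assumptions of mine, not necessarily those in the repo:

```python
import random
import numpy as np
import tensorflow as tf
from tensorflow.models.rnn import rnn, rnn_cell  # import path for TF <= 0.10

# example hyperparameters (my assumptions; tune to taste)
num_of_input_nodes = 1        # one 0/1 value per step
num_of_hidden_nodes = 80
num_of_output_nodes = 1       # the predicted total
length_of_sequences = 10
num_of_training_epochs = 5000
num_of_prediction_epochs = 100
size_of_mini_batch = 100
num_of_sample = 1000
learning_rate = 0.01
forget_bias = 0.8
```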
def training(loss_op):
    with tf.name_scope("training") as scope:
        training_op = optimizer.minimize(loss_op)
        return training_op

random.seed(0)
np.random.seed(0)
tf.set_random_seed(0)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)

X, t = create_data(num_of_sample, length_of_sequences)

with tf.Graph().as_default():
    input_ph = tf.placeholder(tf.float32, [None, length_of_sequences, num_of_input_nodes], name="input")
    supervisor_ph = tf.placeholder(tf.float32, [None, num_of_output_nodes], name="supervisor")
    istate_ph = tf.placeholder(tf.float32, [None, num_of_hidden_nodes * 2], name="istate")

    output_op, states_op, datas_op = inference(input_ph, istate_ph)
    loss_op = loss(output_op, supervisor_ph)
    training_op = training(loss_op)

    summary_op = tf.merge_all_summaries()
    init = tf.initialize_all_variables()

    with tf.Session() as sess:
        saver = tf.train.Saver()
        summary_writer = tf.train.SummaryWriter("/tmp/tensorflow_log", graph=sess.graph)
        sess.run(init)

        for epoch in range(num_of_training_epochs):
            # train on one random mini-batch, logging periodically
            inputs, supervisors = get_batch(size_of_mini_batch, X, t)
            train_dict = {
                input_ph: inputs,
                supervisor_ph: supervisors,
                istate_ph: np.zeros((size_of_mini_batch, num_of_hidden_nodes * 2)),
            }
            sess.run(training_op, feed_dict=train_dict)

            if epoch % 100 == 0:
                summary_str, train_loss = sess.run([summary_op, loss_op], feed_dict=train_dict)
                print("train#%d, train loss: %e" % (epoch, train_loss))
                summary_writer.add_summary(summary_str, epoch)
            if epoch % 500 == 0:
                calc_accuracy(output_op)
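get_batch is also defined in the repo; a minimal sketch consistent with the shapes used above (again, my assumption) is:

```python
def get_batch(batch_size, X, t):
    # sample batch_size random sequences and reshape to the placeholder shapes
    idx = np.random.permutation(len(X))[:batch_size]
    xs = X[idx].reshape(batch_size, length_of_sequences, num_of_input_nodes)
    ts = t[idx].reshape(batch_size, num_of_output_nodes)
    return xs, ts
```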
The code in https://github.com/yukiB/rnntest outputs the final result as follows.
[0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
output: 6.010024, correct: 6
[1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
output: 5.986825, correct: 6
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
output: 0.223431, correct: 0
[0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
output: 3.002296, correct: 3
...
accuracy 0.980000
In my environment, training 5000 times with mini-batches of 100 gave an accuracy of about 98%, so the network seems to be learning correctly.
I also checked the convergence of the cost function with TensorBoard, and it came out as shown in the figure above.
This time, I explored TensorFlow's RNN implementation through a simple summation model. Changing the optimizer or the number of hidden units alters the rate of convergence considerably, so it is interesting to experiment; swapping the optimizer, for example, is a one-line change, as sketched below.
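A hedged example, using TensorFlow's built-in Adam optimizer (the learning rate here is just a starting point of mine):

```python
# e.g. try Adam instead of plain gradient descent
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
```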