Try TensorFlow RNN with a basic model

RNN (Recurrent Neural Network) is also implemented in TensorFlow, and although there is an official tutorial, its example is a language model, which is rather involved and, I felt, hard for beginners to follow.

This time, I will try the RNN implementation in TensorFlow on a simpler problem that is not a language model.

Caution

TensorFlow has been upgraded since this was written, and some of this code no longer works. If you hit RNN-related import errors or BasicLSTMCell-related errors (v0.11r and later), please see the follow-up article (items/dd24f176023b65e78f84).

Environment

Simple RNN

The blog Peter's note explains the simple RNN model and how to implement it very clearly, so if you are new to RNNs I recommend reading it first. To quote the RNN diagram from that site:

(Figure: a simple RNN, with its unrolled form on the right)

It looks like this.

In other words, the data from the input unit x is multiplied by the weight W_x and enters the hidden unit s; the output of s is fed back, multiplied by the weight W_rec, and becomes the input to s at the next step. Looking at the unrolled form on the right of the figure, the initial hidden state s_0 is updated step by step, being multiplied by W_rec as the steps progress, while x provides a new input at each step; the state s_n at the final step is then passed to the output unit y.

That is the overall flow.
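Peter's note writes this recurrence as s_t = s_{t-1} * W_rec + x_t * W_x. As a minimal NumPy sketch of the forward pass (illustration only; the function name and scalar weights are mine, not code from that article):

import numpy as np

def simple_rnn_forward(x_seq, w_x, w_rec, s0=0.0):
    # Unroll the recurrence s_t = s_{t-1} * w_rec + x_t * w_x
    s = s0
    for x in x_seq:
        s = s * w_rec + x * w_x
    return s

# With w_x = w_rec = 1.0, the final state is exactly the sum of the inputs:
print(simple_rnn_forward([0., 1., 1., 0., 1.], w_x=1.0, w_rec=1.0))  # 3.0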

Finding the total value of a sequence

The RNN model in Peter's note is a network that takes a sequence of values X_k ∈ {0., 1.} as input and outputs their total.

For example, given

X = [0. 1. 1. 0. 1. 0. 0. 1. 0. 0.]

the model should correctly output the total value of this list,

Y = 2.

Simply adding the elements would give the answer immediately, but this time we compute it by training an RNN.

LSTM

However, TensorFlow's RNN tutorial uses LSTM (Long Short-Term Memory) rather than a simple RNN. As the figure above shows, a plain RNN grows in size in proportion to the number of steps when unrolled. This causes problems: the computation and memory needed for backpropagation grow accordingly, and the propagated error can explode, making the calculation unstable.

LSTM, on the other hand, replaces the simple hidden layer with an LSTM unit, which makes it possible to adjust how much of the unit's core value (the memory cell value) is retained at the next time step and how much it affects the next step.

(Figure: a single LSTM unit)

A single LSTM unit, as shown in the figure above, is composed of the following elements:

- Memory cell (Cell): holds the past state, denoted C_t.
- Input gate: receives the output h^{l-1}_t of the (l-1)-th hidden layer at time t and the output h^l_{t-1} of the l-th hidden layer at time t-1.
- Input modulation gate: adjusts the value that is added to the memory cell.
- Forget gate: adjusts how much of the memory cell's value is retained at the next time step.
- Output gate: adjusts how much the memory cell's value affects the next layer.
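For reference, these elements combine into the standard LSTM update equations (single-layer notation, with σ the sigmoid function and ⊙ element-wise multiplication; g_t is the input modulation value, written C̃_t in Olah's post below):

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)      # forget gate
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)      # input gate
g_t = tanh(W_g · [h_{t-1}, x_t] + b_g)   # input modulation gate
C_t = f_t ⊙ C_{t-1} + i_t ⊙ g_t          # memory cell update
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)      # output gate
h_t = o_t ⊙ tanh(C_t)                    # output to the next layer/step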

For a detailed explanation of LSTM, Christopher Olah's blog post (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) is an excellent read.

Try building it with TensorFlow

Strictly speaking, LSTM is unnecessary for this example, since every past input contributes equally to the current step. Still, this time I will solve the summation problem above using BasicLSTMCell, which is implemented in TensorFlow by default. I put the code at https://github.com/yukiB/rnntest.
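Note that this article targets the TensorFlow 0.x era. On those versions the imports would look roughly like this (the exact module path depends on the version, which is what the caution at the top is about):

import random
import numpy as np
import tensorflow as tf
# In early 0.x releases the RNN helpers lived under tensorflow.models.rnn;
# around v0.11 they moved into tf.nn
from tensorflow.models.rnn import rnn, rnn_cell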

First, we create the data:


def create_data(num_of_samples, sequence_len):
    # Each row is a random sequence of 0s and 1s
    X = np.zeros((num_of_samples, sequence_len))
    for row_idx in range(num_of_samples):
        X[row_idx, :] = np.around(np.random.rand(sequence_len)).astype(int)
    # The target for each sequence is its sum
    t = np.sum(X, axis=1)
    return X, t

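For example, the shapes come out as follows:

X, t = create_data(100, 10)
print(X.shape)     # (100, 10) -- 100 sequences of length 10
print(t.shape)     # (100,)    -- one target (the sum) per sequence
print(X[0], t[0])  # a random 0/1 sequence and its sum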

The LSTM layer is designed as follows:


def inference(input_ph, istate_ph):
    with tf.name_scope("inference") as scope:
        weight1_var = tf.Variable(tf.truncated_normal([num_of_input_nodes, num_of_hidden_nodes], stddev=0.1), name="weight1")
        weight2_var = tf.Variable(tf.truncated_normal([num_of_hidden_nodes, num_of_output_nodes], stddev=0.1), name="weight2")
        bias1_var   = tf.Variable(tf.truncated_normal([num_of_hidden_nodes], stddev=0.1), name="bias1")
        bias2_var   = tf.Variable(tf.truncated_normal([num_of_output_nodes], stddev=0.1), name="bias2")

        # (batch, steps, inputs) -> (steps, batch, inputs)
        in1 = tf.transpose(input_ph, [1, 0, 2])
        # (steps, batch, inputs) -> (steps * batch, inputs)
        in2 = tf.reshape(in1, [-1, num_of_input_nodes])
        # Apply the input-layer weights and bias to every step at once
        in3 = tf.matmul(in2, weight1_var) + bias1_var
        # Split into a list of `length_of_sequences` tensors of shape (batch, hidden)
        in4 = tf.split(0, length_of_sequences, in3)

        cell = rnn_cell.BasicLSTMCell(num_of_hidden_nodes, forget_bias=forget_bias)
        rnn_output, states_op = rnn.rnn(cell, in4, initial_state=istate_ph)
        output_op = tf.matmul(rnn_output[-1], weight2_var) + bias2_var
        return output_op

This defines the network.
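To make the reshaping explicit, here is how the shapes flow through inference (writing B for the batch size, T for length_of_sequences, and H for num_of_hidden_nodes, with one input node; this trace is just for illustration):

input_ph : (B, T, 1)                  # batch-major input
in1      : (T, B, 1)                  # time-major after the transpose
in2      : (T*B, 1)                   # time and batch flattened together
in3      : (T*B, H)                   # input weights applied to all steps at once
in4      : list of T tensors, (B, H)  # one tensor per step, as rnn.rnn() expects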

in3 = tf.matmul(in2, weight1_var) + bias1_var

applies the input-layer weights and bias to every time step before the data enters the LSTM unit (the cell update itself happens inside BasicLSTMCell). Also,


output_op = tf.matmul(rnn_output[-1], weight2_var) + bias2_var

takes the output of the LSTM layer at the last of all the steps, applies the output weight and bias, and produces the final output.

Regarding the cost, since the output is a continuous value this time, I used MSE (mean squared error) and designed the output unit to pass its value through as-is, without an activation function.


def loss(output_op, supervisor_ph):
    with tf.name_scope("loss") as scope:
        # Mean squared error between prediction and target
        square_error = tf.reduce_mean(tf.square(output_op - supervisor_ph))
        loss_op = square_error
        tf.scalar_summary("loss", loss_op)
        return loss_op

To evaluate accuracy, I generated 100 pairs of lists and correct answers, and computed the fraction of cases where the prediction differed from the correct answer by less than 0.05.


def calc_accuracy(output_op, prints=False):
    inputs, ts = make_prediction(num_of_prediction_epochs)
    pred_dict = {
        input_ph:      inputs,
        supervisor_ph: ts,
        istate_ph:     np.zeros((num_of_prediction_epochs, num_of_hidden_nodes * 2)),
    }
    output = sess.run([output_op], feed_dict=pred_dict)

    def print_result(p, q):
        print("output: %f, correct: %d" % (p, q))
    if prints:
        [print_result(p, q) for p, q in zip(output[0], ts)]

    # Count predictions within 0.05 of the correct answer
    opt = abs(output - ts)[0]
    total = sum([1 if x[0] < 0.05 else 0 for x in opt])
    print("accuracy %f" % (total / float(len(ts))))
    return output

Once everything above is in place, specify the optimizer and start training.
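The script below assumes the hyperparameters are already defined; the exact values are in the linked repository. The following set is a sketch consistent with the text (one input node, one output node, sequences of length 10, 5000 iterations with mini-batches of 100); num_of_hidden_nodes, num_of_sample, learning_rate, and forget_bias here are my assumptions, not the repository's values:

num_of_input_nodes = 1
num_of_hidden_nodes = 80        # assumption: any moderate size should work
num_of_output_nodes = 1
length_of_sequences = 10
num_of_training_epochs = 5000
num_of_prediction_epochs = 100  # 100 evaluation sequences, as described above
size_of_mini_batch = 100
num_of_sample = 1000            # assumption: size of the generated dataset
learning_rate = 0.01            # assumption
forget_bias = 0.8               # assumption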


def training(loss_op):
    with tf.name_scope("training") as scope:
        training_op = optimizer.minimize(loss_op)
        return training_op

random.seed(0)
np.random.seed(0)
tf.set_random_seed(0)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)

X, t = create_data(num_of_sample, length_of_sequences)

with tf.Graph().as_default():
    input_ph      = tf.placeholder(tf.float32, [None, length_of_sequences, num_of_input_nodes], name="input")
    supervisor_ph = tf.placeholder(tf.float32, [None, num_of_output_nodes], name="supervisor")
    istate_ph     = tf.placeholder(tf.float32, [None, num_of_hidden_nodes * 2], name="istate")  # BasicLSTMCell state holds both c and h, hence 2 * hidden

    output_op = inference(input_ph, istate_ph)
    loss_op = loss(output_op, supervisor_ph)
    training_op = training(loss_op)

    summary_op = tf.merge_all_summaries()
    init = tf.initialize_all_variables()

    with tf.Session() as sess:
        saver = tf.train.Saver()
        summary_writer = tf.train.SummaryWriter("/tmp/tensorflow_log", graph=sess.graph)
        sess.run(init)

        for epoch in range(num_of_training_epochs):
            # get_batch (defined in the repo) samples a random mini-batch of sequences and their sums
            inputs, supervisors = get_batch(size_of_mini_batch, X, t)
            train_dict = {
                input_ph:      inputs,
                supervisor_ph: supervisors,
                istate_ph:     np.zeros((size_of_mini_batch, num_of_hidden_nodes * 2)),
            }
            sess.run(training_op, feed_dict=train_dict)

            if epoch % 100 == 0:
                summary_str, train_loss = sess.run([summary_op, loss_op], feed_dict=train_dict)
                print("train#%d, train loss: %e" % (epoch, train_loss))
                summary_writer.add_summary(summary_str, epoch)
                if epoch % 500 == 0:
                    calc_accuracy(output_op)
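The helpers get_batch and make_prediction are defined in the repository; a minimal sketch consistent with how they are used here (random mini-batch sampling for training, fresh data for evaluation, reshaped to the placeholder shapes) might look like this:

def get_batch(batch_size, X, t):
    # Sample `batch_size` random sequences and reshape to the placeholder shapes
    idx = np.random.permutation(len(X))[:batch_size]
    xs = X[idx].reshape(batch_size, length_of_sequences, num_of_input_nodes)
    ts = t[idx].reshape(batch_size, num_of_output_nodes)
    return xs, ts

def make_prediction(num_of_samples):
    # Generate fresh, unseen sequences for evaluation
    xs, ts = create_data(num_of_samples, length_of_sequences)
    xs = xs.reshape(num_of_samples, length_of_sequences, num_of_input_nodes)
    ts = ts.reshape(num_of_samples, num_of_output_nodes)
    return xs, ts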

Execution result

Running the code at https://github.com/yukiB/rnntest produces final output like the following.

[0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
output: 6.010024, correct: 6
[1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
output: 5.986825, correct: 6
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
output: 0.223431, correct: 0
[0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
output: 3.002296, correct: 3
...
accuracy 0.980000

In my environment, training 5000 times with mini-batches of 100 gave an accuracy of about 98%, so the network seems to be learning correctly.

Also, checking the convergence of the cost function with TensorBoard gave the following.

(Figure: TensorBoard graph of the loss converging)

In conclusion

This time, I explored TensorFlow's RNN implementation through a simple summation model. Changing the optimizer or the number of hidden nodes changes the degree of convergence considerably, so it is interesting to experiment.

