I tried to predict by letting RNN learn the sine wave

0. Summary

I implemented a simple RNN (Recurrent Neural Network) with TensorFlow.
I trained the RNN on a sine wave so that it predicts sin(t + 1) (the next step) from sin(t).
By feeding the RNN's output back in as its next input, I was able to predict sin(t + n) (multiple steps ahead).
An LSTM (Long Short-Term Memory) was used as the RNN cell.

*** Added on May 27, 2016: ***

I wrote the sequel "I tried to predict by letting RNN learn the sine wave: Hyperparameter adjustment".

1. About TensorFlow, RNN, LSTM

I will skip the details here. The TensorFlow tutorial and the articles it references should be helpful.

Recurrent Neural Networks
Understanding LSTM Networks -- colah's blog
Overview of LSTM networks -- Qiita (translation of the above article)

2. Preparation of training data

I generated a sine wave with 50 steps per cycle for 100 cycles, 5,000 steps in total, and used it as training data. I prepared two versions of the training data: one without noise and one with noise.

Each training sample is a pair of sin(t) (the sine value at time t) and sin(t + 1) (the sine value at time t + 1). For details on how the training data was generated, see the ipynb files (IPython Notebooks). (As an aside, I was surprised to find that GitHub renders a preview of ipynb files.)
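As a rough illustration of the data layout described above, the following sketch builds the (sin(t), sin(t + 1)) pairs with the standard library only. It is not the notebook's actual code: the function name `make_sine_pairs`, the `noise_range` parameter, and the use of uniform noise are assumptions made for this example, since the article does not state how the noise was generated.

```python
import math
import random

def make_sine_pairs(steps_per_cycle=50, num_cycles=100, noise_range=0.0, seed=0):
    """Return a list of [sin(t), sin(t + 1)] pairs, one pair per time step."""
    rng = random.Random(seed)
    total_steps = steps_per_cycle * num_cycles
    wave = []
    # One extra sample so that the last pair also has a "next" value.
    for t in range(total_steps + 1):
        value = math.sin(2.0 * math.pi * t / steps_per_cycle)
        # Hypothetical noise model: uniform noise in [-noise_range, +noise_range].
        value += rng.uniform(-noise_range, noise_range)
        wave.append(value)
    return [[wave[t], wave[t + 1]] for t in range(total_steps)]

normal = make_sine_pairs()                  # noise-free version
noised = make_sine_pairs(noise_range=0.05)  # noisy version (0.05 is an arbitrary choice)
print(len(normal), normal[1])
```

Each row's second column is the next row's first column, which is what lets the network learn a one-step-ahead mapping.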

2.1. No noise

train_data/normal.ipynb

2.2. With noise

train_data/noised.ipynb

3. Learning / prediction

This time, training and prediction are done in a single script. The source code is shown in the appendix at the end of the article.

3.1. Process flow

The flow of learning and prediction is as follows.

1. Train the network on the training data.
2. Predict sin(t + 1) using the initial data (the beginning of the training data).
3. Predict sin(t + 2) using the predicted sin(t + 1).
4. Repeat step 3.
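The chaining in the last two steps can be sketched as a sliding window: each prediction is appended to the window and the oldest value is dropped. In this sketch `predict_chain` and `exact_sine_step` are hypothetical names, and `exact_sine_step` is only a stand-in for the trained network (it uses the identity sin(t + 1) = 2 cos(w) sin(t) - sin(t - 1)), so the loop can be run without TensorFlow.

```python
import math

def predict_chain(predict_next, initial_window, num_steps):
    """Repeatedly predict the next value and feed it back into the window."""
    window = list(initial_window)
    outputs = []
    for _ in range(num_steps):
        next_value = predict_next(window)
        outputs.append(next_value)
        # Drop the oldest value and append the prediction, keeping the length fixed.
        window = window[1:] + [next_value]
    return outputs

STEPS_PER_CYCLE = 50
OMEGA = 2.0 * math.pi / STEPS_PER_CYCLE

def exact_sine_step(window):
    # Stand-in for the trained RNN: an exact one-step recurrence for a sine wave.
    return 2.0 * math.cos(OMEGA) * window[-1] - window[-2]

initial = [math.sin(OMEGA * t) for t in range(50)]       # initial data: first 50 steps
predicted = predict_chain(exact_sine_step, initial, 100)  # 100 chained predictions
```

Because every prediction becomes an input for later steps, any error made by a real model accumulates along the chain, which is one reason the predicted waveform can drift in amplitude and frequency.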

3.2. Network configuration

I used a network with the structure "input layer - hidden layer - RNN cell - output layer". An LSTM was used as the RNN cell.
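To make the data flow through this structure concrete, here is a minimal pure-Python sketch of one forward pass: an affine projection at each time step, an LSTM step, and a final affine projection applied to the last LSTM output. The helper names and the parameter dictionary are hypothetical, and the gate formula reflects my understanding of a basic LSTM cell with a forget bias; treat it as an approximation of what the TensorFlow cell computes, not as its implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(vector, weights, bias):
    """Compute vector @ weights + bias for plain Python lists."""
    return [sum(v * weights[i][j] for i, v in enumerate(vector)) + bias[j]
            for j in range(len(bias))]

def lstm_step(x, c, h, lstm_weights, lstm_bias, forget_bias=1.0):
    """One LSTM step; the gates are computed from the concatenation [x, h]."""
    n = len(h)
    gates = affine(x + h, lstm_weights, lstm_bias)           # length 4 * n
    i, j, f, o = [gates[k * n:(k + 1) * n] for k in range(4)]
    new_c = [c[k] * sigmoid(f[k] + forget_bias) + sigmoid(i[k]) * math.tanh(j[k])
             for k in range(n)]
    new_h = [math.tanh(new_c[k]) * sigmoid(o[k]) for k in range(n)]
    return new_c, new_h

def forward(sequence, params, num_hidden=2):
    """Input layer -> hidden layer -> LSTM cell -> output layer, for one sequence."""
    c = [0.0] * num_hidden
    h = [0.0] * num_hidden
    for value in sequence:
        x = affine([value], params["weight1"], params["bias1"])
        c, h = lstm_step(x, c, h, params["lstm_weights"], params["lstm_bias"])
    # Only the LSTM output at the last time step is used for the prediction.
    return affine(h, params["weight2"], params["bias2"])[0]

rng = random.Random(0)

def small_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Untrained, randomly initialised parameters (shapes for 1 input, 2 hidden, 1 output node).
params = {
    "weight1": small_matrix(1, 2), "bias1": [0.0, 0.0],
    "lstm_weights": small_matrix(4, 8), "lstm_bias": [0.0] * 8,
    "weight2": small_matrix(2, 1), "bias2": [0.0],
}
sequence = [math.sin(2.0 * math.pi * t / 50) for t in range(50)]
prediction = forward(sequence, params)
```

With untrained weights the result is meaningless as a prediction; the point is only to show which values pass through which layer.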

3.3. Hyperparameters

The hyperparameters used for learning and prediction are as follows.

Variable name	Meaning	Value
num_of_input_nodes	Number of nodes in the input layer	1 node
num_of_hidden_nodes	Number of nodes in the hidden layer	2 nodes
num_of_output_nodes	Number of nodes in the output layer	1 node
length_of_sequences	RNN sequence length	50 steps
num_of_training_epochs	Number of learning repetitions	2,000 times
length_of_initial_sequences	Initial data sequence length	50 steps
num_of_prediction_epochs	Number of repetitions of prediction	100 times
size_of_mini_batch	Number of samples per mini-batch	100 samples
learning_rate	Learning rate	0.1
forget_bias	Bias added to the LSTM forget gate	1.0 (default value)
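To get a feel for how small this network is, the number of trainable parameters can be worked out from the node counts in the table. This is a back-of-the-envelope sketch that assumes a basic LSTM cell with one weight matrix and one bias vector covering all four gates, and whose input is the hidden layer's output (so its input size equals num_of_hidden_nodes).

```python
num_of_input_nodes = 1
num_of_hidden_nodes = 2
num_of_output_nodes = 1

# Input layer -> hidden layer: weight matrix plus bias.
input_projection = num_of_input_nodes * num_of_hidden_nodes + num_of_hidden_nodes

# LSTM cell: four gates, each fed by the concatenation [input, previous output].
lstm_input_size = num_of_hidden_nodes
lstm = ((lstm_input_size + num_of_hidden_nodes) * 4 * num_of_hidden_nodes
        + 4 * num_of_hidden_nodes)

# LSTM output -> output layer: weight matrix plus bias.
output_projection = num_of_hidden_nodes * num_of_output_nodes + num_of_output_nodes

total = input_projection + lstm + output_projection
print(input_projection, lstm, output_projection, total)  # 4 40 3 47
```

Under these assumptions the whole model has only 47 parameters, most of them inside the LSTM cell.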

4. Prediction result

The figure below plots the prediction results. The legend is as follows.

Black dotted line: training data
Solid blue line: initial data
Solid green line: predicted data

4.1. No noise

The output looks roughly like a sine wave. Compared with the training data, the overall amplitude is smaller, the peaks are distorted, and the frequency is slightly lower. See basic/output.ipynb for the specific values.

4.2. With noise

The amplitude is even smaller than in the noise-free case, and the frequency is slightly higher. The noise component contained in the training data also appears to have been reduced in the prediction. See noised/output.ipynb for the specific values.
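Observations such as "smaller amplitude" or "slightly higher frequency" can be put into numbers with a couple of simple estimates. The sketch below is one possible approach and was not used for the results above; the function names are hypothetical. Amplitude is taken as half the peak-to-peak range, and the period as the average distance between upward zero crossings. The synthetic `shrunk` wave merely imitates a predicted waveform.

```python
import math

def estimate_amplitude(values):
    """Half of the peak-to-peak range."""
    return (max(values) - min(values)) / 2.0

def estimate_period(values):
    """Average number of steps between upward zero crossings (None if fewer than two)."""
    crossings = []
    for k in range(len(values) - 1):
        a, b = values[k], values[k + 1]
        if a < 0.0 <= b:
            # Linearly interpolate the position where the wave crosses zero.
            crossings.append(k + (-a) / (b - a))
    if len(crossings) < 2:
        return None
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)

# Reference wave (50 steps per cycle) and a synthetic "shrunk, faster" wave for comparison.
reference = [math.sin(2.0 * math.pi * t / 50) for t in range(200)]
shrunk = [0.6 * math.sin(2.0 * math.pi * t / 45) for t in range(200)]

print(estimate_amplitude(reference), estimate_period(reference))
print(estimate_amplitude(shrunk), estimate_period(shrunk))
```

Applied to the predicted values (for example the contents of output.npy, loaded into a list), the same two functions give a quick way to compare runs with different hyperparameters.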

5. Future plans

I would like to try changing the network configuration and hyperparameters to see what kind of prediction results will be obtained.

*** Added on May 27, 2016: ***

I wrote the sequel "I tried to predict by letting RNN learn the sine wave: Hyperparameter adjustment".

Appendix: Source code

The source code for the noise-free version is shown below. Please refer to GitHub for the source code of the noisy version. The noise-free version and the noisy version differ only in the input file name.

Noise-free version: basic/rnn.py (code shown below)
Noisy version: noised/rnn.py

`rnn.py`


import tensorflow as tf
from tensorflow.models.rnn import rnn, rnn_cell
import numpy as np
import random

def make_mini_batch(train_data, size_of_mini_batch, length_of_sequences):
    inputs  = np.empty(0)
    outputs = np.empty(0)
    for _ in range(size_of_mini_batch):
        index   = random.randint(0, len(train_data) - length_of_sequences)
        part    = train_data[index:index + length_of_sequences]
        inputs  = np.append(inputs, part[:, 0])
        outputs = np.append(outputs, part[-1, 1])
    inputs  = inputs.reshape(-1, length_of_sequences, 1)
    outputs = outputs.reshape(-1, 1)
    return (inputs, outputs)

def make_prediction_initial(train_data, index, length_of_sequences):
    return train_data[index:index + length_of_sequences, 0]

train_data_path             = "../train_data/normal.npy"
num_of_input_nodes          = 1
num_of_hidden_nodes         = 2
num_of_output_nodes         = 1
length_of_sequences         = 50
num_of_training_epochs      = 2000
length_of_initial_sequences = 50
num_of_prediction_epochs    = 100
size_of_mini_batch          = 100
learning_rate               = 0.1
forget_bias                 = 1.0
print("train_data_path             = %s" % train_data_path)
print("num_of_input_nodes          = %d" % num_of_input_nodes)
print("num_of_hidden_nodes         = %d" % num_of_hidden_nodes)
print("num_of_output_nodes         = %d" % num_of_output_nodes)
print("length_of_sequences         = %d" % length_of_sequences)
print("num_of_training_epochs      = %d" % num_of_training_epochs)
print("length_of_initial_sequences = %d" % length_of_initial_sequences)
print("num_of_prediction_epochs    = %d" % num_of_prediction_epochs)
print("size_of_mini_batch          = %d" % size_of_mini_batch)
print("learning_rate               = %f" % learning_rate)
print("forget_bias                 = %f" % forget_bias)

train_data = np.load(train_data_path)
print("train_data:", train_data)

# Fix the random number seeds.
random.seed(0)
np.random.seed(0)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)

with tf.Graph().as_default():
    # The graph-level seed applies to the current default graph, so set it inside this block.
    tf.set_random_seed(0)

    input_ph      = tf.placeholder(tf.float32, [None, length_of_sequences, num_of_input_nodes], name="input")
    supervisor_ph = tf.placeholder(tf.float32, [None, num_of_output_nodes], name="supervisor")
    istate_ph     = tf.placeholder(tf.float32, [None, num_of_hidden_nodes * 2], name="istate") #Requires two values per cell.

    with tf.name_scope("inference") as scope:
        weight1_var = tf.Variable(tf.truncated_normal([num_of_input_nodes, num_of_hidden_nodes], stddev=0.1), name="weight1")
        weight2_var = tf.Variable(tf.truncated_normal([num_of_hidden_nodes, num_of_output_nodes], stddev=0.1), name="weight2")
        bias1_var   = tf.Variable(tf.truncated_normal([num_of_hidden_nodes], stddev=0.1), name="bias1")
        bias2_var   = tf.Variable(tf.truncated_normal([num_of_output_nodes], stddev=0.1), name="bias2")

        in1 = tf.transpose(input_ph, [1, 0, 2])         # (batch, sequence, data) -> (sequence, batch, data)
        in2 = tf.reshape(in1, [-1, num_of_input_nodes]) # (sequence, batch, data) -> (sequence * batch, data)
        in3 = tf.matmul(in2, weight1_var) + bias1_var
        in4 = tf.split(0, length_of_sequences, in3)     # sequence * (batch, data)

        cell = rnn_cell.BasicLSTMCell(num_of_hidden_nodes, forget_bias=forget_bias)
        rnn_output, states_op = rnn.rnn(cell, in4, initial_state=istate_ph)
        output_op = tf.matmul(rnn_output[-1], weight2_var) + bias2_var

    with tf.name_scope("loss") as scope:
        square_error = tf.reduce_mean(tf.square(output_op - supervisor_ph))
        loss_op      = square_error
        tf.scalar_summary("loss", loss_op)

    with tf.name_scope("training") as scope:
        training_op = optimizer.minimize(loss_op)

    summary_op = tf.merge_all_summaries()
    init = tf.initialize_all_variables()

    with tf.Session() as sess:
        saver = tf.train.Saver()
        summary_writer = tf.train.SummaryWriter("data", graph=sess.graph)
        sess.run(init)

        for epoch in range(num_of_training_epochs):
            inputs, supervisors = make_mini_batch(train_data, size_of_mini_batch, length_of_sequences)

            train_dict = {
                input_ph:      inputs,
                supervisor_ph: supervisors,
                istate_ph:     np.zeros((size_of_mini_batch, num_of_hidden_nodes * 2)),
            }
            sess.run(training_op, feed_dict=train_dict)

            if (epoch + 1) % 10 == 0:
                summary_str, train_loss = sess.run([summary_op, loss_op], feed_dict=train_dict)
                summary_writer.add_summary(summary_str, epoch)
                print("train#%d, train loss: %e" % (epoch + 1, train_loss))

        inputs  = make_prediction_initial(train_data, 0, length_of_initial_sequences)
        outputs = np.empty(0)
        states  = np.zeros((1, num_of_hidden_nodes * 2))  # one row: the batch size is 1 during prediction

        print("initial:", inputs)
        np.save("initial.npy", inputs)

        for epoch in range(num_of_prediction_epochs):
            pred_dict = {
                input_ph:  inputs.reshape((1, length_of_sequences, 1)),
                istate_ph: states,
            }
            output, states = sess.run([output_op, states_op], feed_dict=pred_dict)
            print("prediction#%d, output: %f" % (epoch + 1, output[0, 0]))  # output has shape (1, 1)

            inputs  = np.delete(inputs, 0)
            inputs  = np.append(inputs, output)
            outputs = np.append(outputs, output)

        print("outputs:", outputs)
        np.save("output.npy", outputs)

        saver.save(sess, "data/model")