The network configuration is basically the same as in the previous article. The hyperparameters are based on the values shown in "3.3. Hyperparameters" below.
Variable name | Meaning | Value |
---|---|---|
num_of_input_nodes | Number of nodes in the input layer | 1 node |
num_of_hidden_nodes | Number of nodes in the hidden layer | 2 nodes |
num_of_output_nodes | Number of nodes in the output layer | 1 node |
length_of_sequences | RNN sequence length | 50 steps |
num_of_training_epochs | Number of training iterations | 2,000 times |
num_of_prediction_epochs | Number of prediction iterations | 100 times |
size_of_mini_batch | Number of samples per mini-batch | 100 samples |
learning_rate | Learning rate | 0.1 |
forget_bias | (not well understood; see below) | 1.0 (default value) |
The source code used for training and prediction, the notebook that generated the training data, and the notebook used for charting the results are available on GitHub. Please refer to it for the exact source code and values.
https://github.com/nayutaya/tensorflow-rnn-sin/tree/20160517/ex2
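For reference, here is a minimal sketch of a network with this configuration, written against the TensorFlow 1.x-style API. The repository linked above targets an older TensorFlow release, so the actual code differs in detail; this is only an illustration of how the hyperparameters above enter the graph.

```python
import tensorflow as tf

# Hyperparameters from the table above.
num_of_input_nodes     = 1
num_of_hidden_nodes    = 2
num_of_output_nodes    = 1
length_of_sequences    = 50
learning_rate          = 0.1
forget_bias            = 1.0

# Input: a mini-batch of sequences, one value per time step.
inputs  = tf.placeholder(tf.float32, [None, length_of_sequences, num_of_input_nodes])
targets = tf.placeholder(tf.float32, [None, num_of_output_nodes])

# Single-layer LSTM followed by a linear output layer.
cell = tf.nn.rnn_cell.BasicLSTMCell(num_of_hidden_nodes, forget_bias=forget_bias)
rnn_outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
last_output = rnn_outputs[:, -1, :]              # hidden state at the final step
prediction = tf.layers.dense(last_output, num_of_output_nodes)

# Squared-error loss, minimized with plain gradient descent.
loss = tf.reduce_mean(tf.square(prediction - targets))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
```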
num_of_hidden_nodes: Number of nodes in the hidden layer

The charts below show the loss function and the prediction results when the number of nodes in the hidden layer is changed from 1 to 4.

With only 1 node in the hidden layer, the model cannot predict the wave at all. On the other hand, a larger number of hidden nodes does not always give better results. Looking at the loss charts, however, the more nodes in the hidden layer, the lower the final loss.
No | Number of nodes in the hidden layer | Training / prediction time |
---|---|---|
1 | 1 | 3m53.845s |
2 | 2 | 3m30.844s |
3 | 3 | 4m36.324s |
4 | 4 | 5m30.537s |
length_of_sequences: RNN sequence length

The charts below show the prediction results and the loss function when the RNN sequence length is changed to 30, 40, 50, 60, and 70.

The training data this time is a sine wave with 50 steps per cycle, but the model can be seen to predict well even when the sequence length is shorter than one cycle.
No | RNN sequence length | Training / prediction time |
---|---|---|
1 | 30 | 2m29.589s |
2 | 40 | 2m58.636s |
3 | 50 | 3m30.844s |
4 | 60 | 4m25.459s |
5 | 70 | 5m38.550s |
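As an illustration of how length_of_sequences relates to the 50-steps-per-cycle sine wave, here is a rough sketch of how training sequences could be generated. This is an assumption for illustration, not the repository's exact notebook code.

```python
import numpy as np

steps_per_cycle = 50       # one full sine-wave cycle spans 50 steps
length_of_sequences = 50   # varied from 30 to 70 in the table above

t = np.arange(0, steps_per_cycle * 100)
wave = np.sin(2.0 * np.pi * t / steps_per_cycle)

# Each sample: `length_of_sequences` consecutive values as input,
# the value at the next step as the prediction target.
xs = np.array([wave[i:i + length_of_sequences]
               for i in range(len(wave) - length_of_sequences)])
ys = np.array([wave[i + length_of_sequences]
               for i in range(len(wave) - length_of_sequences)])
xs = xs.reshape(-1, length_of_sequences, 1)   # (samples, steps, input nodes)
ys = ys.reshape(-1, 1)
```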
num_of_training_epochs: Number of training iterations

The charts below show the prediction results and the loss function when the number of training iterations is changed to 1,000, 2,000, and 3,000. With 3,000 iterations, the loss begins to oscillate from around iteration 1,600, and the prediction results are also poor.
No | Number of training iterations | Training / prediction time |
---|---|---|
1 | 1,000 times | 2m10.783s |
2 | 2,000 times | 3m30.844s |
3 | 3,000 times | 6m17.675s |
size_of_mini_batch: Number of samples per mini-batch

The charts below show the prediction results and the loss function when the number of samples per mini-batch is changed to 50, 100, and 200.

There is no striking difference, but on the whole a larger mini-batch seems to give slightly better results.
No | Number of samples per mini-batch | Training / prediction time |
---|---|---|
1 | 50 | 4m25.032s |
2 | 100 | 3m30.844s |
3 | 200 | 4m59.550s |
learning_rate: Learning rate

The charts below show the prediction results and the loss function when the learning rate passed to the optimizer is changed to 0.02, 0.1, and 0.5.

With learning rates of 0.02 and 0.5, the model cannot predict properly. With a learning rate of 0.5, the loss starts to oscillate almost immediately after training begins.
No | Learning rate | Training / prediction time |
---|---|---|
1 | 0.02 | 3m46.852s |
2 | 0.1 | 3m30.844s |
3 | 0.5 | 4m39.136s |
3.6. forget_bias

The charts below show the loss function and the prediction results when the forget_bias parameter of BasicLSTMCell, whose effect I did not fully understand, is changed to 0.25, 0.5, and 1.0 (the default value).

With 0.25, the model cannot predict the wave.
No | forget_bias | Training / prediction time |
---|---|---|
1 | 0.25 | 4m27.725s |
2 | 0.5 | 4m27.089s |
3 | 1.0 | 3m30.844s |
The charts below show the prediction results and the loss function when the optimizer is switched from GradientDescentOptimizer to AdamOptimizer. With AdamOptimizer the loss drops faster and reaches a lower final value, but it oscillates violently, which makes it difficult to decide when to stop training.
No | Optimizer | Training / prediction time |
---|---|---|
1 | GradientDescentOptimizer | 3m30.844s |
2 | AdamOptimizer | 4m46.116s |
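Switching optimizers is essentially a one-line change. The sketch below uses a stand-in loss just to show the swap; passing the same learning rate of 0.1 to Adam here is an assumption (Adam's default is 0.001), not something confirmed from the repository.

```python
import tensorflow as tf

learning_rate = 0.1
w = tf.Variable(0.0)
loss = tf.square(w - 1.0)   # stand-in for the squared-error loss sketched earlier

# Only the optimizer changes; the rest of the graph stays the same.
# train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
```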
The charts below show the prediction results and the loss function when the RNN cell is switched from BasicLSTMCell to GRUCell. There was not much difference between the two.
No | RNN cell | Training / prediction time |
---|---|---|
1 | BasicLSTMCell | 3m30.844s |
2 | GRUCell | 4m53.831s |
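Likewise, switching the cell type is a one-line change (TF 1.x-style sketch):

```python
import tensorflow as tf

num_of_hidden_nodes = 2

# The cell type is the only change; GRUCell takes no forget_bias argument,
# since the GRU's update gate plays the combined role of the LSTM's
# input and forget gates.
# cell = tf.nn.rnn_cell.BasicLSTMCell(num_of_hidden_nodes, forget_bias=1.0)
cell = tf.nn.rnn_cell.GRUCell(num_of_hidden_nodes)
```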
Next, I would like to see what happens when training and predicting on more realistic data (stock prices, foreign exchange rates, etc.).