The network configuration is basically the same as in the previous article. The hyperparameters are based on the values shown in "3.3. Hyperparameters" below.
Variable name | Meaning | Value |
---|---|---|
num_of_input_nodes | Number of nodes in the input layer | 1 node |
num_of_hidden_nodes | Number of nodes in the hidden layer | 2 nodes |
num_of_output_nodes | Number of nodes in the output layer | 1 node |
length_of_sequences | RNN sequence length | 50 steps |
num_of_training_epochs | Number of training iterations | 2,000 times |
num_of_prediction_epochs | Number of prediction iterations | 100 times |
size_of_mini_batch | Number of samples per mini-batch | 100 samples |
learning_rate | Learning rate | 0.1 |
forget_bias | (not well understood; see below) | 1.0 (default value) |
The source code used for training and prediction, the notebook that generated the training data, and the notebook used for charting the results are available on GitHub. Please refer to it for the exact source code and values.
https://github.com/nayutaya/tensorflow-rnn-sin/tree/20160517/ex2
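For reference, here is a minimal sketch of a network with this configuration, written against the TensorFlow 1.x-style API. The repository linked above targets an older TensorFlow release, so the actual code differs in detail; this is only an illustration of how the hyperparameters above enter the graph.

```python
import tensorflow as tf

# Hyperparameters from the table above.
num_of_input_nodes     = 1
num_of_hidden_nodes    = 2
num_of_output_nodes    = 1
length_of_sequences    = 50
learning_rate          = 0.1
forget_bias            = 1.0

# Input: a mini-batch of sequences, one value per time step.
inputs  = tf.placeholder(tf.float32, [None, length_of_sequences, num_of_input_nodes])
targets = tf.placeholder(tf.float32, [None, num_of_output_nodes])

# Single-layer LSTM followed by a linear output layer.
cell = tf.nn.rnn_cell.BasicLSTMCell(num_of_hidden_nodes, forget_bias=forget_bias)
rnn_outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
last_output = rnn_outputs[:, -1, :]              # hidden state at the final step
prediction = tf.layers.dense(last_output, num_of_output_nodes)

# Squared-error loss, minimized with plain gradient descent.
loss = tf.reduce_mean(tf.square(prediction - targets))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
```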
num_of_hidden_nodes: Number of nodes in the hidden layer

The charts below show the loss function and the prediction results when the number of nodes in the hidden layer is changed from 1 to 4.

With only 1 node in the hidden layer, the model cannot predict the wave at all. On the other hand, a larger number of hidden nodes does not always give better results. Looking at the loss charts, however, the more nodes in the hidden layer, the lower the final loss.
No | Number of nodes in the hidden layer | Training / prediction time |
---|---|---|
1 | 1 | 3m53.845s |
2 | 2 | 3m30.844s |
3 | 3 | 4m36.324s |
4 | 4 | 5m30.537s |
length_of_sequences: RNN sequence length

The charts below show the prediction results and the loss function when the RNN sequence length is changed to 30, 40, 50, 60, and 70.

The training data this time is a sine wave with 50 steps per cycle, but the model can be seen to predict well even when the sequence length is shorter than one cycle.
No | RNN sequence length | Training / prediction time |
---|---|---|
1 | 30 | 2m29.589s |
2 | 40 | 2m58.636s |
3 | 50 | 3m30.844s |
4 | 60 | 4m25.459s |
5 | 70 | 5m38.550s |
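As an illustration of how length_of_sequences relates to the 50-steps-per-cycle sine wave, here is a rough sketch of how training sequences could be generated. This is an assumption for illustration, not the repository's exact notebook code.

```python
import numpy as np

steps_per_cycle = 50       # one full sine-wave cycle spans 50 steps
length_of_sequences = 50   # varied from 30 to 70 in the table above

t = np.arange(0, steps_per_cycle * 100)
wave = np.sin(2.0 * np.pi * t / steps_per_cycle)

# Each sample: `length_of_sequences` consecutive values as input,
# the value at the next step as the prediction target.
xs = np.array([wave[i:i + length_of_sequences]
               for i in range(len(wave) - length_of_sequences)])
ys = np.array([wave[i + length_of_sequences]
               for i in range(len(wave) - length_of_sequences)])
xs = xs.reshape(-1, length_of_sequences, 1)   # (samples, steps, input nodes)
ys = ys.reshape(-1, 1)
```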
num_of_training_epochs: Number of training iterations

The charts below show the prediction results and the loss function when the number of training iterations is changed to 1,000, 2,000, and 3,000. With 3,000 iterations, the loss begins to oscillate from around iteration 1,600, and the prediction results are also poor.
No | Number of training iterations | Training / prediction time |
---|---|---|
1 | 1,000 times | 2m10.783s |
2 | 2,000 times | 3m30.844s |
3 | 3,000 times | 6m17.675s |
size_of_mini_batch: Number of samples per mini-batch

The charts below show the prediction results and the loss function when the number of samples per mini-batch is changed to 50, 100, and 200.

There is no striking difference, but on the whole a larger mini-batch seems to give slightly better results.
No | Number of samples per mini-batch | Training / prediction time |
---|---|---|
1 | 50 | 4m25.032s |
2 | 100 | 3m30.844s |
3 | 200 | 4m59.550s |
learning_rate: Learning rate

The charts below show the prediction results and the loss function when the learning rate passed to the optimizer is changed to 0.02, 0.1, and 0.5.

With learning rates of 0.02 and 0.5, the model cannot predict properly. With a learning rate of 0.5, the loss starts to oscillate almost immediately after training begins.
No | Learning rate | Training / prediction time |
---|---|---|
1 | 0.02 | 3m46.852s |
2 | 0.1 | 3m30.844s |
3 | 0.5 | 4m39.136s |
3.6. forget_bias

The charts below show the loss function and the prediction results when the forget_bias parameter of BasicLSTMCell, whose effect I did not fully understand, is changed to 0.25, 0.5, and 1.0 (the default value).

With 0.25, the model cannot predict the wave.
No | forget_bias | Training / prediction time |
---|---|---|
1 | 0.25 | 4m27.725s |
2 | 0.5 | 4m27.089s |
3 | 1.0 | 3m30.844s |
The charts below show the prediction results and the loss function when the optimizer is switched from GradientDescentOptimizer to AdamOptimizer. With AdamOptimizer the loss drops faster and reaches a lower final value, but it oscillates violently, which makes it difficult to decide when to stop training.
No | Optimizer | Training / prediction time |
---|---|---|
1 | GradientDescentOptimizer | 3m30.844s |
2 | AdamOptimizer | 4m46.116s |
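Switching optimizers is essentially a one-line change. The sketch below uses a stand-in loss just to show the swap; passing the same learning rate of 0.1 to Adam here is an assumption (Adam's default is 0.001), not something confirmed from the repository.

```python
import tensorflow as tf

learning_rate = 0.1
w = tf.Variable(0.0)
loss = tf.square(w - 1.0)   # stand-in for the squared-error loss sketched earlier

# Only the optimizer changes; the rest of the graph stays the same.
# train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
```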
The charts below show the prediction results and the loss function when the RNN cell is switched from BasicLSTMCell to GRUCell. There was not much difference between the two.
No | RNN cell | Training / prediction time |
---|---|---|
1 | BasicLSTMCell | 3m30.844s |
2 | GRUCell | 4m53.831s |
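Likewise, switching the cell type is a one-line change (TF 1.x-style sketch):

```python
import tensorflow as tf

num_of_hidden_nodes = 2

# The cell type is the only change; GRUCell takes no forget_bias argument,
# since the GRU's update gate plays the combined role of the LSTM's
# input and forget gates.
# cell = tf.nn.rnn_cell.BasicLSTMCell(num_of_hidden_nodes, forget_bias=1.0)
cell = tf.nn.rnn_cell.GRUCell(num_of_hidden_nodes)
```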
Next, I would like to see what happens when training and predicting on more realistic data (stock prices, foreign exchange rates, etc.).