Predict time series data with neural network

When dealing with time series data in a neural network, the usual choice is a recurrent neural network. This article explains the recurrent neural network.

(Because the names are long, neural network is abbreviated as NN and recurrent neural network as RNN below.)

Overview of RNN

In some data, each element is correlated with the one that follows: when an "x" appears, there is a high probability that a "y" comes next. Language and music are typical examples (in Japanese, for instance, the particle "ha" or "ga" often comes after "I"). For time-series data with this kind of correlation, it is natural to want to take the previous data into account. Is it possible to feed the previous data into an NN as well? The RNN is the answer.

Specifically, it is as shown in the figure below.

rnn2.PNG

The contents of the hidden layer at time $ t $ are treated as input at the next time $ t + 1 $. The hidden layer at $ t + 1 $ is in turn passed on to $ t + 2 $, and so on; the point is that the previous hidden layer is also used when learning the next hidden layer.
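To make the recurrence concrete, here is a minimal sketch in plain Python. The function names, the tanh activation, and the toy weight values are assumptions for illustration only; they are not taken from the figure.

```python
import math


def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One time step: the new hidden state uses the input AND the previous hidden state."""
    h_new = []
    for j in range(len(h_prev)):
        total = bias[j]
        total += sum(w_in[j][i] * x[i] for i in range(len(x)))
        total += sum(w_rec[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_sequence(xs, w_in, w_rec, bias):
    """Feed a sequence through the network, carrying the hidden state from t to t + 1."""
    h = [0.0] * len(bias)
    states = []
    for x in xs:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states


# Toy weights (illustrative values): 1 input, 2 hidden nodes.
W_IN = [[0.5], [-0.3]]
W_REC = [[0.1, 0.2], [0.0, -0.4]]
BIAS = [0.0, 0.0]

# Only the first input is non-zero, yet later hidden states are non-zero too,
# because the state at t is fed back in at t + 1.
states = run_sequence([[1.0], [0.0], [0.0]], W_IN, W_REC, BIAS)
```

If the recurrent weights are all set to zero, the hidden state at the second step falls back to zero, which is exactly the behavior of an ordinary NN without memory.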

RNN type

| Name | Connection target | Features |
| --- | --- | --- |
| Fully recurrent network | All nodes (1:N) | Fully bidirectional connections, including each node to itself |
| Hopfield network | All nodes (1:N-1) | Bidirectional connections; a node is not connected to itself |
| Elman network | 1:1 (hidden layer -> hidden layer) | Three-layer structure: input layer / context (hidden layer) / output layer |
| Jordan network | 1:1 (output layer -> hidden layer) | Three-layer structure: input layer / context (hidden layer) / output layer |
| Echo state network (ESN) | 1->1? | Connection targets are chosen randomly from a set of nodes (reservoir) |
| Long short term memory network (LSTM) | - | Instead of an RNN node, a block that can hold input values is used. High accuracy |
| Bi-directional RNN (BRNN) | - | A combination of RNNs in both directions (past -> future / future -> past) |

The Hopfield network is a model that can be applied to optimization problems in addition to general classification (/~kanakubo/research/neuro/hopfieldnetwork.html).

Elman and Jordan networks are the simplest forms, and are also called simple recurrent networks. If you want to use an RNN, try one of these first, and switch to another method if accuracy is a problem. The difference between Elman and Jordan is as described above (whether the previous data is fed back from the hidden layer or from the output layer). Neither is strictly superior, but I think Elman is more flexible because the amount of information propagated to the next step can be changed through the number of hidden-layer nodes.
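The difference between the two can be sketched in a few lines. This is a toy network with one input, one hidden node, and one linear output; the function name, the tanh activation, and the weight values are assumptions for illustration. With a single hidden node the flexibility argument above is not visible; the sketch only shows where the context comes from.

```python
import math


def simple_recurrent_net(xs, w_in, w_ctx, w_out, kind="elman"):
    """Run a toy simple recurrent network over the sequence xs.

    kind="elman":  the context is the previous hidden value.
    kind="jordan": the context is the previous output value.
    """
    context = 0.0
    outputs = []
    for x in xs:
        hidden = math.tanh(w_in * x + w_ctx * context)
        output = w_out * hidden
        outputs.append(output)
        # This single line is the whole difference between the two networks.
        context = hidden if kind == "elman" else output
    return outputs


xs = [1.0, 0.0, 0.0]
elman = simple_recurrent_net(xs, w_in=1.0, w_ctx=0.5, w_out=2.0, kind="elman")
jordan = simple_recurrent_net(xs, w_in=1.0, w_ctx=0.5, w_out=2.0, kind="jordan")
```

The first outputs are identical because the context starts at zero in both cases; from the second step onward the two networks diverge.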

The echo state network is a rather different kind of model. Nodes are not wired together by design in advance; instead they are kept in a pool called a reservoir, and the connections among them are made randomly. The idea is that the human brain has no predetermined wiring either, so the model was created to imitate this with fluid connections. It seems that a closely related approach is called Liquid State Machines (literally, liquid mechanism).

Long short term memory network (LSTM) and Bi-directional RNN (BRNN) have no particular restrictions on how nodes are connected. LSTMs use LSTM blocks that can remember values instead of simple nodes. This addresses a learning problem in RNNs and is explained later.

Bi-directional RNNs can improve accuracy by learning the time series not only in the forward direction from the past to the future, but also in the reverse direction from the future to the past.

Learning RNN

The following document explains RNN training very carefully. It is in English, but there is almost no Japanese literature on RNNs at this stage (2015/1), so there is no choice but to resign yourself and read it.

A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach

Training an RNN is generally very slow to converge. You need to lower the learning rate for accuracy, but lowering it slows down the already slow convergence. This is a trade-off, but there seem to be ways to address the gradient instability in the optimization process (for details, see EFFICIENT SECOND-ORDER LEARNING ALGORITHMS FOR DISCRETE-TIME RECURRENT NEURAL NETWORKS, http://ir.nmu.org.ua/bitstream/handle/123456789/120274/866d31771b48ba40c56fcc039f091b9b.pdf?sequence=1&isAllowed=y#page=58).

One thing I can say is that, at the moment (2015/1), there is no established RNN training method that is free of problems in both accuracy and speed, so naturally there is no library that implements one. Steady trial and error is needed here.

BPTT (BackPropagation Through Time)

The basic idea is that an RNN can be regarded as one long NN when it is unrolled in time, so backpropagation should be applicable as usual. The image is as follows.

bptt1.PNG

The error propagates from the last time T back to the first time 0. Therefore, the error of the output layer at a certain time t is the sum of "the difference between the teacher data and the output at time t" and "the error propagated from t + 1".

As is clear from the figure, BPTT cannot train without the data up to the last T, that is, all the time series data. Therefore, it is necessary to take measures such as cutting out only the latest data for long data.
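The backward pass described above can be sketched for the smallest possible RNN. Everything here is an assumption for illustration: a single hidden value $ h_t = \tanh(w h_{t-1} + u x_t) $ that is also the output, a squared-error loss, and the function names.

```python
import math


def forward(w, u, xs):
    """Unroll the scalar RNN h_t = tanh(w * h_{t-1} + u * x_t) over the whole sequence."""
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w * h + u * x)
        hs.append(h)
    return hs


def loss(w, u, xs, targets):
    """Sum of squared errors over all time steps."""
    return 0.5 * sum((h - d) ** 2 for h, d in zip(forward(w, u, xs), targets))


def bptt_gradients(w, u, xs, targets):
    """Walk from the last time step back to the first, accumulating gradients."""
    hs = forward(w, u, xs)
    grad_w = grad_u = 0.0
    from_future = 0.0  # error arriving from time t + 1 (zero at the last step T)
    for t in reversed(range(len(xs))):
        # Error at time t = (output - teacher) + error propagated from t + 1.
        err_h = (hs[t] - targets[t]) + from_future
        err_a = err_h * (1.0 - hs[t] ** 2)  # pass through the tanh derivative
        h_prev = hs[t - 1] if t > 0 else 0.0
        grad_w += err_a * h_prev
        grad_u += err_a * xs[t]
        from_future = w * err_a  # what gets handed to time t - 1
    return grad_w, grad_u


xs = [0.5, -0.1, 0.3, 0.8]
targets = [0.2, 0.1, 0.0, 0.4]
grad_w, grad_u = bptt_gradients(0.7, 0.9, xs, targets)
```

Note that `bptt_gradients` needs the whole sequence before it can start, because the loop begins at the last time step. That is the reason long data has to be cut into shorter windows.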

This BPTT has various problems, and various learning methods have been devised to deal with them.

LSTM (Long short term memory)

If T is too large, that is, for long time series data, the error propagated from the upper layers may vanish or, conversely, become very large because of how it is calculated (this is explained in detail here (p8 ~): http://www.slideshare.net/beam2d/pfi-seminar-20141030rnn). An error that grows too large can be handled by capping its maximum value, but nothing can be done about an error that vanishes, so the idea of LSTM is to propagate the error in a way that does not attenuate it.
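The following is a loose numerical illustration of the problem, not an LSTM implementation. It assumes, for simplicity, that the error is multiplied by the same factor at every time step; the function names and the numbers are hypothetical.

```python
def backprop_factor(per_step_factor, steps):
    """Magnitude of an error signal after being multiplied by the same factor at every step."""
    return per_step_factor ** steps


def clip(value, limit):
    """Cap the magnitude of a value. This helps against explosion, not against vanishing."""
    return max(-limit, min(limit, value))


vanishing = backprop_factor(0.5, 50)   # factor below 1: the error fades away
exploding = backprop_factor(1.5, 50)   # factor above 1: the error blows up
preserved = backprop_factor(1.0, 50)   # factor of exactly 1: the error is carried unchanged

clipped = clip(exploding, 5.0)         # the exploding case can be capped
still_vanished = clip(vanishing, 5.0)  # capping does nothing for the vanishing case
```

The case with a factor of exactly 1 corresponds to what the LSTM aims for: a path along which the error is passed back without being attenuated.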

teacher forcing

In an RNN, the output at t becomes the input at t + 1, and so on. During training, however, the correct input to t + 1 is known from the teacher data, so this method uses it directly. This lets each layer learn without being affected by the lower layers and increases the convergence speed, but the output reportedly tends to be unstable when the network is actually run (after training).
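A minimal sketch of the two ways of rolling a model over a sequence follows. The `run` helper and the deliberately imperfect `model` are hypothetical; the true rule is assumed to be "add 1" while the model adds 1.1.

```python
def run(step, first_input, targets, teacher_forcing):
    """Roll a one-step predictor over a sequence.

    With teacher forcing, the next input is the known correct value.
    Without it, the next input is the model's own prediction.
    """
    x = first_input
    predictions = []
    for target in targets:
        y = step(x)
        predictions.append(y)
        x = target if teacher_forcing else y
    return predictions


def model(x):
    """Hypothetical, slightly wrong one-step predictor (the true rule is x + 1)."""
    return x + 1.1


targets = [1.0, 2.0, 3.0, 4.0]
forced = run(model, 0.0, targets, teacher_forcing=True)
free = run(model, 0.0, targets, teacher_forcing=False)

forced_errors = [abs(p - t) for p, t in zip(forced, targets)]
free_errors = [abs(p - t) for p, t in zip(free, targets)]
```

With teacher forcing the error stays at about 0.1 at every step, while in the free-running case it accumulates step by step. This gap between training and actual execution is one plausible reading of why the output becomes unstable after training.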

RPROP (Resilient backpropagation)

This method is also used for regular NNs. When training an NN, the gradient is calculated, and the update step is multiplied by a factor ($ \eta $) according to how the direction (sign) of the gradient changed between the previous iteration and the current one (for details, see http://paginas.fe.up.pt/~ee02162/dissertacao/RPROP%20paper.pdf).

I think it is named Resilient because this behavior feels like rolling a ball (it accelerates along the gradient, and slows down when the gradient changes direction and a force is applied in the opposite direction).

With a function like the sigmoid function, learning becomes difficult where the value goes beyond a certain range because the function is flat there (the gradient is almost 0); this is known as the Flat Spot Problem. Applying this method also has the effect of preventing learning from stagnating there, because the step is scaled by the factor rather than by the size of the gradient.

There are many variations of this technique itself. For details, please refer to here.
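As a rough sketch of the basic idea, the following implements one simple variant (often written Rprop-, without weight backtracking) on a toy quadratic function. The function names, the toy objective, and the step limits are assumptions for illustration; the factors 1.2 and 0.5 are the values commonly used for $ \eta^+ $ and $ \eta^- $.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def rprop_minimize(grad_fn, weights, steps=100, eta_plus=1.2, eta_minus=0.5,
                   step_init=0.1, step_min=1e-6, step_max=50.0):
    """Minimize using only the SIGN of each partial derivative."""
    weights = list(weights)
    step_sizes = [step_init] * len(weights)
    prev_grads = [0.0] * len(weights)
    for _ in range(steps):
        grads = grad_fn(weights)
        for i, g in enumerate(grads):
            if g * prev_grads[i] > 0:
                # Same direction as last time: accelerate.
                step_sizes[i] = min(step_sizes[i] * eta_plus, step_max)
            elif g * prev_grads[i] < 0:
                # Direction flipped (we jumped over a minimum): slow down.
                step_sizes[i] = max(step_sizes[i] * eta_minus, step_min)
            weights[i] -= sign(g) * step_sizes[i]
            prev_grads[i] = g
    return weights


def grad(w):
    """Gradient of the toy objective f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2."""
    return [2 * (w[0] - 3), 20 * (w[1] + 1)]


result = rprop_minimize(grad, [0.0, 0.0])
```

Because only the sign of the gradient is used, the second coordinate, whose gradient is ten times larger, is not updated any more aggressively than the first. The same property keeps the update moving on flat regions where the gradient is nearly zero.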

In addition to the various methods described above, it is also important to tune parameters such as the learning rate, which adjusts the degree of error propagation, and the momentum, which adjusts the degree of influence of the previous update, just as with a normal NN.

BPTT is generally slow to converge and takes a long time to train. For this reason it is mostly used with small networks of about 3 to 20 hidden-layer nodes; beyond that, training may take several hours or even longer.

RTRL (Real Time Recurrent Learning)

Unlike BPTT, RTRL propagates the error forward in time, which makes it suitable for online learning.

rtrl.PNG

The error that occurred at time t is used to update the weights at the next time t + 1. In the figure above, the error is calculated and propagated at every time step, but it is also possible to update after a certain period (epoch). However, since more quantities must be updated at each step than in BPTT, the computational load is high.
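A minimal sketch of the forward propagation for a scalar RNN follows. The model $ h_t = \tanh(w h_{t-1} + x_t) $, the squared-error loss, and the function names are assumptions for illustration. To keep the result easy to check, the weight is held fixed and the gradient is only accumulated; in online learning the weight would instead be updated inside the loop.

```python
import math


def rtrl_gradient(w, xs, targets):
    """Accumulate dLoss/dw while moving FORWARD in time."""
    h = 0.0
    p = 0.0     # sensitivity dh/dw, carried forward from step to step
    grad = 0.0
    for x, d in zip(xs, targets):
        h_new = math.tanh(w * h + x)
        # Chain rule: w affects h_new directly (through h) and via the past (through p).
        p = (1.0 - h_new ** 2) * (h + w * p)
        h = h_new
        # The gradient contribution of time t is available immediately at time t.
        grad += (h - d) * p
    return grad


def total_loss(w, xs, targets):
    """Sum of squared errors, used here only to check the gradient."""
    h, total = 0.0, 0.0
    for x, d in zip(xs, targets):
        h = math.tanh(w * h + x)
        total += 0.5 * (h - d) ** 2
    return total


xs = [0.5, -0.2, 0.1, 0.7]
targets = [0.3, 0.0, 0.2, 0.5]
grad = rtrl_gradient(0.8, xs, targets)
```

Here there is a single sensitivity `p`. In a full network one such value has to be kept for every pair of node and weight and refreshed at every step, which is where the high computational load comes from.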

EKF (Extended Kalman Filter)

EKF is a method that applies the extended Kalman filter to the RNN to update the weights. The extended Kalman filter is a non-linear extension of the Kalman filter, which handles linear systems, and it estimates the state of a system of the following form.

$ x(n+1) = f(x(n)) + q(n) $

$ d(n) = h_n(x(n)) $

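The above formulas express the following: the internal state $ x(n) $ of the system evolves from one step to the next through the function $ f $ with added noise $ q(n) $, and what can be observed from outside is $ d(n) $, obtained from the state through the function $ h_n $.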

The image is as shown in the figure below.

ekf.PNG

And RNN can be regarded as this extended Kalman filter. The figure below shows this.

rnn_ekf1.PNG

I think it is fine to take the weights $ w $ as the state and the output as $ d $. The question is the input, but by regarding it as part of the function $ h $ that calculates the output $ d $, the RNN can be said to be an extended Kalman filter (in practice the output is calculated from the input and the weights $ w $, so I don't think this is too much of a stretch).

Then the method for updating the state of the extended Kalman filter can be applied as it is to updating the state of the RNN, that is, the weights. The formulas for the state update are quite complicated, so I omit the details, but this is how the method called EKF brings the extended Kalman filter to RNNs. There are also techniques for simplifying the calculation, and it is a promising method, but like BPTT and RTRL it requires empirical tuning (learning rate, network configuration, etc.) to achieve accuracy.
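To give a feeling for the update without the full matrix formulas, here is a scalar sketch. It is not an RNN: it assumes a single weight $ w $ as the state, $ f $ as the identity, and $ d = \tanh(w x) $ as the observation function $ h $. The function name, the noise settings `r` and `q`, and the data are all hypothetical.

```python
import math


def ekf_fit_weight(samples, w=0.0, p=1.0, r=0.01, q=0.0):
    """Estimate one weight with a scalar extended Kalman filter.

    State:        the weight w (f is the identity, so it is only disturbed by q).
    Observation:  d = tanh(w * x), where the input x is treated as part of h.
    p is the uncertainty of the estimate, r the assumed observation noise.
    """
    for x, d in samples:
        p = p + q                         # predict: weight unchanged, uncertainty grows by q
        y = math.tanh(w * x)              # predicted output h(w)
        jac = x * (1.0 - y * y)           # dh/dw at the current estimate (linearization)
        gain = p * jac / (jac * p * jac + r)
        w = w + gain * (d - y)            # correct the weight by the output error
        p = (1.0 - gain * jac) * p        # uncertainty shrinks after each observation
    return w, p


TRUE_W = 1.5
inputs = [-1.0, -0.5, 0.5, 1.0] * 50
samples = [(x, math.tanh(TRUE_W * x)) for x in inputs]
w_est, p_est = ekf_fit_weight(samples)
```

There is no learning rate in the usual sense; the size of each correction is governed by the gain, which depends on `p` and `r`. These still have to be chosen empirically, which matches the remark about tuning above.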

RNN library

Among the major libraries, PyBrain explicitly supports RNNs. A Recurrent Network Tutorial (http://pybrain.org/docs/tutorial/netmodcon.html#using-recurrent-networks) is also available.

It also seems to be possible with pylearn2, which is well known for deep learning, but as you can see from the path below, it is still in the sandbox at the moment (2015/1), so I would hesitate to use it in practice.

lisa-lab/pylearn2 pylearn2/pylearn2/sandbox/rnn/models/tests/test_rnn.py

If you want to implement it yourself, the method using Theano is introduced.

Implementing a recurrent neural network in python gwtaylor/theano-rnn

This is a combination of RNN and RBM, but the implementation including the code is introduced.

Modeling and generating sequences of polyphonic music with the RNN-RBM Introduction to melody prediction and generation by RNN-RBM and music information processing

In addition, neuraltalk seems to be a model that learns images together with their descriptions and outputs a description when given an image. It is more of a finished application than a library for building models, but I think it is good to use if it fits your purpose.

RNN implementation

This time, I will implement RNN using pybrain, which has an implementation example as described above.

The latest version of PyBrain is 0.3.3 (as of January 2015). It appears to have been uploaded to PyPI (https://pypi.python.org/pypi/PyBrain/0.3.3), but pip installs 0.3.2, so clone the repository with git clone and install from there. Please refer to the following for the procedure and dependent libraries.

Installation

The main dependency is SciPy. The documentation lists Python 2.5, but I have confirmed that the tests (python runtests.py) pass with Python 3.4.2 in my environment. Looking at the issues, some parts seem not to support Python 3, but I had no problems while using it (unless errors are occurring without my noticing ...).

For the time series data to predict, I generated ball trajectory data. At first I considered using the bouncing ball data (www.cs.utoronto.ca/~ilya/code/2008/RTRBM.tar) used in this paper, but its operating environment is old (Python 2), and if you believe the README it takes a week to train (quote: "the bouncing balls problems trains for a considerably longer amount of time (about a week on a fast computer...)"), so I decided to generate a simple trajectory myself and use that.

The building of the model is carefully described in the PyBrain tutorial, but the main description methods are summarized below.

Welcome to PyBrain’s documentation!

Network construction

Assemble the network using pybrain.structure. The following builds an ordinary 2-3-1 network with a bias term.

Building Networks with Modules and Connections

from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, BiasUnit, FullConnection

net = FeedForwardNetwork()
net.addInputModule(LinearLayer(2, name='i'))
net.addModule(BiasUnit('bias'))
net.addModule(SigmoidLayer(3, name='h'))
net.addOutputModule(LinearLayer(1, name='o'))

# connect nodes
net.addConnection(FullConnection(net['i'], net['h']))
net.addConnection(FullConnection(net['bias'], net['h']))
net.addConnection(FullConnection(net['bias'], net['o']))
net.addConnection(FullConnection(net['h'], net['o']))

It's easier to build with buildNetwork. The following is the same as the above process.

from pybrain.tools.shortcuts import buildNetwork
net = buildNetwork(2, 3, 1, bias=True, hiddenclass=SigmoidLayer)

Network learning

To train, first prepare a dataset. The following passes samples with 2 inputs and 1 output to `addSample`, matching the network constructed above (note that `appendLinked` and `addSample`, both of which appear in the documentation, are equivalent: https://github.com/pybrain/pybrain/blob/1dd5086a51c3c98497ef85b31178588a89d8951e/pybrain/datasets/unsupervised.py#L31).

Building a DataSet

from pybrain.datasets import SupervisedDataSet
ds = SupervisedDataSet(2, 1)
ds.addSample((0, 0), (0,))
...

We will train with the prepared data set. trainer.train returns a double proportional to the error, which allows you to evaluate the fit to your training data.

Training your Network on your Dataset

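from pybrain.structure import TanhLayer
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork

# ds is the SupervisedDataSet prepared above
net = buildNetwork(2, 3, 1, bias=True, hiddenclass=TanhLayer)
trainer = BackpropTrainer(net, ds)
err = trainer.train()  # trains for one epoch and returns a value proportional to the error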

Network prediction

Prediction is done with the `activate` function.

net.activate([1, 2])

Building an RNN

Building an RNN is almost the same as building a normal network. For an RNN, use `RecurrentNetwork`, and add each recurrent connection with `addRecurrentConnection`.

Using Recurrent Networks

Prediction is then executed with `activate` after resetting once with `net.reset()`. In the tutorial example, the same value is passed to `activate` every time, but I think that strictly speaking the predicted value has to be fed back in as the next input (in theory, once the first input is given, the rest can be predicted one step after another, so I also feel that leaving the initial value in place may not be a problem ...).
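The feedback loop can be sketched as follows. The `forecast` helper is hypothetical, and so is `ConstantVelocityStub`, which only stands in for a trained network so that the example runs without PyBrain. The sketch assumes the network's output has the same layout as its input, so that a prediction can be passed straight back in.

```python
class ConstantVelocityStub:
    """Hypothetical stand-in for a trained network.

    It exposes reset() and activate() like the network described above, but simply
    advances (x, y, vx, vy) by one step instead of using learned weights.
    """

    def reset(self):
        # A real recurrent network would clear its stored hidden state here.
        pass

    def activate(self, state):
        x, y, vx, vy = state
        return [x + vx, y + vy, vx, vy]


def forecast(net, first_input, n_steps):
    """Predict n_steps ahead, feeding each prediction back in as the next input."""
    net.reset()
    current = list(first_input)
    trajectory = []
    for _ in range(n_steps):
        current = list(net.activate(current))
        trajectory.append(current)
    return trajectory


path = forecast(ConstantVelocityStub(), [0.0, 0.0, 1.0, 0.5], 3)
```

Any object with `reset()` and `activate()` methods can be passed as `net`, so the same loop should work with a trained network as long as its input and output sizes match.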

This time I tried it with Elman and Jordan. Below is an image in Jordan. The coordinates of x and y and their respective accelerations are passed as inputs.

rnn_demo.PNG

Acceleration is determined by the positions at time t and time t + 1, so I expected the hidden layer to learn it by itself ... but the accuracy was poor, so I added it as an input parameter.

For the training data, we prepared several batches with different initial positions under the same initial acceleration, and trained / tested with them. Since the initial acceleration is the same for the training / test data, this model is a model that estimates what kind of trajectory will be drawn when the ball is placed at a certain point under that acceleration.
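Data of this kind could be generated roughly as follows. This is not the code used for the experiment; the function names, the reflection rule at the walls, and the numbers are assumptions for illustration. The per-step change in position plays the role of the motion value that is passed in alongside the coordinates.

```python
def bounce(position, velocity, low=0.0, high=10.0):
    """Advance one coordinate by one step, reflecting off the walls."""
    position += velocity
    if position < low:
        position = 2 * low - position
        velocity = -velocity
    elif position > high:
        position = 2 * high - position
        velocity = -velocity
    return position, velocity


def make_trajectory(x, y, vx, vy, n_steps, size=10.0):
    """Generate (x, y, vx, vy) for a ball bouncing inside a size x size square."""
    points = []
    for _ in range(n_steps):
        x, vx = bounce(x, vx, 0.0, size)
        y, vy = bounce(y, vy, 0.0, size)
        points.append((x, y, vx, vy))
    return points


# Several batches: different starting positions, same initial motion.
starts = [(1.0, 1.0), (5.0, 2.0), (8.0, 7.0)]
batches = [make_trajectory(x0, y0, 1.5, 0.7, 30) for x0, y0 in starts]
```

Each batch can then be turned into training pairs by using the point at time t as the input and the point at time t + 1 as the target.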

As for the accuracy, which is the main concern, the error on the test data was about 5.7 on average, which is not very good. Since this data predicts the trajectory of a ball bouncing in a 10x10 square, an error of 5.7 is a level that can be called almost completely wrong.

In the animation, the prediction moves in a way that shows it has roughly got the idea, but it is hard to say that the trajectory is reproduced. I also tried increasing and decreasing the number of hidden layers and nodes, but nothing changed.

Actual orbit real1.gif

Predicted trajectory (quite close to the best) predict1.gif

The code used for verification is here. If you think you can do better, pull requests are welcome.

rnn_demo
