This post is about LSTM.
It may be a useful approach for prediction problems involving phenomena with regular patterns, such as time series.
I wrote this to organize my own understanding, so most of it retraces the reference sites.
LSTM (Long Short-Term Memory)
In a word, its defining feature is that short-term memory is retained over a long period of time to help learning.
The diagram of the LSTM part is as follows.
The calculation of ht uses the previous output ht−1 and the current input Xt. In the same way, ht is then used to find ht+1.
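To make the recurrence concrete for myself, here is a minimal sketch of how each output is fed into the next step. The names toy_step and unroll are hypothetical, and toy_step is only a stand-in for one recurrent step, not an LSTM. It exists just to show that ht is computed from Xt and ht−1.

```python
import numpy as np

def toy_step(x_t, h_prev):
    # Hypothetical stand-in for one recurrent step (NOT an LSTM):
    # it only illustrates that the new output depends on x_t and h_prev.
    return np.tanh(x_t + 0.5 * h_prev)

def unroll(xs, h0, step=toy_step):
    """Apply `step` along the sequence, feeding each output into the next step."""
    h = h0
    outputs = []
    for x_t in xs:
        h = step(x_t, h)  # h_t is computed from X_t and h_{t-1}
        outputs.append(h)
    return np.stack(outputs)

xs = np.array([[0.1], [0.2], [0.3]])  # three time steps, one feature each
hs = unroll(xs, h0=np.zeros(1))
print(hs.shape)  # (3, 1)
```

Any function with the same (x_t, h_prev) signature could be passed as step.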
An LSTM consists of the following parts:
・ Output Gate
・ Forget Gate
・ Input Gate
・ Activation function part
・ Memory Cell
Output Gate
This is the red arrow in the diagram below.
Using the linear transformation Wo for Xt, the linear transformation Ro for ht−1, and the bias Bo,
Ot = σ(Wo Xt + Ro ht−1 + Bo)
is calculated.
The formula has the same form as a layer of an ordinary neural network.
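The gate calculation described above can be sketched in NumPy as follows. This is only an illustration under my own assumptions: the function name gate, the sizes, and the random weights are all hypothetical, and σ is taken to be the logistic sigmoid.

```python
import numpy as np

def sigmoid(a):
    # Logistic sigmoid, squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def gate(x_t, h_prev, W, R, B):
    """Generic gate: sigmoid(W @ x_t + R @ h_prev + B)."""
    return sigmoid(W @ x_t + R @ h_prev + B)

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 2                        # assumed sizes for illustration
Wo = rng.normal(size=(n_hidden, n_in))       # linear transformation for Xt
Ro = rng.normal(size=(n_hidden, n_hidden))   # linear transformation for ht-1
Bo = np.zeros(n_hidden)                      # bias

x_t = rng.normal(size=n_in)
h_prev = np.zeros(n_hidden)
o_t = gate(x_t, h_prev, Wo, Ro, Bo)
print(o_t)  # each entry lies strictly between 0 and 1
```

Because the output is between 0 and 1, it can act as a per-element "how much to let through" factor.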
Forget Gate
As with the Output Gate, using the parameters Wf, Rf, and Bf,
ft = σ(Wf Xt + Rf ht−1 + Bf)
is calculated.
Input Gate
In the same way, using the parameters Wi, Ri, and Bi,
it = σ(Wi Xt + Ri ht−1 + Bi)
is calculated.
The activation function part is calculated in the following steps.
① From ft = σ(Wf Xt + Rf ht−1 + Bf) and the output Ct−1 from the cell (the dotted line),
the output is Ct−1 ⊗ ft.
Here ⊗ is the element-wise product.
② From it = σ(Wi Xt + Ri ht−1 + Bi) and Zt = tanh(Wz Xt + Rz ht−1 + Bz),
the output is it ⊗ Zt.
③ From Ct−1 ⊗ ft in ① and it ⊗ Zt in ②,
Ct = Ct−1 ⊗ ft + it ⊗ Zt
is calculated.
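The three steps above can be traced with small numbers. In this sketch the arrays a_f, a_i, and a_z are hypothetical pre-activation values (that is, the results of W Xt + R ht−1 + B), chosen by hand only for illustration. One thing it shows is that Zt comes from tanh and can be negative, so the written value can lower the cell as well as raise it, while the gates only scale.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical pre-activation values, chosen by hand for illustration
a_f = np.array([2.0, -2.0])   # for the forget gate
a_i = np.array([0.0, 3.0])    # for the input gate
a_z = np.array([1.0, -1.0])   # for the tanh part
c_prev = np.array([1.0, 1.0]) # previous cell state Ct-1

f_t = sigmoid(a_f)            # in (0, 1)
i_t = sigmoid(a_i)            # in (0, 1)
z_t = np.tanh(a_z)            # in (-1, 1), can be negative

kept = c_prev * f_t           # step 1: Ct-1 ⊗ ft
written = i_t * z_t           # step 2: it ⊗ Zt
c_t = kept + written          # step 3: Ct = Ct-1 ⊗ ft + it ⊗ Zt
print(kept, written, c_t)
```

Note that `*` on NumPy arrays is already the element-wise product, which is what ⊗ means here.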
Memory Cell part: Ct = it ⊗ Zt + Ct−1 ⊗ ft
Output Gate part: Ot = σ(Wo Xt + Ro ht−1 + Bo)
From these, the output is calculated as
ht = Ot ⊗ tanh(Ct)
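Putting the formulas so far into one place, a single LSTM step might look like the sketch below. This is my own minimal NumPy version, not a library implementation: the names init_params and lstm_step, the dictionary layout, the sizes, and the random initialization are all assumptions, and it covers only the forward calculation (no training).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical helper: small random W, R and zero B for f, i, z, o."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fizo":
        p["W" + g] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["R" + g] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["B" + g] = np.zeros(n_hidden)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    f_t = sigmoid(p["Wf"] @ x_t + p["Rf"] @ h_prev + p["Bf"])  # Forget Gate
    i_t = sigmoid(p["Wi"] @ x_t + p["Ri"] @ h_prev + p["Bi"])  # Input Gate
    z_t = np.tanh(p["Wz"] @ x_t + p["Rz"] @ h_prev + p["Bz"])  # activation part
    o_t = sigmoid(p["Wo"] @ x_t + p["Ro"] @ h_prev + p["Bo"])  # Output Gate
    c_t = c_prev * f_t + i_t * z_t                             # Memory Cell
    h_t = o_t * np.tanh(c_t)                                   # output
    return h_t, c_t

params = init_params(n_in=3, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
xs = np.random.default_rng(1).normal(size=(5, 3))  # 5 time steps, 3 features
for x_t in xs:
    h, c = lstm_step(x_t, h, c, params)  # h and c are carried to the next step
print(h.shape, c.shape)  # (4,) (4,)
```

Both ht and Ct are passed along to the next time step, which is the difference from a plain recurrent layer that carries only ht.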
Ct = Ct−1 ⊗ ft + it ⊗ Zt
Ct−1 ⊗ ft: the Forget Gate part. ft adjusts how much of the past information Ct−1 is carried over.
it ⊗ Zt: the Input part. it adjusts how much of the new value Zt, obtained from the activation function, is reflected.
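The roles of the two terms become easier to see at the extremes. The following sketch sets the gate values by hand (in a real LSTM they come out of the sigmoid and never reach exactly 0 or 1), and update_cell is a hypothetical helper name.

```python
import numpy as np

def update_cell(c_prev, f_t, i_t, z_t):
    # Ct = Ct-1 ⊗ ft + it ⊗ Zt
    return c_prev * f_t + i_t * z_t

c_prev = np.array([0.8, -0.3])   # past information
z_t = np.array([0.5, 0.5])       # new value from the tanh part
ones, zeros = np.ones(2), np.zeros(2)

# ft = 1, it = 0: the past is kept as-is and the new value is ignored
kept = update_cell(c_prev, f_t=ones, i_t=zeros, z_t=z_t)
# ft = 0, it = 1: the past is forgotten and replaced by the new value
replaced = update_cell(c_prev, f_t=zeros, i_t=ones, z_t=z_t)
print(kept, replaced)
```

Values in between blend the two, element by element.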
Reference: Basics of LSTM that cannot be heard now