This time, the topic is LSTM.
It can be a useful approach to prediction problems involving regular phenomena such as time series.
Since this is written to organize my own understanding, much of it retraces the reference site linked at the end.
LSTM (Long Short-Term Memory)
In short, its key feature is that short-term memory is kept available over long periods of time to help learning.
A diagram of the LSTM block is shown below.
To compute ht, the previous output ht−1 and the current input Xt are used, and ht is in turn used to compute ht+1.
It consists of the following parts:
・ Output Gate
・ Forget Gate
・ Input Gate
・ Activation function part
・ Memory Cell
Output Gate
This is the part indicated by the red arrows in the diagram below.
Using a weight matrix Wo applied to Xt, a recurrent weight matrix Ro applied to ht−1, and a bias Bo, it is computed as

Ot = σ(WoXt + Roht−1 + Bo)
The formula has the same form as a layer of an ordinary neural network.
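As a concrete illustration, here is a minimal NumPy sketch of this gate computation (the variable names and dimensions are my own assumptions, not from the original). The Forget Gate and Input Gate below follow exactly the same form, only with their own parameters.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Assumed sizes: input dimension d_in, hidden/cell dimension d_h
d_in, d_h = 3, 4
rng = np.random.default_rng(0)

W_o = rng.normal(size=(d_h, d_in))   # weights applied to the input X_t
R_o = rng.normal(size=(d_h, d_h))    # recurrent weights applied to h_{t-1}
B_o = np.zeros(d_h)                  # bias

x_t = rng.normal(size=d_in)          # current input X_t
h_prev = np.zeros(d_h)               # previous output h_{t-1}

# O_t = sigma(W_o X_t + R_o h_{t-1} + B_o)
o_t = sigmoid(W_o @ x_t + R_o @ h_prev + B_o)
print(o_t)  # each element is a gate value in (0, 1)
```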
Forget Gate
As with the Output Gate, it has its own parameters Wf, Rf, and Bf, and is computed as

ft = σ(WfXt + Rfht−1 + Bf)
Input Gate
Similarly, it is computed as

it = σ(WiXt + Riht−1 + Bi)
The activation function part (the candidate value fed into the cell) is computed as

Zt = tanh(WzXt + Rzht−1 + Bz)
① From ft = σ(WfXt + Rfht−1 + Bf) and the previous cell output Ct−1 (the dotted line coming from the cell), the output Ct−1 ⊗ ft is obtained. Here ⊗ denotes the element-wise product.
② From it = σ(WiXt + Riht−1 + Bi) and Zt = tanh(WzXt + Rzht−1 + Bz), the output it ⊗ Zt is obtained.
From Ct−1 ⊗ ft in ① and it ⊗ Zt in ②, the new cell state is computed as

Ct = Ct−1 ⊗ ft + it ⊗ Zt
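In code, the element-wise product ⊗ corresponds to NumPy's `*` operator. A small sketch, assuming ft, it, Zt, and Ct−1 have already been computed as vectors of the same length:

```python
import numpy as np

d_h = 4
rng = np.random.default_rng(1)

c_prev = rng.normal(size=d_h)        # previous cell state C_{t-1}
f_t = rng.uniform(size=d_h)          # forget gate output f_t, values in (0, 1)
i_t = rng.uniform(size=d_h)          # input gate output i_t, values in (0, 1)
z_t = np.tanh(rng.normal(size=d_h))  # candidate Z_t, values in (-1, 1)

# C_t = C_{t-1} (.) f_t + i_t (.) Z_t   (element-wise)
c_t = c_prev * f_t + i_t * z_t
print(c_t)
```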
Using the Memory Cell output Ct = Ct−1 ⊗ ft + it ⊗ Zt and the Output Gate output Ot = σ(WoXt + Roht−1 + Bo), the new output is computed as

ht = Ot ⊗ tanh(Ct)
In Ct = Ct−1 ⊗ ft + it ⊗ Zt, the Forget Gate term Ct−1 ⊗ ft adjusts how much of the past information Ct−1 is carried over, and the Input Gate term it ⊗ Zt adjusts how much of the candidate value Zt obtained from the input is written in, via the gate value it.
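Putting all of the above formulas together, one LSTM step can be sketched as follows. This is only a minimal NumPy version for following the equations; the function name, parameter layout, and dimensions are assumptions for illustration, not how an actual framework implements it.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns (h_t, C_t) from X_t, h_{t-1}, C_{t-1}."""
    W, R, B = params["W"], params["R"], params["B"]
    f_t = sigmoid(W["f"] @ x_t + R["f"] @ h_prev + B["f"])   # Forget Gate
    i_t = sigmoid(W["i"] @ x_t + R["i"] @ h_prev + B["i"])   # Input Gate
    z_t = np.tanh(W["z"] @ x_t + R["z"] @ h_prev + B["z"])   # activation part Z_t
    o_t = sigmoid(W["o"] @ x_t + R["o"] @ h_prev + B["o"])   # Output Gate
    c_t = c_prev * f_t + i_t * z_t                           # Memory Cell update
    h_t = o_t * np.tanh(c_t)                                 # h_t = O_t (.) tanh(C_t)
    return h_t, c_t

# Example usage with assumed dimensions
d_in, d_h = 3, 4
rng = np.random.default_rng(2)
params = {
    "W": {k: rng.normal(size=(d_h, d_in)) for k in "fizo"},
    "R": {k: rng.normal(size=(d_h, d_h)) for k in "fizo"},
    "B": {k: np.zeros(d_h) for k in "fizo"},
}
h_t, c_t = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):   # a short input sequence
    h_t, c_t = lstm_step(x_t, h_t, c_t, params)
print(h_t)
```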
[Basics of LSTM that cannot be heard now](https://www.hellocybernetics.tech/entry/2017/05/06/182757)