This time, the topic is LSTM. It can be a useful approach to prediction problems for regular phenomena such as time series. Since this is an output for my own understanding, it mostly traces the reference site listed at the end.

## LSTM (Long Short-Term Memory)

In a word, its characteristic feature is that short-term memory is carried over a long period to drive learning.

An LSTM block computes $h_t$ from the previous output $h_{t-1}$ and the current input $X_t$; the idea is that $h_t$ is in turn used to find $h_{t+1}$. The block consists of:

- Output Gate
- Forget Gate
- Input Gate
- Activation function part
- Memory Cell

### Output Gate

Using a linear transformation $W_o$ applied to $X_t$, a linear transformation $R_o$ applied to $h_{t-1}$, and a bias $B_o$, the gate is calculated as

$$O_t = \sigma(W_o X_t + R_o h_{t-1} + B_o)$$

This is the same form of calculation as in an ordinary neural network.

### Forget Gate

As with the Output Gate, there are parameters $W_f$, $R_f$, $B_f$, and the gate is calculated as

$$f_t = \sigma(W_f X_t + R_f h_{t-1} + B_f)$$

### Input Gate

Similarly,

$$i_t = \sigma(W_i X_t + R_i h_{t-1} + B_i)$$
### Activation function part

The calculation formula for the activation function part is

$$Z_t = \tanh(W_z X_t + R_z h_{t-1} + B_z)$$
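To make the four formulas above concrete, here is a minimal NumPy sketch (not code from the original article); the dimensions, random initialisation, and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3  # illustrative sizes

# W* act on the input X_t, R* act on the previous output h_{t-1}, B* are biases.
Wo, Wf, Wi, Wz = (rng.standard_normal((hidden_dim, input_dim)) for _ in range(4))
Ro, Rf, Ri, Rz = (rng.standard_normal((hidden_dim, hidden_dim)) for _ in range(4))
Bo, Bf, Bi, Bz = (np.zeros(hidden_dim) for _ in range(4))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x_t = rng.standard_normal(input_dim)   # current input X_t
h_prev = np.zeros(hidden_dim)          # previous output h_{t-1}

o_t = sigmoid(Wo @ x_t + Ro @ h_prev + Bo)  # Output Gate: Ot = σ(WoXt + Roht−1 + Bo)
f_t = sigmoid(Wf @ x_t + Rf @ h_prev + Bf)  # Forget Gate: ft = σ(WfXt + Rfht−1 + Bf)
i_t = sigmoid(Wi @ x_t + Ri @ h_prev + Bi)  # Input Gate:  it = σ(WiXt + Riht−1 + Bi)
z_t = np.tanh(Wz @ x_t + Rz @ h_prev + Bz)  # activation part: Zt = tanh(WzXt + Rzht−1 + Bz)
```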
### Memory Cell

① From $f_t = \sigma(W_f X_t + R_f h_{t-1} + B_f)$ and the cell state $C_{t-1}$ carried over from the previous step, the output $C_{t-1} \otimes f_t$ is formed, where $\otimes$ denotes the element-wise product.
② From $i_t = \sigma(W_i X_t + R_i h_{t-1} + B_i)$ and $Z_t = \tanh(W_z X_t + R_z h_{t-1} + B_z)$, the output $i_t \otimes Z_t$ is formed.
From $C_{t-1} \otimes f_t$ in ① and $i_t \otimes Z_t$ in ②, the new cell state is calculated as

$$C_t = C_{t-1} \otimes f_t + i_t \otimes Z_t$$
Using the Memory Cell part $C_t = i_t \otimes Z_t + C_{t-1} \otimes f_t$ and the Output Gate part $O_t = \sigma(W_o X_t + R_o h_{t-1} + B_o)$, the block output is computed as

$$h_t = O_t \otimes \tanh(C_t)$$
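Continuing the sketch above (it reuses the parameters and the `sigmoid` helper defined there), one full LSTM step combining the cell update and the output might look like this; again an illustrative sketch, not the original implementation.

```python
def lstm_step(x_t, h_prev, c_prev):
    """One LSTM time step, using the illustrative parameters defined above."""
    o_t = sigmoid(Wo @ x_t + Ro @ h_prev + Bo)   # Output Gate
    f_t = sigmoid(Wf @ x_t + Rf @ h_prev + Bf)   # Forget Gate
    i_t = sigmoid(Wi @ x_t + Ri @ h_prev + Bi)   # Input Gate
    z_t = np.tanh(Wz @ x_t + Rz @ h_prev + Bz)   # activation part
    c_t = c_prev * f_t + i_t * z_t               # Memory Cell: Ct = Ct−1 ⊗ ft + it ⊗ Zt
    h_t = o_t * np.tanh(c_t)                     # output: ht = Ot ⊗ tanh(Ct)
    return h_t, c_t

# Running the block over a short sequence: h_t and C_t are carried to the next step.
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):
    h, c = lstm_step(x_t, h, c)
```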
Looking again at

$$C_t = C_{t-1} \otimes f_t + i_t \otimes Z_t$$

the term $C_{t-1} \otimes f_t$ (the Forget Gate part) adjusts how much of the past information held in $C_{t-1}$ is reflected, and the term $i_t \otimes Z_t$ (the Input Gate part) adjusts how much of the new input, transformed by the activation into $Z_t$, is reflected.
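A tiny hand-picked example (values chosen only to illustrate the point above): when the forget gate is near 1 and the input gate near 0, the cell keeps its past state almost unchanged; when the gates flip, the past state is discarded and the activated input takes over.

```python
import numpy as np

c_prev = np.array([1.0, -2.0, 0.5])   # previous cell state C_{t-1}
z_t = np.array([0.5, 0.5, 0.5])       # activated input Z_t

# Forget gate ≈ 1, input gate ≈ 0: past information is almost fully kept.
print(c_prev * 0.99 + 0.01 * z_t)     # ≈ c_prev

# Forget gate ≈ 0, input gate ≈ 1: the new (activated) input dominates.
print(c_prev * 0.01 + 0.99 * z_t)     # ≈ z_t
```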
Reference: [Basics of LSTM that cannot be heard now](https://www.hellocybernetics.tech/entry/2017/05/06/182757)