This time, the topic is LSTM.
It can be a useful approach to prediction problems involving regular phenomena such as time series.
Since this is written to organize my own understanding, much of it retraces the reference site linked at the end.
LSTM (Long Short-Term Memory)
In short, its key feature is that short-term memory is kept available over long periods of time to help learning.
A diagram of the LSTM block is shown below.
To compute ht, the previous output ht−1 and the current input Xt are used, and ht is in turn used to compute ht+1.
It consists of the following parts:
・ Output Gate
・ Forget Gate
・ Input Gate
・ Activation function part
・ Memory Cell
Output Gate
This is the part indicated by the red arrows in the diagram below.
Using a weight matrix Wo applied to Xt, a recurrent weight matrix Ro applied to ht−1, and a bias Bo, it is computed as

Ot = σ(WoXt + Roht−1 + Bo)
The formula has the same form as a layer of an ordinary neural network.
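As a concrete illustration, here is a minimal NumPy sketch of this gate computation (the variable names and dimensions are my own assumptions, not from the original). The Forget Gate and Input Gate below follow exactly the same form, only with their own parameters.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Assumed sizes: input dimension d_in, hidden/cell dimension d_h
d_in, d_h = 3, 4
rng = np.random.default_rng(0)

W_o = rng.normal(size=(d_h, d_in))   # weights applied to the input X_t
R_o = rng.normal(size=(d_h, d_h))    # recurrent weights applied to h_{t-1}
B_o = np.zeros(d_h)                  # bias

x_t = rng.normal(size=d_in)          # current input X_t
h_prev = np.zeros(d_h)               # previous output h_{t-1}

# O_t = sigma(W_o X_t + R_o h_{t-1} + B_o)
o_t = sigmoid(W_o @ x_t + R_o @ h_prev + B_o)
print(o_t)  # each element is a gate value in (0, 1)
```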
Forget Gate
As with the Output Gate, it has its own parameters Wf, Rf, and Bf, and is computed as

ft = σ(WfXt + Rfht−1 + Bf)
Input Gate
Similarly, it is computed as

it = σ(WiXt + Riht−1 + Bi)
The activation function part (the candidate value fed into the cell) is computed as

Zt = tanh(WzXt + Rzht−1 + Bz)
① From ft = σ(WfXt + Rfht−1 + Bf) and the previous cell output Ct−1 (the dotted line coming from the cell), the output Ct−1 ⊗ ft is obtained. Here ⊗ denotes the element-wise product.
② From it = σ(WiXt + Riht−1 + Bi) and Zt = tanh(WzXt + Rzht−1 + Bz), the output it ⊗ Zt is obtained.
From Ct−1 ⊗ ft in ① and it ⊗ Zt in ②, the new cell state is computed as

Ct = Ct−1 ⊗ ft + it ⊗ Zt
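In code, the element-wise product ⊗ corresponds to NumPy's `*` operator. A small sketch, assuming ft, it, Zt, and Ct−1 have already been computed as vectors of the same length:

```python
import numpy as np

d_h = 4
rng = np.random.default_rng(1)

c_prev = rng.normal(size=d_h)        # previous cell state C_{t-1}
f_t = rng.uniform(size=d_h)          # forget gate output f_t, values in (0, 1)
i_t = rng.uniform(size=d_h)          # input gate output i_t, values in (0, 1)
z_t = np.tanh(rng.normal(size=d_h))  # candidate Z_t, values in (-1, 1)

# C_t = C_{t-1} (.) f_t + i_t (.) Z_t   (element-wise)
c_t = c_prev * f_t + i_t * z_t
print(c_t)
```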
Using the Memory Cell output Ct = Ct−1 ⊗ ft + it ⊗ Zt and the Output Gate output Ot = σ(WoXt + Roht−1 + Bo), the new output is computed as

ht = Ot ⊗ tanh(Ct)
In Ct = Ct−1 ⊗ ft + it ⊗ Zt, the Forget Gate term Ct−1 ⊗ ft adjusts how much of the past information Ct−1 is carried over, and the Input Gate term it ⊗ Zt adjusts how much of the candidate value Zt obtained from the input is written in, via the gate value it.
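Putting all of the above formulas together, one LSTM step can be sketched as follows. This is only a minimal NumPy version for following the equations; the function name, parameter layout, and dimensions are assumptions for illustration, not how an actual framework implements it.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: returns (h_t, C_t) from X_t, h_{t-1}, C_{t-1}."""
    W, R, B = params["W"], params["R"], params["B"]
    f_t = sigmoid(W["f"] @ x_t + R["f"] @ h_prev + B["f"])   # Forget Gate
    i_t = sigmoid(W["i"] @ x_t + R["i"] @ h_prev + B["i"])   # Input Gate
    z_t = np.tanh(W["z"] @ x_t + R["z"] @ h_prev + B["z"])   # activation part Z_t
    o_t = sigmoid(W["o"] @ x_t + R["o"] @ h_prev + B["o"])   # Output Gate
    c_t = c_prev * f_t + i_t * z_t                           # Memory Cell update
    h_t = o_t * np.tanh(c_t)                                 # h_t = O_t (.) tanh(C_t)
    return h_t, c_t

# Example usage with assumed dimensions
d_in, d_h = 3, 4
rng = np.random.default_rng(2)
params = {
    "W": {k: rng.normal(size=(d_h, d_in)) for k in "fizo"},
    "R": {k: rng.normal(size=(d_h, d_h)) for k in "fizo"},
    "B": {k: np.zeros(d_h) for k in "fizo"},
}
h_t, c_t = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):   # a short input sequence
    h_t, c_t = lstm_step(x_t, h_t, c_t, params)
print(h_t)
```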
[Basics of LSTM that cannot be heard now](https://www.hellocybernetics.tech/entry/2017/05/06/182757)