I checked the output specifications of PyTorch's Bidirectional LSTM

Introduction

Making an LSTM bidirectional in PyTorch is easy: as the LSTM reference (https://pytorch.org/docs/stable/nn.html?highlight=lstm#torch.nn.LSTM) shows, you only need to pass bidirectional=True when declaring it (in Keras, you just wrap the LSTM in Bidirectional). However, the reference says little about what the output looks like once the LSTM is bidirectional. Searching the web did not give me a clear picture of the output specification of a bidirectional LSTM in PyTorch either, so I summarize it here.

References

  1. Bidirectional recurrent neural networks
  2. Understanding Bidirectional RNN in PyTorch

Checking the specification

As References 1 and 2 show, a bidirectional RNN or LSTM is simply a forward RNN/LSTM and a backward RNN/LSTM stacked together.
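The "two stacked LSTMs" view can be checked directly. The following is a minimal sketch, not taken from the references: it copies the weights of a bidirectional LSTM into two ordinary LSTMs (the names `fwd` and `bwd` are made up for this illustration), runs the second one on the time-reversed input, and compares the concatenated result with the bidirectional output. It assumes a single-layer LSTM, whose parameters PyTorch names `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`, with a `_reverse` suffix for the backward direction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bi = nn.LSTM(5, 6, batch_first=True, bidirectional=True)
fwd = nn.LSTM(5, 6, batch_first=True)  # hypothetical stand-in for the forward direction
bwd = nn.LSTM(5, 6, batch_first=True)  # hypothetical stand-in for the backward direction

# Copy the bidirectional LSTM's weights into the two unidirectional LSTMs
with torch.no_grad():
    for name in ("weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"):
        getattr(fwd, name).copy_(getattr(bi, name))
        getattr(bwd, name).copy_(getattr(bi, name + "_reverse"))

x = torch.rand(1, 4, 5)  # (batch_size, seq_len, embedding_dim)

with torch.no_grad():
    out_bi, _ = bi(x)
    out_f, _ = fwd(x)
    # The backward LSTM reads the sequence from the end, so flip the time axis,
    # run a normal LSTM, and flip the outputs back into the original order
    out_b, _ = bwd(torch.flip(x, dims=[1]))
    out_b = torch.flip(out_b, dims=[1])
    stacked = torch.cat([out_f, out_b], dim=2)

matches = torch.allclose(out_bi, stacked, atol=1e-5)
print(matches)
```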

Let's try it out.

import torch
import torch.nn as nn

# Each element of the sequence has 5 embedding dimensions
# The hidden layer of the LSTM has size 6
# batch_first=True makes the input format (batch_size, seq_len, embedding_dim)
# bidirectional=True declares a bidirectional LSTM
bilstm = nn.LSTM(5, 6, batch_first=True, bidirectional=True)

# Batch size is 1
# Sequence length is 4
# Each element of the sequence has 5 embedding dimensions
# Generate a tensor of that shape
a = torch.rand(1, 4, 5)
print(a)
#tensor([[[0.1360, 0.4574, 0.4842, 0.6409, 0.1980],
#         [0.0364, 0.4133, 0.0836, 0.2871, 0.3542],
#         [0.7796, 0.7209, 0.1754, 0.0147, 0.6572],
#         [0.1504, 0.1003, 0.6787, 0.1602, 0.6571]]])

#Like a normal LSTM, it has two outputs, so it receives both.
out, hc = bilstm(a)

print(out)
#tensor([[[-0.0611,  0.0054, -0.0828,  0.0416, -0.0570, -0.1117,  0.0902, -0.0747, -0.0215, -0.1434, -0.2318,  0.0783],
#         [-0.1194, -0.0127, -0.2058,  0.1152, -0.1627, -0.2206,  0.0747, -0.0210,  0.0307, -0.0708, -0.2458,  0.1627],
#         [-0.0163, -0.0568, -0.0266,  0.0878, -0.1461, -0.1745,  0.1097, 0.0230,  0.0353, -0.0739, -0.2186,  0.0818],
#         [-0.1145, -0.0460, -0.0732,  0.0950, -0.1765, -0.2599,  0.0063, 0.0143,  0.0124,  0.0089, -0.1188,  0.0996]]],
#       grad_fn=<TransposeBackward0>)
print(hc)
#(tensor([[[-0.1145, -0.0460, -0.0732,  0.0950, -0.1765, -0.2599]],
#        [[ 0.0902, -0.0747, -0.0215, -0.1434, -0.2318,  0.0783]]],
#       grad_fn=<StackBackward>), 
#tensor([[[-0.2424, -0.1340, -0.1559,  0.3499, -0.3792, -0.5514]],
#        [[ 0.1876, -0.1413, -0.0384, -0.2345, -0.4982,  0.1573]]],
#       grad_fn=<StackBackward>))

As with a normal LSTM, there are two return values, `out` and `hc`, and `hc` is returned as a tuple `hc = (h, c)`. There are two differences from the output of a normal LSTM.

- The dimension of each element of `out` is not the hidden size of the LSTM (6 here) but twice that (12 here).
- `h` and `c` in `hc` each contain two elements.
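Both differences show up directly in the tensor shapes. A minimal sketch using the same sizes as the sample above (the values are random, only the shapes matter):

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(5, 6, batch_first=True, bidirectional=True)
a = torch.rand(1, 4, 5)  # (batch_size, seq_len, embedding_dim)

out, (h, c) = bilstm(a)

print(out.shape)  # torch.Size([1, 4, 12]) -> last dimension is 2 * hidden size
print(h.shape)    # torch.Size([2, 1, 6])  -> one entry per direction
print(c.shape)    # torch.Size([2, 1, 6])  -> one entry per direction
```

Note that `h` and `c` keep the layout (num_directions, batch_size, hidden_size) even with batch_first=True; that option only affects the input and `out`.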

The following is a brief explanation of what these mean.

(In the figure, c is omitted. The figure also shows an Embedding layer, but the embedding itself is not performed by the LSTM.)

[Figure: image.png]

As the figure above shows, each element of `out` is the concatenation of the forward and backward hidden layer vectors at that position (which is why each element has twice the usual dimension). Also, `h` in `hc = (h, c)` holds the last hidden layer vector of each direction, forward and backward.

In other words, with `hc = (h, c)`:

- The first half of the last element of `out` matches `h[0]`.
- The second half of the first element of `out` matches `h[1]`.

The output of the sample code above already shows this, and the following confirms it.

print(out[:,-1][:,:6]) #The first half of the last element of out
print(hc[0][0])        #Last hidden layer value of forward LSTM
#tensor([[-0.1145, -0.0460, -0.0732,  0.0950, -0.1765, -0.2599]], grad_fn=<SliceBackward>)
#tensor([[-0.1145, -0.0460, -0.0732,  0.0950, -0.1765, -0.2599]], grad_fn=<SelectBackward>)

print(out[:,0][:,6:]) #The back half of the first element of out
print(hc[0][1])       #Backward LSTM last hidden layer value
#tensor([[ 0.0902, -0.0747, -0.0215, -0.1434, -0.2318,  0.0783]], grad_fn=<SliceBackward>)
#tensor([[ 0.0902, -0.0747, -0.0215, -0.1434, -0.2318,  0.0783]], grad_fn=<SelectBackward>)
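Instead of slicing at index 6 by hand, the two directions can also be separated by reshaping the last dimension of `out` into (direction, hidden size), which is the approach the PyTorch LSTM documentation describes for unpacked output. This is a minimal sketch; the variable names `forward_out` and `backward_out` are only for illustration.

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim, hidden_dim = 1, 4, 5, 6
bilstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
a = torch.rand(batch_size, seq_len, embedding_dim)

out, (h, c) = bilstm(a)

# Split the last dimension (2 * hidden_dim) into (direction, hidden_dim):
# direction 0 is the forward LSTM, direction 1 is the backward LSTM
out_dirs = out.reshape(batch_size, seq_len, 2, hidden_dim)
forward_out = out_dirs[:, :, 0]   # (batch_size, seq_len, hidden_dim)
backward_out = out_dirs[:, :, 1]  # (batch_size, seq_len, hidden_dim)

# The forward LSTM finishes at the last time step, the backward LSTM at the first
forward_ok = torch.allclose(forward_out[:, -1], h[0])
backward_ok = torch.allclose(backward_out[:, 0], h[1])
print(forward_ok, backward_ok)
```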

Once you know the output specification, you can use it however you like. When turning a many-to-one model such as sentence classification into a bidirectional LSTM, there seem to be various approaches: concatenating the two vectors in the LSTM's second return value, averaging them, taking their element-wise product, and so on. Keras seems to combine them for you by default, but in PyTorch you need to implement this step yourself. For example, rewriting the model from "Sentence classification by LSTM" as a bidirectional LSTM looks like the following.

class LSTMClassifier(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size, batch_size=100):
        super(LSTMClassifier, self).__init__()
        self.batch_size = batch_size
        self.hidden_dim = hidden_dim
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        # The linear layer receives the concatenation of the last forward and backward
        # hidden layer vectors, so its input size is hidden_dim * 2
        self.hidden2tag = nn.Linear(hidden_dim * 2, tagset_size)
        # dim=1 normalizes over the tag dimension of a (batch_size, tagset_size) tensor
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        _, bilstm_hc = self.bilstm(embeds)
        # bilstm_hc[0][0] -> last hidden layer vector of the forward LSTM
        # bilstm_hc[0][1] -> last hidden layer vector of the backward LSTM
        bilstm_out = torch.cat([bilstm_hc[0][0], bilstm_hc[0][1]], dim=1)
        tag_space = self.hidden2tag(bilstm_out)
        # tag_space is already (batch_size, tagset_size); it is not squeezed,
        # so a batch of one keeps its batch dimension
        tag_scores = self.softmax(tag_space)
        return tag_scores
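The classifier above uses concatenation. For comparison, here is a minimal sketch of the three ways of combining the forward and backward vectors mentioned earlier (concatenation, averaging, element-wise product). The batch size of 3 and the variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(5, 6, batch_first=True, bidirectional=True)
x = torch.rand(3, 4, 5)  # (batch_size, seq_len, embedding_dim)

_, (h, _) = bilstm(x)
h_forward, h_backward = h[0], h[1]  # each is (batch_size, hidden_dim)

# Concatenation: keeps both vectors, the feature size doubles
concat = torch.cat([h_forward, h_backward], dim=1)

# Averaging: the feature size stays at hidden_dim
mean = (h_forward + h_backward) / 2

# Element-wise product: the feature size stays at hidden_dim
product = h_forward * h_backward

print(concat.shape)   # torch.Size([3, 12])
print(mean.shape)     # torch.Size([3, 6])
print(product.shape)  # torch.Size([3, 6])
```

The choice affects the following layer: with concatenation the linear layer takes hidden_dim * 2 inputs, while with averaging or the element-wise product it takes hidden_dim inputs.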

Conclusion

- This may be common knowledge, but I hope this article helps anyone who, like me, was briefly puzzled when handling a bidirectional LSTM in PyTorch and went to look it up.
- Incidentally, like LSTM, GRU becomes a bidirectional GRU with bidirectional=True. As for its output format, if you understand the LSTM specification above there should be no problem.
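For reference, a minimal sketch of the GRU case with the same sizes as the LSTM sample. The only structural difference is that GRU has no cell state, so the second return value is a single tensor `h` rather than a tuple `(h, c)`.

```python
import torch
import torch.nn as nn

bigru = nn.GRU(5, 6, batch_first=True, bidirectional=True)
a = torch.rand(1, 4, 5)  # (batch_size, seq_len, embedding_dim)

# GRU returns (out, h); there is no c
out, h = bigru(a)

print(out.shape)  # torch.Size([1, 4, 12])
print(h.shape)    # torch.Size([2, 1, 6])

# The same relationships as in the LSTM case hold
gru_forward_ok = torch.allclose(out[:, -1, :6], h[0])  # forward: last time step
gru_backward_ok = torch.allclose(out[:, 0, 6:], h[1])  # backward: first time step
print(gru_forward_ok, gru_backward_ok)
```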

That's all.
