This article is for those who want to try various things with deep learning models but don't know how to implement them. Using Keras's functional API, a framework that is relatively flexible and reasonably abstracted, we implement seq2seq, which is hard to build with the Sequential model, as simply as possible.
We build a machine translation model from English to Japanese. It is assumed that the sentences have already been split into words. Specifically, the following data is used: https://github.com/odashi/small_parallel_enja
First, the input word sequence is converted to word IDs and embedded into vectors of an appropriate dimension. Next, an LSTM reads the variable-length sequence of embedded vectors corresponding to the input word sequence and encodes it into the LSTM's hidden state h and cell state c, which are fixed-length feature vectors.
The encoded states h, c and a special token (as an embedded vector) representing the beginning of the sentence are fed into another LSTM. Passing the LSTM output through a fully connected layer and taking the softmax gives the probability of each word in the vocabulary being the first output word, and taking the argmax of that distribution gives the estimate of the first word. After that, from the embedded vector of the t-th word and the states h and c at step t, words are generated one after another, yielding the output word sequence.
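A minimal sketch of this encoder with the functional API might look like the following; the vocabulary size and dimensions are placeholder values, not the ones used in the repository.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM

# Placeholder hyperparameters (not the values used in the repository).
num_encoder_tokens = 10000   # English vocabulary size
embedding_dim = 256          # dimension of the embedded word vectors
latent_dim = 1024            # dimension of the LSTM states h and c

# Encoder: word IDs -> embedded vectors -> LSTM.
# return_state=True keeps the final hidden state h and cell state c,
# which serve as the fixed-length encoding of the input sentence.
encoder_inputs = Input(shape=(None,))
encoder_embedded = Embedding(num_encoder_tokens, embedding_dim)(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_embedded)
encoder_states = [state_h, state_c]
```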
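Continuing the sketch above (again with placeholder sizes), the decoder side for training could be written like this; the word-by-word argmax generation described here is sketched further below, together with the training code.

```python
from tensorflow.keras.layers import Dense

num_decoder_tokens = 12000   # Japanese vocabulary size (placeholder)

# Decoder: during training, the target sequence shifted so it starts
# with the beginning-of-sentence token is fed in, and the encoder
# states h, c are used as the initial state of the decoder LSTM.
decoder_inputs = Input(shape=(None,))
decoder_embedding = Embedding(num_decoder_tokens, embedding_dim)
decoder_embedded = decoder_embedding(decoder_inputs)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_lstm_outputs, _, _ = decoder_lstm(decoder_embedded,
                                          initial_state=encoder_states)

# Fully connected layer + softmax: for each position t, the probability
# that the t-th output word is each word in the vocabulary.
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_lstm_outputs)
```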
It looks like the figure below.
Squares of the same color share the same weights. The actual input to a square containing a specific word is that word converted to its word ID; for convenience, the word itself is written in the figure. The dimensions of the values passed between the cells are shown as an example matching the code written for this article, but they are hyperparameters. There are many more seq2seq models, not limited to neural machine translation, but here the focus is on getting used to Keras, so a simple model is used.
I use a SageMaker notebook instance because setting up an environment is a lot of trouble. GPU instances come with the libraries and GPU environment already set up, and prices start from 0.0464 USD/hour. A GPU instance (ml.p2.xlarge) costs 1.26 USD/hour, so if you debug on ml.t2.medium and switch to ml.p2.xlarge only for training, it works out cheaper. I have a gaming PC at hand, so I could run everything locally, but it's a hassle, so these days I've been using notebook instances all the time.
The preprocessing part follows: Neural machine translation with attention https://www.tensorflow.org/tutorials/text/nmt_with_attention
The code base for the training / inference part is: Sequence to sequence example in Keras (character-level) https://keras.io/examples/lstm_seq2seq/
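A rough sketch of this preprocessing, assuming the standard Keras Tokenizer and pad_sequences utilities (the `<start>`/`<end>` marker names are my own choice here, not necessarily those used in the repository):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def preprocess(sentences):
    """Turn whitespace-separated word strings into padded ID sequences.

    The corpus is already split into words, so we only add start/end
    markers, build the word-to-ID mapping, and pad to a common length.
    """
    sentences = ['<start> ' + s.strip() + ' <end>' for s in sentences]
    tokenizer = Tokenizer(filters='')   # keep punctuation tokens as-is
    tokenizer.fit_on_texts(sentences)
    sequences = tokenizer.texts_to_sequences(sentences)
    return pad_sequences(sequences, padding='post'), tokenizer
```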
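Putting the sketches above together, the training model and a greedy (argmax) inference loop might look like this, following the structure of the Keras example; variable names, batch size, and epoch count are placeholders.

```python
import numpy as np
from tensorflow.keras.models import Model

# Training model: English IDs and shifted Japanese IDs in,
# per-position word probabilities out.
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# encoder_input_data / decoder_input_data / decoder_target_data are the
# padded ID arrays from preprocessing; the targets are the decoder
# inputs shifted left by one word.
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64, epochs=10, validation_split=0.1)

# Inference: rebuild the encoder and decoder so the decoder can be run
# one step at a time, feeding back the predicted word and the states.
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
dec_emb = decoder_embedding(decoder_inputs)
dec_out, dec_h, dec_c = decoder_lstm(dec_emb,
                                     initial_state=decoder_states_inputs)
dec_probs = decoder_dense(dec_out)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                      [dec_probs, dec_h, dec_c])

def translate(input_seq, tokenizer, max_len=20):
    """Greedy decoding: take the argmax word at each step."""
    states = encoder_model.predict(input_seq)
    target_seq = np.array([[tokenizer.word_index['<start>']]])
    words = []
    for _ in range(max_len):
        probs, h, c = decoder_model.predict([target_seq] + states)
        word_id = int(np.argmax(probs[0, -1, :]))
        word = tokenizer.index_word.get(word_id, '')
        if word == '<end>':
            break
        words.append(word)
        target_seq = np.array([[word_id]])
        states = [h, c]
    return ' '.join(words)
```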
The data used for training: https://github.com/odashi/small_parallel_enja
Repository containing the code for this article https://github.com/nagiton/simple_NMT