Building a seq2seq model using Keras' Functional API: Inference

What kind of article?

This series is for people who want to experiment with various deep learning models but are not sure how to implement them. It uses Keras's Functional API, a framework that is relatively flexible and reasonably abstracted, to implement seq2seq, which is hard to do with the Sequential API, as simply as possible.

Table of contents

  1. Overview
  2. Pre-processing
  3. Model Construction & Learning
  4. Inference (this article)
  5. Model improvement (not yet written)

Motivation for this article

Training itself went fine, but the data flow at inference time is a little different from the data flow at training time. This article answers the question: how do you run inference using the parameters obtained from training?

What is needed for inference

First, load the model that was trained and saved earlier. Because the data flow differs at inference time, we need to define a model with a computation graph different from the one used for training.

Also, to realize the process of predicting the next word from the previous one, we call the defined model like a function inside a loop and infer word by word.

Inference implementation

Model loading

A model saved as an h5 file (or similar) can be loaded as follows.

model = keras.models.load_model(filepath)

Also, saving Keras models with pickle appears to be discouraged.
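For reference, a minimal save/load round trip looks like the following (a sketch; the file name 's2s.h5' is only a placeholder):

from keras.models import load_model

model.save('s2s.h5')          # stores architecture, weights and optimizer state
model = load_model('s2s.h5')  # restores the full model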

Computation graph definition

We will build the model shown in the figure below. (Figure: LSTM-Page-1.png, the inference-time computation graph)

Encoder

The encoder used during training can be reused as is.

#define encoder
encoder_model = Model(inputs=model.input[0],                        #encoder input
                      outputs=model.get_layer('lstm_1').output[1:]) #encoder LSTM hidden states [state_h, state_c]

When a part of the trained model can be reused as is, you can extract intermediate outputs from a Model in this way.
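Note that layer names such as 'lstm_1' and 'dense_1' depend on how the training model was defined. If you are unsure which names to pass to get_layer, listing them first is a quick check (a small sketch):

model.summary()                                # layer names, types and output shapes
print([layer.name for layer in model.layers])  # just the names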

Decoder

The decoder needs a bit more code. The decoder does three things: an Embedding of the previous word (the layer used for teacher forcing during training), the LSTM, and the Dense layer. Embedding, LSTM, and Dense each have weights learned during training, so we reuse those values. Also, the hidden state fed to the LSTM is not always the encoder output; it is the state obtained after inferring the previous word, so this wiring has to change from the training setup. An implementation example follows.

from keras.layers import Input, LSTM, Dense, Embedding
from keras.models import Model

#define decoder
embedding_dim = 256
units = 1024
#the bias of the trained Dense layer has length vocab_tar_size
vocab_tar_size = model.get_layer('dense_1').weights[1].shape.as_list()[0]

decoder_word_input = Input(shape=(1,),name='decoder_input')
decoder_input_embedding = Embedding(input_dim=vocab_tar_size, 
                                    output_dim=embedding_dim,
                                    weights=model.get_layer('embedding_2').get_weights())(decoder_word_input)


decoder_state_input_h = Input(shape=(units,), name='decoder_input_h')
decoder_state_input_c = Input(shape=(units,), name='decoder_input_c')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_lstm = LSTM(units, 
                    return_sequences=False, 
                    return_state=True,
                    weights=model.get_layer('lstm_2').get_weights())
decoder_output, state_h, state_c = decoder_lstm(decoder_input_embedding,
                                                initial_state=decoder_states_inputs)

decoder_states = [state_h, state_c]

decoder_dense = Dense(vocab_tar_size, 
                      activation='softmax',
                      weights=model.get_layer('dense_1').get_weights())
decoder_output = decoder_dense(decoder_output)

decoder_model = Model(inputs=[decoder_word_input] + decoder_states_inputs,
                      outputs=[decoder_output] + decoder_states)
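Before wiring the decoder into the inference loop, a quick shape check confirms that the three inputs and three outputs line up as expected (a sketch; the batch size of 2 and the all-zero dummy inputs are arbitrary choices):

import numpy as np

dummy_words = np.zeros((2, 1))      # previous-word IDs, shape (batch, 1)
dummy_h = np.zeros((2, units))      # hidden state
dummy_c = np.zeros((2, units))      # cell state
probs, h, c = decoder_model.predict([dummy_words, dummy_h, dummy_c])
print(probs.shape)       # (2, vocab_tar_size)
print(h.shape, c.shape)  # (2, units) (2, units)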

Differences from training

Compared with the training model, the LSTM's initial state is now supplied through new Input placeholders instead of coming directly from the encoder output, and the decoder takes a single word per call rather than the whole target sequence.

Check the generated model

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

SVG(model_to_dot(decoder_model).create(prog='dot', format='svg'))

(Figure: the rendered computation graph of decoder_model)
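If rendering SVG in a notebook is inconvenient, plot_model can write the same graph to an image file instead (a sketch; the file name is a placeholder):

from keras.utils import plot_model

plot_model(decoder_model, to_file='decoder_model.png', show_shapes=True)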

Defining the translation function

Converting input word IDs to output word IDs

Here we actually feed in a sequence of word IDs and obtain the translated word IDs. The steps are:

  1. Encode the input into hidden states with the encoder
  2. Predict the first word from the encoder states and the start token
  3. Predict the next word from the previous word and the previous hidden states
  4. Output the prediction results

The implementation example is as follows. (It is written so that it can process a batch of sequences, but that is not essential.)

import numpy as np

def decode_sequence(input_seq, targ_lang, max_length_targ):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)
    inp_batch_size = len(input_seq)
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((inp_batch_size, 1))
    # Populate the first character of target sequence with the start character.
    target_seq[:, 0] = targ_lang.word_index['<start>']
    
    # Sampling loop for a batch of sequences
    decoded_sentence = np.zeros((inp_batch_size, max_length_targ))
    
    for i in range(max_length_targ):
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens, axis=1) #array of shape (inp_batch_size,)

        decoded_sentence[:,i] = sampled_token_index

        # Update the target sequence (of length 1).
        target_seq = np.zeros((inp_batch_size, 1))
        target_seq[:, 0] = sampled_token_index

        # Update states
        states_value = [h, c]

    return decoded_sentence    

Instances of the Model class have a predict method. When you pass inputs to predict, the computation runs along the defined graph and the outputs are returned.

First, `encoder_model.predict` is used to encode the input into the hidden states.

Treating `target_seq`, whose size is `[batch_size, 1]`, as the previous word, `decoder_model.predict` is called together with the hidden states to obtain the next word and the hidden states to feed into the next step.

The `argmax` of each prediction is taken in turn and stored in `decoded_sentence`, so that the output has size `[batch_size, max_length_targ]`.

Running this loop as many times as the maximum length of the output sequence yields `decoded_sentence`.
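For example, a call might look like the following sketch. The preprocessing must match what was used during training; inp_lang (the input-side Tokenizer), max_length_inp and the sample sentence here are assumptions carried over from the preprocessing article:

from keras.preprocessing.sequence import pad_sequences

text = '<start> i am a student . <end>'  # illustrative only; must follow the training preprocessing
input_seq = pad_sequences(inp_lang.texts_to_sequences([text]),
                          maxlen=max_length_inp, padding='post')
decoded_sentence = decode_sequence(input_seq, targ_lang, max_length_targ)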

Output example

array([[  15.,   33.,    5.,   27.,  121.,    9.,  482.,    6.,    8.,
           4.,    3.,    0.,    0.,    0.,    0.,    0.,    0.,    0.]])

Converting output word IDs to words

Since the mapping between word IDs and words was obtained beforehand with keras.preprocessing.text.Tokenizer, all that remains is to apply that mapping to every element of the ndarray. To apply a Python function to all elements of an ndarray without writing loops, you can use np.vectorize.

An implementation example follows.

#Convert the word indices in decoded_sentence to words and remove the start/end tokens
def seq2sentence(seq,lang):
    def index2lang(idx, lang):
        try:
            return lang.index_word[idx]
        except KeyError:
            return ''
    langseq2sentence = np.vectorize(lambda x: index2lang(x,lang),otypes=[str])
    sentences = langseq2sentence(seq)
    sentences = [' '.join(list(sentence)) for sentence in sentences]
    sentences = [sentence.replace('<start>', '').replace('<end>', '').strip() for sentence in sentences]
    return sentences

Exception handling is included just in case. Finally, removing the extra spaces and the start/end tokens completes the conversion.
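Putting it together, applying the function to the decoded_sentence obtained earlier is a one-liner (a usage sketch):

sentences = seq2sentence(decoded_sentence, targ_lang)
print(sentences[0])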

References

The preprocessing part is based on Neural machine translation with attention: https://www.tensorflow.org/tutorials/text/nmt_with_attention

The training and inference code is based on Sequence to sequence example in Keras (character-level): https://keras.io/examples/lstm_seq2seq/

The training data: https://github.com/odashi/small_parallel_enja

Repository with the code for this article: https://github.com/nagiton/simple_NMT
