What is this article about?

This article is for those who want to try various things in deep learning modeling but don't know how to implement them.
It uses keras's functional API as a framework that is relatively flexible and reasonably abstracted.
It implements seq2seq, which is difficult with the sequential API, as simply as possible.

Overview
Pre-processing
Model building & learning (this article)
Inference
Model improvement (not written yet)

Motivation for this article

We know that deep learning can be implemented using keras, but what kind of code should we actually write? This article answers that question.

What you need to do in model building & learning

When using keras, it is convenient to use the Model class. https://keras.io/ja/models/model/

The Model class is responsible for defining the learning method, running the learning, and performing inference with the trained model.

To create a Model instance, you first need to build the computational graph of the machine learning model. There are two options for this: the sequential API and the functional API. The sequential API is very simple and is useful when the output of each layer is passed as-is to the next layer. In exchange, it sacrifices flexibility and cannot be used as models become more complex, for example with multi-input or multi-output models. Compared to the sequential API, the functional API requires you to define the connections between layers yourself, but it lets you describe the model more flexibly. This time, we will create a model using the functional API.

Once you've built the computational graph and created a Model instance, the rest is easy. You define the learning method (optimization method, loss function, etc.) with the compile method of the Model instance, and you run the learning with the fit method.

Model building & learning implementation

Calculation graph definition

We will build the model shown in the following figure

Encoder

The encoder does two things: it embeds the input and feeds the result to the LSTM. An implementation example follows.

from keras.layers import Input, LSTM, Dense, Embedding
# Define an input sequence and process it.
encoder_inputs = Input(shape=(max_length_inp,),name='encoder_input')
encoder_inputs_embedding = Embedding(input_dim=vocab_inp_size, output_dim=embedding_dim)(encoder_inputs)
encoder = LSTM(units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs_embedding)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

Input to the model always goes through the `Input` layer. Here, the whole input sequence is passed to `Input` at once as a vector of length max_length_inp, the maximum length of the input string. RNN-based algorithms process the input sequence one element at a time and pass the result on to the next step, but with keras the whole sequence can be handed over in one go like this.

encoder_inputs_embedding = Embedding(input_dim=vocab_inp_size, output_dim=embedding_dim)(encoder_inputs)

This line means two things: "define an `Embedding` layer with `input_dim=vocab_inp_size, output_dim=embedding_dim`" and "add to the computational graph so that the result of applying that `Embedding` to `encoder_inputs` is `encoder_inputs_embedding`".

encoder = LSTM(units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs_embedding)

As in the two lines above, you can also define the layer and add it to the computational graph on separate lines.
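
To make the Embedding step concrete, here is a dependency-free sketch of what the layer computes: a table lookup that turns each integer token id into a vector. The tiny sizes, the table values, and the helper name `embed` are made up for illustration; the real layer learns its table during training.

```python
# Toy sizes (hypothetical); the article's model uses much larger ones.
vocab_size = 5      # plays the role of vocab_inp_size
embedding_dim = 3   # plays the role of embedding_dim

# One row per token id. Keras initialises this table randomly and learns it;
# here it is filled with fixed values so the lookup is easy to follow.
table = [[float(token_id * 10 + j) for j in range(embedding_dim)]
         for token_id in range(vocab_size)]


def embed(batch):
    """Map a (batch, steps) list of ids to (batch, steps, embedding_dim)."""
    return [[table[token_id] for token_id in sequence] for sequence in batch]


batch = [[1, 4, 0, 0],   # two sequences of length 4
         [2, 3, 1, 0]]
embedded = embed(batch)

print(len(embedded), len(embedded[0]), len(embedded[0][0]))  # 2 4 3
print(embedded[0][1])  # row for token id 4: [40.0, 41.0, 42.0]
```

The same thing happens in the model summary shown later in this article, where the shape grows from (None, 18) to (None, 18, 256) after the Embedding layer.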

Decoder

The decoder does three things with the decoder input: Embedding (for teacher forcing), LSTM, and Dense. An implementation example follows.

from keras.layers import Input, LSTM, Dense, Embedding
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(max_length_targ-1,),name='decoder_input')
decoder_inputs_embedding  = Embedding(input_dim=vocab_tar_size, output_dim=embedding_dim)(decoder_inputs)
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs_embedding,
                                     initial_state=encoder_states)
decoder_dense = Dense(vocab_tar_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

Differences from the encoder

The shape of `Input` is one shorter (max_length_targ-1), because with teacher forcing the decoder input is the target sequence shifted by one step.
`return_sequences=True` is set on the LSTM to get the LSTM output at every step.
The LSTM receives `encoder_states`, the hidden and cell states obtained from the encoder, as its initial state.
There is a Dense layer. Since it is the output layer, its `activation` is softmax.
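
The first point follows from teacher forcing: at every step the decoder is fed the correct previous token and asked to predict the next one, so the input and the target are the same sequence shifted by one position. A minimal sketch, assuming the pre-processing step produces id sequences with start and end markers (the ids and the helper name `split_for_teacher_forcing` are hypothetical):

```python
def split_for_teacher_forcing(target_ids):
    """Return (decoder_input, decoder_target) for one target sequence.

    decoder_input drops the last token, decoder_target drops the first,
    so decoder_target[t] is the token that should follow decoder_input[t].
    """
    return target_ids[:-1], target_ids[1:]


# Hypothetical ids: 1 = <start>, 2 = <end>, 0 = padding.
target = [1, 7, 9, 4, 2, 0]          # length max_length_targ = 6
decoder_input, decoder_target = split_for_teacher_forcing(target)

print(decoder_input)   # [1, 7, 9, 4, 2]
print(decoder_target)  # [7, 9, 4, 2, 0]
print(len(decoder_input) == len(target) - 1)  # True
```

Both halves have length max_length_targ-1, which is why the decoder `Input` and the Dense output share the same number of steps.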

Create Model instance

Once you get this far, the rest is straightforward.

from keras.models import Model
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

Check the generated model

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
SVG(model_to_dot(model).create(prog='dot', format='svg'))

The code above visualizes the computational graph that represents the model.

The layers are laid out differently from the figure shown at the beginning, but you can see that the network is the same.

In addition, you can check the output shape and the number of parameters of each layer with model.summary().

model.summary()


__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
encoder_input (InputLayer)      (None, 18)           0                                            
__________________________________________________________________________________________________
decoder_input (InputLayer)      (None, 17)           0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 18, 256)      1699328     encoder_input[0][0]              
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, 17, 256)      2247168     decoder_input[0][0]              
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [(None, 1024), (None 5246976     embedding_1[0][0]                
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [(None, 17, 1024), ( 5246976     embedding_2[0][0]                
                                                                 lstm_1[0][1]                     
                                                                 lstm_1[0][2]                     
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 17, 8778)     8997450     lstm_2[0][0]                     
==================================================================================================
Total params: 23,437,898
Trainable params: 23,437,898
Non-trainable params: 0
__________________________________________________________________________________________________

When actually building models, it is a good idea to visualize the model like this from time to time, because it makes debugging go faster.
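
A related sanity check is to reproduce the Param # column of the summary by hand. The sketch below uses the standard parameter-count formulas for Embedding, LSTM, and Dense layers. The vocabulary sizes 6638 and 8778 are not stated in the article; they are inferred from the table (1699328 / 256, and the last dimension of the Dense output).

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding_dim-sized vector per vocabulary entry.
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output unit.
    return input_dim * output_dim + output_dim


embedding_dim, units = 256, 1024             # read off the output shapes
vocab_inp_size, vocab_tar_size = 6638, 8778  # inferred from the table

counts = [
    embedding_params(vocab_inp_size, embedding_dim),  # embedding_1
    embedding_params(vocab_tar_size, embedding_dim),  # embedding_2
    lstm_params(embedding_dim, units),                # lstm_1
    lstm_params(embedding_dim, units),                # lstm_2
    dense_params(units, vocab_tar_size),              # dense_1
]
print(counts)       # [1699328, 2247168, 5246976, 5246976, 8997450]
print(sum(counts))  # 23437898
```

If a hand-computed count disagrees with the summary, a layer is probably receiving an input of a different dimension than intended.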

Setting learning conditions and executing learning

We use Adam as the optimizer and cross entropy as the loss function. We also look at the word accuracy for each epoch.

The model is saved every 5 epochs.

An implementation example follows.

import os
import numpy as np
from keras.utils import np_utils

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# define save condition: train in chunks of `save_every` epochs,
# plus one shorter chunk for the remainder
dir_path = 'saved_models/LSTM/'
os.makedirs(dir_path, exist_ok=True)
save_every = 5
n_chunks, remainder = divmod(epochs, save_every)
train_schedule = [save_every] * n_chunks
if remainder != 0:
    train_schedule.append(remainder)

# one-hot encode the targets once:
# (samples, steps) -> (samples, steps, vocab_tar_size)
decoder_target_onehot = np.apply_along_axis(
    lambda x: np_utils.to_categorical(x, num_classes=vocab_tar_size),
    1, decoder_target_tensor)

# run training, saving the model after each chunk
total_epochs = 0
for epoch in train_schedule:
    history = model.fit([encoder_input_tensor, decoder_input_tensor],
                        decoder_target_onehot,
                        batch_size=batch_size,
                        epochs=epoch,
                        validation_split=0.2)
    total_epochs += epoch
    filename = str(total_epochs) + 'epochs_LSTM.h5'
    model.save(dir_path + filename)

The code does several things, but the essential calls are model.compile and model.fit. These two alone are enough for a minimal setup.
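
Everything else in the example is bookkeeping for saving every few epochs. The sketch below isolates that logic from keras so it can be run on its own; `make_schedule` and `checkpoint_names` are hypothetical helper names, and 12 epochs is an arbitrary example value.

```python
def make_schedule(epochs, save_every):
    """Split `epochs` into chunks of `save_every`, plus any remainder."""
    full_chunks, remainder = divmod(epochs, save_every)
    schedule = [save_every] * full_chunks
    if remainder:
        schedule.append(remainder)
    return schedule


def checkpoint_names(schedule):
    """File names following the '<total>epochs_LSTM.h5' pattern."""
    names, total = [], 0
    for chunk in schedule:
        total += chunk
        names.append(str(total) + 'epochs_LSTM.h5')
    return names


schedule = make_schedule(12, 5)
print(schedule)                    # [5, 5, 2]
print(checkpoint_names(schedule))  # ['5epochs_LSTM.h5', '10epochs_LSTM.h5', '12epochs_LSTM.h5']
```

Each entry of the schedule becomes one model.fit call, followed by one model.save.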

Pass the optimization method, the loss function, and the evaluation metric to model.compile as options. Learning is then executed with model.fit. The most important arguments to model.fit are the input data and the correct answer data. The correct answer data is written as np.apply_along_axis(lambda x: np_utils.to_categorical(x, num_classes=vocab_tar_size), 1, decoder_target_tensor) because each element of decoder_target_tensor has to be converted to a one-hot encoded format.
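
As a dependency-free sketch of what this conversion produces (the helper name `one_hot` and the toy sizes are made up; the article itself uses np_utils.to_categorical):

```python
def one_hot(batch, num_classes):
    """Convert (samples, steps) ids to (samples, steps, num_classes) 0/1 rows."""
    return [[[1.0 if k == token_id else 0.0 for k in range(num_classes)]
             for token_id in sequence]
            for sequence in batch]


targets = [[2, 0, 1],
           [1, 1, 3]]            # 2 samples, 3 steps, ids below 4
encoded = one_hot(targets, num_classes=4)

print(encoded[0][0])  # [0.0, 0.0, 1.0, 0.0]
print(len(encoded), len(encoded[0]), len(encoded[0][0]))  # 2 3 4
```

The last dimension matches the softmax output of the Dense layer, which is what categorical_crossentropy compares against.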

Tips for coding and debugging

Bugs can be found quickly by visualizing the model to check that the dimensions are consistent, or by feeding in concrete values where appropriate. Since each layer can be treated like a function, passing it a concrete value gives you a concrete output.

References

The pre-processing part is based on: Neural machine translation with attention https://www.tensorflow.org/tutorials/text/nmt_with_attention

The code for the learning / inference part is based on: Sequence to sequence example in Keras (character-level). https://keras.io/examples/lstm_seq2seq/

The data used for learning: https://github.com/odashi/small_parallel_enja

Repository containing the code for this article: https://github.com/nagiton/simple_NMT

Build a seq2seq model using keras's Functional API Model building & learning