Try the book "Introduction to Natural Language Processing Application Development in 15 Steps" - Chapter 3 Step 13 Memo "Recurrent Neural Networks"

Contents

This is a memo for myself as I read "Introduction to Natural Language Processing Application Development in 15 Steps". This time I note my own takeaways from Chapter 3, Step 13.

Preparation

- Personal Mac: macOS Mojave version 10.14.6
- Docker version: 19.03.2 for both Client and Server

Chapter overview

In the previous chapter, we built a convolutional neural network (CNN) that takes as input a sequence of distributed word representations (word embeddings) arranged to correspond to a sentence. In this chapter, we build a recurrent neural network (RNN) that takes the same kind of input. A detailed explanation of the mechanism is omitted.

13.1 Recurrent layer

Connect the previous output to the next output

Input the leftmost column of the feature vectors to one layer (a fully connected layer) of a multi-layer perceptron. Next, shift one column to the right and input that column to the fully connected layer in the same way; **the weights of the fully connected layer used here are the same as the ones used before**. At the same time, **the previous output neurons are connected in** through another fully connected layer.

- CNN: A series of outputs is fed into a max pooling layer to obtain one vector, so that vector includes information from all columns of the feature vectors.
- RNN: Since the previous output is connected to the next output, the single vector obtained at the end contains information from the entire sequence of feature vectors (note, however, that the influence of features near the beginning becomes weaker).
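The recurrence described above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not the book's code: the names `matvec`, `rnn_forward`, `w_in`, `w_rec`, the `tanh` activation and the toy numbers are all hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_forward(inputs, w_in, w_rec, bias):
    """Run a simple RNN over `inputs` and return the final hidden state.

    The same w_in, w_rec and bias are reused at every time step.
    """
    hidden = [0.0] * len(bias)  # initial state: all zeros
    for x in inputs:
        from_input = matvec(w_in, x)       # fully connected layer on the current column
        from_prev = matvec(w_rec, hidden)  # second fully connected layer on the previous output
        hidden = [math.tanh(a + b + c)
                  for a, b, c in zip(from_input, from_prev, bias)]
    return hidden

# Toy example: 3 time steps of 2-dimensional word vectors, 2 hidden units.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_in = [[0.5, -0.5], [0.3, 0.8]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]
bias = [0.0, 0.0]

final_state = rnn_forward(sentence, w_in, w_rec, bias)
print(final_state)
```

Only the last hidden state is returned, which matches the "one vector obtained at the end" in the RNN bullet. Feeding the same columns in a different order gives a different result, because each step depends on the previous output.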

Another representation of RNN

An RNN can also be explained as a fully connected layer with a "connection that feeds its output back into itself", to which the vectors are input in order. This was the picture I originally had in mind; unrolling the loop gives the picture described in the previous section.
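Written as a formula, the loop view and the unrolled view look roughly like this. The symbols are my own notation, not the book's: $x_t$ is the input vector at step $t$, $h_t$ the output at step $t$, $W_x$ and $W_h$ the weights of the two fully connected layers, $b$ a bias, and $f$ an activation function.

```latex
% Loop view: one layer whose output is fed back into itself
h_t = f\left(W_x x_t + W_h h_{t-1} + b\right), \qquad h_0 = 0

% Unrolled view for a three-word sentence (same W_x, W_h, b at every step)
h_3 = f\Bigl(W_x x_3 + W_h\, f\bigl(W_x x_2 + W_h\, f\left(W_x x_1 + b\right) + b\bigr) + b\Bigr)
```

The second line is just the first line substituted into itself three times, which is why the two pictures describe the same network.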

13.2 LSTM

LSTM stands for long short-term memory. A simple RNN has the problem that the influence of inputs near the beginning of the sequence fades, and LSTM is an improvement that can retain older information. (I want to summarize LSTM in more detail in the future.)
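As a rough illustration of how an LSTM can retain old information, here is one step of the commonly used LSTM formulation, reduced to a single unit with scalar input and state. This is a sketch under my own assumptions, not something taken from the book: `lstm_step`, the `params` layout and all the numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM with scalar input and state.

    `params` maps each gate name to a (w_x, w_h, bias) tuple.
    """
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    forget = sigmoid(pre_activation("forget"))   # how much old cell state to keep
    write = sigmoid(pre_activation("input"))     # how much new information to write
    output = sigmoid(pre_activation("output"))   # how much of the cell state to expose
    candidate = math.tanh(pre_activation("candidate"))

    c_next = forget * c_prev + write * candidate
    h_next = output * math.tanh(c_next)
    return h_next, c_next

# Hypothetical weights: forget gate almost fully open, input gate almost closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # pretend the cell stored the value 1.0 at the start
for _ in range(50):
    h, c = lstm_step(1.0, h, c, params)
print(c)  # still close to the initial 1.0 after 50 steps
```

With these gate settings the cell state is carried forward almost unchanged, which is the behaviour that lets an LSTM keep information from the beginning of a sentence. In a trained network the gate values are learned rather than fixed by hand.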

13.3 Implementation of RNN by Keras

Additions / changes from the previous chapter (Step 12)

- Neural network structure: CNN -> RNN
- Handling of 0s in the sequence: no special treatment -> treated specially as the value used for zero padding
- The layers following the embedding layer must also support this masking (LSTM supports it; the CNN layers do not)
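What "treating 0 specially" means can be sketched without Keras. The helpers below are hypothetical and only mimic the idea: sentences are padded with 0 to a fixed length, and a recurrent layer that supports masking skips the padded positions and carries its state over them. I pad on the right here for readability; Keras's `pad_sequences` pads on the left by default. Real word indices have to start at 1 for this to work.

```python
PAD_ID = 0  # index reserved for padding, as with mask_zero=True

def pad_sequence(word_ids, max_length, pad_id=PAD_ID):
    """Truncate or right-pad a list of word indices to exactly max_length."""
    clipped = word_ids[:max_length]
    return clipped + [pad_id] * (max_length - len(clipped))

def padding_mask(padded_ids, pad_id=PAD_ID):
    """True for real words, False for padding positions."""
    return [word_id != pad_id for word_id in padded_ids]

def masked_last_state(padded_ids, step, initial_state):
    """Apply `step` only at unmasked positions, carrying the state over padding."""
    state = initial_state
    for word_id, keep in zip(padded_ids, padding_mask(padded_ids)):
        if keep:
            state = step(state, word_id)
    return state

padded = pad_sequence([4, 9, 2], max_length=5)
print(padded)                # [4, 9, 2, 0, 0]
print(padding_mask(padded))  # [True, True, True, False, False]

# A toy "recurrent" step that just records which ids it has seen.
print(masked_last_state(padded, lambda seen, i: seen + [i], []))  # [4, 9, 2]
```

Without the mask, the padding zeros would be processed like ordinary words and would push the real words further from the final output.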

rnn_sample.py


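    # Sequential, LSTM and Dense are Keras classes (imports omitted in this excerpt).
    # get_keras_embedding, we_model, MAX_SEQUENCE_LENGTH and n_classes are
    # assumed to be defined elsewhere in the script.
    model = Sequential()
    # Embedding layer: index 0 is treated as padding and masked out.
    model.add(get_keras_embedding(we_model.wv,
                                  input_shape=(MAX_SEQUENCE_LENGTH, ),
                                  mask_zero=True,
                                  trainable=False))
    # Recurrent layer (takes the place of the CNN layers used in Step 12).
    model.add(LSTM(units=256))
    model.add(Dense(units=128, activation='relu'))
    # Output layer: one probability per class.
    model.add(Dense(units=n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])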

Execution result


# CNN
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python cnn_sample.py
Epoch 50/50
917/917 [==============================] - 0s 303us/step - loss: 0.0357 - acc: 0.9924
0.6808510638297872

Epoch 100/100
917/917 [==============================] - 0s 360us/step - loss: 0.0220 - acc: 0.9902
0.6808510638297872

# LSTM
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python rnn_sample.py
Epoch 50/50
917/917 [==============================] - 4s 4ms/step - loss: 0.2530 - acc: 0.9378
0.6063829787234043

Epoch 100/100
917/917 [==============================] - 4s 4ms/step - loss: 0.0815 - acc: 0.9793
0.5851063829787234

# Bi-directional RNN
$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python bid_rnn_sample.py
Epoch 50/50
917/917 [==============================] - 2s 2ms/step - loss: 0.2107 - acc: 0.9487
0.5851063829787234

Epoch 100/100
917/917 [==============================] - 2s 2ms/step - loss: 0.0394 - acc: 0.9858
0.5851063829787234

# GRU
Epoch 50/50
917/917 [==============================] - 1s 1ms/step - loss: 0.2947 - acc: 0.9368
0.4787234042553192

Epoch 100/100
917/917 [==============================] - 1s 1ms/step - loss: 0.0323 - acc: 0.9869
0.5531914893617021

I first compared the models at 50 epochs. For the models other than CNN, the loss had not finished decreasing by epoch 50, so I also ran them for 100 epochs.

| Type of NN | Accuracy (Epoch 50) | Accuracy (Epoch 100) | Execution speed |
| --- | --- | --- | --- |
| CNN | 68.1% | 68.1% | about 300 us/step -> 0.27 s/epoch |
| LSTM | 60.6% | 58.5% | about 4 ms/step -> 3.6 s/epoch |
| Bi-directional RNN | 58.5% | 58.5% | about 2 ms/step -> 1.8 s/epoch |
| GRU | 47.9% | 55.3% | about 1 ms/step -> 0.9 s/epoch |
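The per-epoch times above come from multiplying the per-step time by the 917 training samples shown in the logs. A small sketch of that arithmetic follows; the variable and function names are my own, and the step times are the rounded values read off the logs.

```python
N_TRAIN_SAMPLES = 917  # from the "917/917" progress lines in the logs

# Approximate per-step times read off the logs, in seconds.
step_seconds = {
    "CNN": 300e-6,
    "LSTM": 4e-3,
    "Bi-directional RNN": 2e-3,
    "GRU": 1e-3,
}

def epoch_seconds(step_time, n_samples=N_TRAIN_SAMPLES):
    """Time for one epoch, assuming every sample takes `step_time` seconds."""
    return step_time * n_samples

for name, step_time in step_seconds.items():
    print(f"{name}: {epoch_seconds(step_time):.3f} s/epoch")
```

The memo's figures are these products cut to one or two significant digits, so they should be read as rough estimates. By this estimate the LSTM takes roughly 13 times as long per epoch as the CNN.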

Tuning, such as the hyperparameter search covered in later chapters, is still needed, but at this point the CNN is fast and its accuracy is quite good.

- Baseline implementation (Step01): 37.2%
- Addition of preprocessing (Step02): 43.6%
- Preprocessing + feature extraction change (Step04): 58.5%
- Preprocessing + feature extraction change + classifier change RandomForest (Step06): 61.7%
- Preprocessing + feature extraction change + classifier change NN (Step09): 66.0%
- Preprocessing + feature extraction change (Step11): 40.4%
- Preprocessing + feature extraction change + classifier change CNN (Step12): 68.1%
- Preprocessing + feature extraction change + classifier change RNN (Step13): 60.6%

13.4 Summary

A simple RNN that just reuses a fully connected layer like the ones in a multi-layer perceptron does not work well, so LSTM was introduced.

13.5 For more advanced learning

The contents of Chapter 3 of this book are introductory and focus on how to use the techniques in practical applications. To gain a deeper understanding, it would be good to firm up the theory of neural networks first. Trying a Kaggle competition might also be a good idea.
