Trying the book "Introduction to Natural Language Processing Application Development in 15 Steps" - Chapter 3, Step 12 memo: "Convolutional Neural Networks"

Contents

This is a memo to myself as I read "Introduction to Natural Language Processing Application Development in 15 Steps". This time I jot down my own takeaways from Chapter 3, Step 12. Since I have already studied CNNs themselves, these notes are rough.

Preparation

- Personal Mac: macOS Mojave 10.14.6
- Docker: version 19.03.2 (both Client and Server)

Chapter overview

In the previous step, introducing word embeddings made it possible to handle distributed representations of words as features. However, to turn them into sentence-level features they had to be summed or averaged, which made the predictions inferior to the BoW-based approaches. In this chapter we build a convolutional neural network (CNN) whose input is the sequence of word distributed representations arranged in the order they appear in the sentence. Note that while the CNNs familiar from image analysis are two-dimensional, the CNNs used for natural language processing (text classification, etc.) are one-dimensional.
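As a rough illustration of that difference (a minimal sketch of my own, not from the book; the layer sizes are arbitrary), an image model slides a two-dimensional kernel over height and width, while a text model slides a one-dimensional kernel along the word sequence, whose input shape is (sequence length, embedding dimension):

from keras.models import Sequential
from keras.layers import Conv1D, Conv2D

# Images: 2D convolution over (height, width, channels)
image_model = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu',
           input_shape=(28, 28, 1)),
])

# Text: 1D convolution along the word sequence;
# the input is (sequence length, embedding dimension)
text_model = Sequential([
    Conv1D(filters=32, kernel_size=3, activation='relu',
           input_shape=(20, 100)),
])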

12.1 ~ 12.4

The contents of 12.1 to 12.4, organized by CNN layer:

Convolutional layer
- Input
  - the sequence of word distributed representations obtained from word embeddings
  - when CNN layers are stacked, the output sequence of the previous layer's pooling layer
- Align the lengths of the distributed-representation sequences that make up each sentence (see the padding sketch after this list)
  - truncate the part that exceeds the fixed length
  - fill the missing part with zero vectors
- For each kernel_size window along the word-sequence direction, multiply by the weights and add a bias to produce one output value
- Repeat the same operation at every stride along the word-sequence direction, reusing the same weights each time (weight sharing)

Pooling layer
- Input
  - the output sequence of the convolutional layer
- There are max pooling and average pooling; max pooling, which is a non-linear operation, tends to perform better
- The operation can be repeated at every stride along the word-sequence direction, or applied to the whole sequence at once without setting a stride (global max pooling / global average pooling)

Fully-connected layer (densely-connected layer)
- Used to feed the result into a multi-layer perceptron for multi-class classification
- Since the output of the pooling layer is a two-dimensional array, it is converted to a one-dimensional array that can be input to the multi-layer perceptron
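The length alignment mentioned above (truncating overly long sentences and zero-padding short ones) can be done with Keras's pad_sequences. A minimal sketch, assuming the sentences have already been converted to sequences of word IDs and that MAX_SEQUENCE_LENGTH is an arbitrary fixed length:

from keras.preprocessing.sequence import pad_sequences

MAX_SEQUENCE_LENGTH = 20  # arbitrary fixed length for illustration

# Sentences of different lengths, already converted to word IDs
sequences = [[4, 12, 7], [9, 3, 15, 2, 8, 1, 6]]

# Long sentences are truncated and short ones are padded with 0,
# where ID 0 corresponds to the zero-vector row of the Embedding weights
x = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH,
                  padding='post', truncating='post')
print(x.shape)  # (2, 20)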

12.5 Implementation of CNN with Keras

Using the average of the word-embedding distributed representations as features (classifier: SVC)

In the previous step the distributed representations were summed, so here I try averaging them instead.

import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC
from tokenizer import tokenize
from sklearn.pipeline import Pipeline

class DialogueAgent:
    def __init__(self):
        self.model = Word2Vec.load(
            './latest-ja-word2vec-gensim-model/word2vec.gensim.model')  # <1>

    def train(self, texts, labels):
        pipeline = Pipeline([
            ('classifier', SVC()),
        ])
        pipeline.fit(texts, labels)
        self.pipeline = pipeline

    def predict(self, texts):
        return self.pipeline.predict(texts)

    # The content is almost the same as in Step 11
    def calc_text_feature(self, text):
~~
#        return np.sum(word_vectors, axis=0)
        return np.average(word_vectors, axis=0)
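The elided body of calc_text_feature is essentially the Step 11 implementation; as a sketch under that assumption (tokenize returns a list of tokens and out-of-vocabulary words are skipped), it would look roughly like this:

    def calc_text_feature(self, text):
        # Sketch of the elided body, assumed to follow Step 11:
        # look up the word vector of each known token
        tokens = tokenize(text)
        word_vectors = np.array([
            self.model.wv[token] for token in tokens
            if token in self.model.wv
        ])
        if len(word_vectors) == 0:
            return np.zeros(self.model.wv.vector_size)
        # Average instead of sum, as described above
        return np.average(word_vectors, axis=0)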

evaluate_dialogue_agent.py


from os.path import dirname, join, normpath

import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score

from <Implemented module name> import DialogueAgent

if __name__ == '__main__':
    BASE_DIR = normpath(dirname(__file__))

    # Training
    training_data = pd.read_csv(join(BASE_DIR, './training_data.csv'))

    dialogue_agent = DialogueAgent()
    X_train = np.array([dialogue_agent.calc_text_feature(text) for text in training_data['text']])
    y_train = np.array(training_data['label'])
    dialogue_agent.train(X_train, y_train)

    # Evaluation
    test_data = pd.read_csv(join(BASE_DIR, './test_data.csv'))
    X_test = np.array([dialogue_agent.calc_text_feature(text) for text in test_data['text']])
    y_test = np.array(test_data['label'])

    y_pred = dialogue_agent.predict(X_test)

    print(accuracy_score(y_test, y_pred))

Using the sum / average of the word-embedding distributed representations as features (classifier: NN)

Above, the classifier was an SVC; now let's change it to an NN. Since the distributed representations of the words are averaged, the dimensionality of the distributed representation, texts.shape[1], is passed as input_dim to the KerasClassifier.

~~
    def _build_mlp(self, input_dim, hidden_units, output_dim):
        mlp = Sequential()
        mlp.add(Dense(units=hidden_units,
                      input_dim=input_dim,
                      activation='relu'))
        mlp.add(Dense(units=output_dim, activation='softmax'))
        mlp.compile(loss='categorical_crossentropy',
                    optimizer='adam')

        return mlp

    def train(self, texts, labels, hidden_units=32, classifier__epochs=100):
        feature_dim = texts.shape[1]
        print(feature_dim)
        n_labels = max(labels) + 1

        classifier = KerasClassifier(build_fn=self._build_mlp,
                                     input_dim=feature_dim,
                                     hidden_units=hidden_units,
                                     output_dim=n_labels)

        pipeline = Pipeline([
            ('classifier', classifier),
        ])

        pipeline.fit(texts, labels, classifier__epochs=classifier__epochs)

        self.pipeline = pipeline

    def predict(self, texts):
        return self.pipeline.predict(texts)
~~
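The snippet above is excerpted; it presumably relies on roughly the following imports (my assumption based on the classes used, not copied from the book). Note that KerasClassifier forwards keyword arguments such as input_dim, hidden_units, and output_dim to build_fn, which is why _build_mlp receives them.

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.pipeline import Pipeline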

Using the word-embedding distributed representations as features (input layer: Embedding -> Flatten -> Dense)

Since the word-embedding distributed representations of a sentence form a two-dimensional array, insert a Flatten layer before feeding them into the Dense layer.

    # Model building
    model = Sequential()
    model.add(get_keras_embedding(we_model.wv,
                                  input_shape=(MAX_SEQUENCE_LENGTH, ),
                                  trainable=False))

    model.add(Flatten())
    model.add(Dense(units=256, activation='relu'))
    model.add(Dense(units=128, activation='relu'))
    model.add(Dense(units=n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
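A sketch of how the inputs to this model could be prepared (the texts_to_sequences helper and the variable names train_texts and train_labels are my own illustration, not the book's code, and it assumes an older gensim API where KeyedVectors exposes a vocab dictionary): each text is converted to a sequence of vocabulary indices shifted by one so that index 0 stays reserved for padding, padded to MAX_SEQUENCE_LENGTH, and the labels are one-hot encoded before calling fit.

from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

# Hypothetical helper: map tokens to the row indices of the Embedding
# weights (index 0 is reserved for padding / the zero vector)
def texts_to_sequences(texts, wv):
    return [[wv.vocab[token].index + 1
             for token in tokenize(text) if token in wv]
            for text in texts]

x_train = pad_sequences(texts_to_sequences(train_texts, we_model.wv),
                        maxlen=MAX_SEQUENCE_LENGTH)
y_train = to_categorical(train_labels, num_classes=n_classes)

model.fit(x_train, y_train, epochs=100)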

Using the word-embedding distributed representations as features (input layer: Embedding -> CNN (Dense))

Feed the word-embedding distributed representations into the CNN's convolutional layer, but set kernel_size to x_train.shape[1], so that a single kernel covers the whole input and the configuration becomes equivalent to the Dense layer above.

    # Model building
    model = Sequential()
    model.add(get_keras_embedding(we_model.wv,
                                  input_shape=(MAX_SEQUENCE_LENGTH, ),
                                  trainable=False))  # <6>

    # 1D Convolution
    model.add(Conv1D(filters=256, kernel_size=x_train.shape[1], strides=1, activation='relu'))
    # Global max pooling
    model.add(MaxPooling1D(pool_size=int(model.output.shape[1])))
    model.add(Flatten())
    model.add(Dense(units=128, activation='relu'))
    model.add(Dense(units=n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])

Using the word-embedding distributed representations as features (input layer: Embedding -> CNN)

Feed the word-embedding distributed representations into the CNN's convolutional layer. Details are omitted here because they follow the book.
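Since the details are left to the book, the following is only a rough sketch of what such a model could look like; the kernel size, filter count, and layer sizes are my assumptions, not necessarily the book's configuration. The point is that the convolution now slides a small kernel along the word sequence instead of covering it all at once, followed by global max pooling.

    # Model building (a sketch; hyperparameters are assumptions;
    # Conv1D, GlobalMaxPooling1D and Dense come from keras.layers)
    model = Sequential()
    model.add(get_keras_embedding(we_model.wv,
                                  input_shape=(MAX_SEQUENCE_LENGTH, ),
                                  trainable=False))

    # 1D convolution with a small kernel sliding along the word sequence
    model.add(Conv1D(filters=256, kernel_size=3, strides=1, activation='relu'))
    # Global max pooling over the whole output sequence
    model.add(GlobalMaxPooling1D())
    model.add(Dense(units=128, activation='relu'))
    model.add(Dense(units=n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])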

Embedding layer


    Embedding(input_dim=word_num + 1,
              output_dim=embedding_dim,
              weights=[weights_with_zero],
              *args, **kwargs)

↓

    # In this case *args, **kwargs actually expand to the following
    Embedding(input_dim=word_num + 1,
              output_dim=embedding_dim,
              weights=[weights_with_zero],
              input_shape=(MAX_SEQUENCE_LENGTH, ),
              trainable=False)

- trainable=False: the weights are not updated during training (the Embedding layer uses already-trained weights, so they should not be changed by training)
- input_shape: when adding layers with add in Keras, this is specified on the first (input) layer
- input_dim / output_dim: the input and output dimensions of the Embedding layer's weights; the output dimension of the Embedding layer equals output_dim
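weights_with_zero is presumably the pretrained word2vec weight matrix with an extra all-zero row prepended for padding index 0 (hence input_dim=word_num + 1). A minimal sketch of how it could be built, assuming a gensim KeyedVectors with a .vectors array (my assumption; not shown in the book excerpt):

import numpy as np

word_num = len(we_model.wv.vocab)
embedding_dim = we_model.wv.vector_size

# Row 0 is the all-zero vector used for padding; pretrained vectors follow
weights_with_zero = np.vstack(
    [np.zeros((1, embedding_dim)), we_model.wv.vectors])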

Execution result

| Word embedding distributed representations | Classifier | Accuracy |
|---|---|---|
| sum | SVC | 0.40425531914893614 |
| average | SVC | 0.425531914893617 |
| sum / average | NN | 0.5638297872340425 / 0.5531914893617021 |
| kept as a sequence | Embedding -> Flatten -> Dense | 0.5319148936170213 |
| kept as a sequence | Embedding -> CNN (Dense) | 0.5 |
| kept as a sequence | Embedding -> CNN | 0.6808510638297872 |

Comparison with earlier steps:

- Baseline implementation (Step 01): 37.2%
- Added preprocessing (Step 02): 43.6%
- Preprocessing + changed feature extraction (Step 04): 58.5%
- Preprocessing + changed feature extraction + classifier changed to RandomForest (Step 06): 61.7%
- Preprocessing + changed feature extraction + classifier changed to NN (Step 09): 66.0%
- Preprocessing + changed feature extraction (Step 11): 40.4%
- Preprocessing + changed feature extraction + classifier changed to CNN (Step 12): 68.1%
