Try the book "Introduction to Natural Language Processing Application Development in 15 Steps" --Chapter 3 Step 09 Memo "Identifier by Neural Network"

Contents

This is a personal memo written while reading "Introduction to Natural Language Processing Application Development in 15 Steps". This time I note my own takeaways from Chapter 3, Step 09.

Preparation

- Personal Mac: macOS Mojave version 10.14.6
- Docker version: 19.03.2 for both Client and Server

Chapter overview

Let's implement a multi-class classifier using the multi-layer perceptron discussed in the previous chapter.

- softmax: activation function for multi-class classification (the counterpart of sigmoid, which is used for two-class classification)
- categorical_crossentropy: loss function for multi-class classification (the counterpart of binary_crossentropy, which is used for two-class classification)

09.1 Multilayer perceptron as a multi-class classifier

The two-class classifier and the multi-class classifier differ in the number of output-layer units and in how the training labels are given.

- Two-class classifier: the output layer has a single unit, which outputs the class as 0 or 1
  - Labels are expressed as class IDs
    - 0 -> class ID is 0
    - 1 -> class ID is 1
    - 2 -> class ID is 2
- Multi-class classifier: the output layer has as many units as there are classes; only the unit corresponding to the class ID outputs 1 and all the others output 0
  - Labels are expressed in one-hot representation
    - [1, 0, 0] -> class ID is 0
    - [0, 1, 0] -> class ID is 1
    - [0, 0, 1] -> class ID is 2
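
The mapping between class IDs and one-hot vectors listed above can be sketched in a few lines of plain Python. The helper name `to_one_hot` is my own and is only for illustration; in practice Keras provides a ready-made utility for this.

```python
def to_one_hot(class_ids, num_classes):
    """Return one 0/1 list per class ID, with a single 1 at that ID's index."""
    return [[1 if i == cid else 0 for i in range(num_classes)]
            for cid in class_ids]


class_ids = [0, 1, 2]
one_hot = to_one_hot(class_ids, num_classes=3)
print(one_hot)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```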

Activation function for multi-class classification

Softmax is often used.

- Each output falls between 0 and 1
- The outputs of the layer it is applied to sum to 1
- **The gap between the large and small unit outputs in that layer widens**

Passing values through softmax maps them into the range 0 to 1 and pushes them toward 0 or 1, so that the relative difference between large and small values grows.

- Useful for multi-class classification because it tends to single out one unit with a large value
- The classification result can be treated as a probability
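
To see these properties with concrete numbers, here is a minimal plain-Python softmax. The input values are made up for illustration, and the comparison against simple normalization (dividing each value by the sum) is my own way of showing that the gap widens.

```python
import math


def softmax(values):
    """Exponentiate each value and normalize so that the outputs sum to 1."""
    largest = max(values)
    # Subtracting the maximum does not change the result but avoids overflow.
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


raw = [1.0, 2.0, 4.0]                      # made-up unit outputs
probs = softmax(raw)
shares = [v / sum(raw) for v in raw]       # simple normalization, for comparison

print([round(p, 3) for p in probs])    # [0.042, 0.114, 0.844]
print([round(s, 3) for s in shares])   # [0.143, 0.286, 0.571]
```

The largest value takes about 84% of the total after softmax, compared with about 57% under simple normalization.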

Two-class classification and multi-class classification

Just as the 0-or-1 output of a single unit is enough for two-class classification, N-class classification is theoretically possible with log2 N units by combining their 0-or-1 outputs. However, a given unit would then have to learn the same 0 or 1 for several different classes, which is intuitively unnatural, and training does not go well.
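
A small sketch of this binary encoding for N = 4 makes the problem visible. The function name `binary_code` and the choice of 4 classes are assumptions for illustration only.

```python
import math


def binary_code(class_id, num_units):
    """Encode a class ID as 0/1 unit outputs, most significant unit first."""
    return [(class_id >> shift) & 1 for shift in reversed(range(num_units))]


num_classes = 4
num_units = math.ceil(math.log2(num_classes))   # 2 units are enough for 4 classes
codes = {cid: binary_code(cid, num_units) for cid in range(num_classes)}
print(codes)  # {0: [0, 0], 1: [0, 1], 2: [1, 0], 3: [1, 1]}

# Classes for which the lowest unit has to output 1
low_unit_on = [cid for cid, code in codes.items() if code[-1] == 1]
print(low_unit_on)  # [1, 3]
```

The lowest unit must output 1 for both class 1 and class 3, even though nothing says those two classes are similar.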

Loss function for multi-class classification

Whereas binary_crossentropy is used for two-class classification, categorical_crossentropy is used for multi-class classification.
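
As a rough per-sample sketch of the two losses (plain Python, without the batch averaging and value clipping that a real library applies; the function names simply mirror the Keras loss names):

```python
import math


def categorical_crossentropy(one_hot_label, probs):
    """Cross entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


def binary_crossentropy(label, prob):
    """Cross entropy for a single output unit whose value is P(class 1)."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


# Three classes, correct class is ID 1, predicted with probability 0.8
multi_loss = categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1])
print(round(multi_loss, 4))  # 0.2231

# With two classes, both formulas give the same value
two_class_loss = binary_crossentropy(1, 0.8)
same_as_categorical = categorical_crossentropy([0, 1], [0.2, 0.8])
```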

Using a list of class IDs as training labels

When classifying into N classes, the output layer needs N neurons. The training labels must therefore be given in a form that assigns 0 or 1 to each of the N neurons, rather than as the class ID itself.

- Convert the labels to one-hot representation with keras.utils.to_categorical
- Or set the loss function to sparse_categorical_crossentropy, which accepts labels that are not one-hot (class IDs as they are)
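
The two options lead to the same loss value. The following plain-Python sketch illustrates this for one sample; the functions are simplified stand-ins for the Keras losses of the same name, not their actual implementation.

```python
import math


def categorical_crossentropy(one_hot_label, probs):
    """Loss when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


def sparse_categorical_crossentropy(class_id, probs):
    """Loss when the label is a class ID: pick the probability at that index."""
    return -math.log(probs[class_id])


probs = [0.1, 0.7, 0.2]   # made-up softmax output for one sample
class_id = 1
one_hot = [1 if i == class_id else 0 for i in range(len(probs))]

loss_from_one_hot = categorical_crossentropy(one_hot, probs)
loss_from_class_id = sparse_categorical_crossentropy(class_id, probs)
```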

09.2 Apply to Dialogue Agent

Implementation example

Implementation pattern: basic
# Setup
・ Set the model's input dimension separately
・ Set the model's output dimension separately
・ When training
  ・ The training labels must be converted to one-hot representation
・ When predicting
  ・ The one-hot output must be converted back to class IDs

# Run
・ When training
  ・ Run fit_transform of the vectorizer
  ・ Run fit of the classifier
・ When predicting
  ・ Run transform of the vectorizer
  ・ Run predict of the classifier

Implementation pattern: Keras scikit-learn API embedded in sklearn.pipeline.Pipeline
# Setup
・ Set the model's input dimension separately
・ Set the model's output dimension separately

# Run
・ When training
  ・ Run fit of the vectorizer
  ・ Run fit of the pipeline
・ When predicting
  ・ Run predict of the pipeline

With keras.wrappers.scikit_learn.KerasClassifier, fit() performs the equivalent of to_categorical and predict() performs the equivalent of np.argmax. Using a Pipeline also lets fit() and predict() of the vectorizer and the classifier run together. Note, however, that fit() of the vectorizer must still be run separately beforehand, because the number of features is needed as the input dimension when the model is defined.
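
The label handling described above can be pictured roughly as follows. This is not the KerasClassifier source; `ClassIdAdapter` and `MemorizingModel` are hypothetical names, and the stand-in model simply returns the one-hot labels it was trained on so that the example runs without Keras.

```python
class MemorizingModel:
    """Stand-in for a network: stores one-hot labels and returns them as 'probabilities'."""

    def fit(self, features, one_hot_labels):
        self.table = {tuple(f): label for f, label in zip(features, one_hot_labels)}

    def predict(self, features):
        return [self.table[tuple(f)] for f in features]


class ClassIdAdapter:
    """Hypothetical wrapper: the caller uses class IDs, the model sees one-hot labels."""

    def __init__(self, model, num_classes):
        self.model = model
        self.num_classes = num_classes

    def fit(self, features, class_ids):
        # Equivalent in spirit to to_categorical
        one_hot = [[1 if i == cid else 0 for i in range(self.num_classes)]
                   for cid in class_ids]
        self.model.fit(features, one_hot)
        return self

    def predict(self, features):
        # Equivalent in spirit to np.argmax over each row of probabilities
        return [max(range(len(p)), key=p.__getitem__)
                for p in self.model.predict(features)]


adapter = ClassIdAdapter(MemorizingModel(), num_classes=3)
adapter.fit([[0.0], [1.0], [2.0]], [2, 0, 1])
predicted_ids = adapter.predict([[1.0], [0.0]])
print(predicted_ids)  # [0, 2]
```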

Additions / changes from the previous chapter (Step 06)

  1. Output layer activation function: sigmoid → softmax
  2. Loss function: binary_crossentropy → categorical_crossentropy
  3. Classifier: RandomForestClassifier → KerasClassifier

    def _build_mlp(self, input_dim, hidden_units, output_dim):
        mlp = Sequential()
        mlp.add(Dense(units=hidden_units,
                      input_dim=input_dim,
                      activation='relu'))
        mlp.add(Dense(units=output_dim, activation='softmax')) #1: Output layer activation function
        mlp.compile(loss='categorical_crossentropy', #2: Loss function
                    optimizer='adam')

        return mlp

    def train(self, texts, labels):
~~

        feature_dim = len(vectorizer.get_feature_names())
        n_labels = max(labels) + 1

        #3: Classifier
        classifier = KerasClassifier(build_fn=self._build_mlp,
                                     input_dim=feature_dim,
                                     hidden_units=32,
                                     output_dim=n_labels)
~~

Execution result


# In evaluate_dialogue_agent.py, change the imported module name as needed
from dialogue_agent_sklearn_pipeline import DialogueAgent

$ docker run -it -v $(pwd):/usr/src/app/ 15step:latest python evaluate_dialogue_agent.py
0.65957446

- Normal implementation (Step 01): 37.2%
- Preprocessing added (Step 02): 43.6%
- Preprocessing + feature extraction change (Step 04): 58.5%
- Preprocessing + feature extraction change + classifier change (Step 06): 61.7%
- Preprocessing + feature extraction change + classifier change (Step 09): 66.0%

Applied exercise

I added hidden_units and classifier__epochs as arguments of the train method of the DialogueAgent class.

dialogue_agent_sklearn_pipeline.py


    def train(self, texts, labels, hidden_units=32, classifier__epochs=100):
~~
        classifier = KerasClassifier(build_fn=self._build_mlp,
                                     input_dim=feature_dim,
                                     hidden_units=hidden_units,
                                     output_dim=n_labels)

~~
        pipeline.fit(texts, labels, classifier__epochs=classifier__epochs)
~~
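
The double underscore in classifier__epochs follows the scikit-learn Pipeline convention of passing a fit parameter to a named step as `<step name>__<parameter>`. The following is a simplified illustration of that naming rule only, not scikit-learn's actual code; `split_fit_params` is a hypothetical helper.

```python
def split_fit_params(fit_params):
    """Group 'step__param' keyword arguments by step name (simplified illustration)."""
    routed = {}
    for key, value in fit_params.items():
        step, param = key.split("__", 1)
        routed.setdefault(step, {})[param] = value
    return routed


routed = split_fit_params({"classifier__epochs": 50})
print(routed)  # {'classifier': {'epochs': 50}}
```

In other words, epochs=50 ends up at the fit() of the step named 'classifier', which here is the KerasClassifier.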

Specify hidden_units and classifier__epochs when calling the train method of the DialogueAgent class.

evaluate_dialogue_agent.py


    HIDDEN_UNITS = 64
    CLASSIFIER_EPOCHS = 50

    # Training
    training_data = pd.read_csv(join(BASE_DIR, './training_data.csv'))

    dialogue_agent = DialogueAgent()
    dialogue_agent.train(training_data['text'], training_data['label'], HIDDEN_UNITS, CLASSIFIER_EPOCHS)

Execution result


Epoch 50/50
917/917 [==============================] - 0s 288us/step - loss: 0.0229

### I also inspected a few other things ###
# pprint.pprint(dialogue_agent.pipeline.steps)
[('vectorizer',
  TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=None, min_df=1,
        ngram_range=(1, 2), norm='l2', preprocessor=None, smooth_idf=True,
        stop_words=None, strip_accents=None, sublinear_tf=False,
        token_pattern='(?u)\\b\\w\\w+\\b',
        tokenizer=<bound method DialogueAgent._tokenize of <dialogue_agent_sklearn_pipeline.DialogueAgent object at 0x7f7fc81bd128>>,
        use_idf=True, vocabulary=None)),
 ('classifier',
  <keras.wrappers.scikit_learn.KerasClassifier object at 0x7f7fa4a6a320>)]

# pprint.pprint(dialogue_agent.pipeline.steps[1][1].get_params())
{'build_fn': <bound method DialogueAgent._build_mlp of <dialogue_agent_sklearn_pipeline.DialogueAgent object at 0x7f7fc81bd128>>,
 'hidden_units': 64,
 'input_dim': 3219,
 'output_dim': 49}

# print([len(v) for v in dialogue_agent.pipeline.steps[1][1].model.layers[0].get_weights()])
[3219, 64]

# print([len(v) for v in dialogue_agent.pipeline.steps[1][1].model.layers[1].get_weights()])
[64, 49]

This confirms that the input layer has 3219 dimensions, the hidden layer 64, and the output layer 49. For each Dense layer, get_weights() returns the weight matrix and the bias vector, so len() of the first gives the layer's input dimension and len() of the second gives its number of units; the values for layer 0 and layer 1 match the settings. (The weight values themselves are updated as training progresses.)
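
As a cross-check, the expected shapes and parameter counts can be derived from the three dimensions alone. This sketch assumes the usual fully connected layout, a weight matrix of shape (input_dim, units) plus a bias vector of shape (units,); `dense_shapes` is a hypothetical helper, and the total is my own arithmetic rather than output from the run above.

```python
def dense_shapes(input_dim, units):
    """Return (weight_matrix_shape, bias_shape) for a fully connected layer."""
    return (input_dim, units), (units,)


layer_dims = [3219, 64, 49]   # input, hidden, output dimensions from the run above
total_params = 0
for input_dim, units in zip(layer_dims, layer_dims[1:]):
    kernel_shape, bias_shape = dense_shapes(input_dim, units)
    lens = [kernel_shape[0], bias_shape[0]]   # what len() reports for each array
    params = input_dim * units + units
    total_params += params
    print(lens, params)
# [3219, 64] 206080
# [64, 49] 3185

print(total_params)  # 209265
```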
