[Text classification] I implemented Convolutional Neural Networks for Sentence Classification with Chainer

In short

- Text classification using Convolutional Neural Networks (CNN).
- I implemented Convolutional Neural Networks for Sentence Classification in Chainer.
- Text classification reached a higher accuracy than in [Chainer] Document classification by convolutional neural network.

Introduction

[Convolutional Neural Networks for Sentence Classification](http://emnlp2014.org/papers/pdf/EMNLP2014181.pdf) has been implemented in Chainer.

The paper's author also publishes a Theano implementation on GitHub.

The source code for this article is available here: chainer-cnnsc

Data used

- The data can be found here. I used "sentence polarity dataset v1.0".
- Download directly

Preparation

- Install Chainer, scikit-learn, and gensim
- Download the pretrained word2vec model (GoogleNews-vectors-negative300.bin.gz)
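The network takes each sentence as a matrix with one row per word, where each row is that word's word2vec vector (the training log below reports a height of 59 words and a width of 300 dimensions). The article does not show this conversion, so the following is a minimal numpy sketch under assumptions: `sentence_to_matrix` is a hypothetical helper, shorter sentences are zero-padded at the end, and words missing from the vocabulary become zero rows. A toy 4-dimensional dictionary stands in for the real model; with gensim, `KeyedVectors.load_word2vec_format(path, binary=True)` loads GoogleNews-vectors-negative300.bin.gz into an object that also supports `word in kv` and `kv[word]`.

```python
import numpy as np


def sentence_to_matrix(tokens, embedding, max_len, dim):
    """Stack word vectors into a (max_len, dim) matrix, zero-padding the rest.

    Hypothetical helper. `embedding` is any dict-like word -> vector mapping;
    unknown words are left as zero rows (an assumption, not the article's rule).
    """
    mat = np.zeros((max_len, dim), dtype=np.float32)
    for row, word in enumerate(tokens[:max_len]):
        if word in embedding:
            mat[row] = embedding[word]
    return mat


# Toy 4-dimensional embedding standing in for the 300-dimensional word2vec model
toy_embedding = {
    "good": np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32),
    "movie": np.array([0.0, 1.0, 0.0, 0.0], dtype=np.float32),
}

x = sentence_to_matrix(["good", "movie", "!"], toy_embedding, max_len=5, dim=4)
# Add batch and channel axes: (batch, channel, height, width), the layout Convolution2D expects
x = x[np.newaxis, np.newaxis, :, :]
print(x.shape)  # (1, 1, 5, 4)
```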

Environment

Training data

The data is English text; obtain it from the download link above. Each line is one document: the first column is the label, and the remaining columns are the text. Label 0 marks a negative document and label 1 a positive one.

[label] [text (space-delimited)]
0 it just didn't mean much to me and played too skewed to ever get a hold on ( or be entertained by ) .
1 culkin , who's in virtually every scene , shines as a young man who uses sarcastic lies like a shield .
...
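Reading this format amounts to splitting each line once at the first space. The sketch below is an illustration, not the author's loader: `parse_line` is a hypothetical name, and the snippet also computes the longest sentence length, which the network needs as `max_sentence_len`.

```python
def parse_line(line):
    """Split one '[label] [text]' line into an int label and a token list (hypothetical helper)."""
    label, text = line.split(None, 1)
    return int(label), text.split()


lines = [
    "0 it just didn't mean much to me and played too skewed to ever get a hold on ( or be entertained by ) .",
    "1 culkin , who's in virtually every scene , shines as a young man who uses sarcastic lies like a shield .",
]
dataset = [parse_line(line) for line in lines]
labels = [label for label, _ in dataset]
# Longest sentence in tokens; this becomes the height of the input matrix
max_sentence_len = max(len(tokens) for _, tokens in dataset)
print(labels, max_sentence_len)  # [0, 1] 24
```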

Model

I used the model proposed in the paper Convolutional Neural Networks for Sentence Classification. The following article describes the model:

- Model using convolutional neural network in natural language processing

[Figure: CNN model]

Program (network part)

The program defines several convolution filter sizes and performs a convolution with each of them. The filter sizes are stored as a list in filter_height. In the forward pass, a loop over the filter sizes applies the convolution and pooling for each one, as shown below.

 # Loop over the filter sizes
 for i, filter_size in enumerate(self.filter_height):
     # Convolution layer followed by ReLU
     h_conv[i] = F.relu(self[i](x))
     # Max pooling over the whole height of the convolution output
     h_pool[i] = F.max_pooling_2d(h_conv[i], (self.max_sentence_len+1-filter_size))
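The pooling window `self.max_sentence_len+1-filter_size` is exactly the height of the convolution output, so each filter's feature map is reduced to a single maximum (max-over-time pooling). The following numpy sketch checks that arithmetic for one filter. The filter height of 3 and the random values are hypothetical, while 59 and 300 are the sizes printed in the training log; it also assumes that the filter spans the full embedding width (as the `(i, filter_width)` shape suggests when `filter_width` is 300) and that the input height equals `max_sentence_len`.

```python
import numpy as np

max_sentence_len = 59   # height reported in the training log
embed_dim = 300         # width reported in the training log
filter_size = 3         # hypothetical filter height

rng = np.random.default_rng(0)
sentence = rng.standard_normal((max_sentence_len, embed_dim))
filt = rng.standard_normal((filter_size, embed_dim))

# Valid convolution (pad=0): the filter covers the full embedding width, so it
# only slides down the sentence, giving max_sentence_len - filter_size + 1 outputs
conv_out = np.array([
    np.sum(sentence[t:t + filter_size] * filt)
    for t in range(max_sentence_len - filter_size + 1)
])
conv_out = np.maximum(conv_out, 0.0)  # ReLU

# A window of max_sentence_len + 1 - filter_size spans every position,
# so non-overlapping pooling leaves exactly one value for this filter
window = max_sentence_len + 1 - filter_size
pooled = conv_out.reshape(-1, window).max(axis=1)
print(conv_out.shape, window, pooled.shape)  # (57,) 57 (1,)
```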

The source code of the network part is shown below.

import chainer.functions as F
import chainer.links as L
from chainer import ChainList


# ChainList is used so that the number of links can vary with the number of filter sizes
class CNNSC(ChainList):
    def __init__(self,
                 input_channel,
                 output_channel,
                 filter_height,
                 filter_width,
                 n_label,
                 max_sentence_len):
        # Keep the number of filter sizes, the filter heights, and the maximum sentence length for __call__
        self.cnv_num = len(filter_height)
        self.filter_height = filter_height
        self.max_sentence_len = max_sentence_len

        # One Convolution2D link per filter height
        # Convolution2D(number of input channels, number of output channels (filters per shape), filter shape (tuple), padding size)
        link_list = [L.Convolution2D(input_channel, output_channel, (i, filter_width), pad=0) for i in filter_height]
        # Fully connected hidden layer (dropout is applied to its output in __call__)
        link_list += [L.Linear(output_channel * self.cnv_num, output_channel * self.cnv_num)]
        # Output layer
        link_list += [L.Linear(output_channel * self.cnv_num, n_label)]

        # Initialize the ChainList with the links defined above
        super(CNNSC, self).__init__(*link_list)

        # Alternatively, the links can be added one at a time with
        # self.add_link(link)

    def __call__(self, x, train=True):
        # Placeholders for the per-filter intermediate results
        h_conv = [None for _ in self.filter_height]
        h_pool = [None for _ in self.filter_height]

        # Loop over the filter sizes
        for i, filter_size in enumerate(self.filter_height):
            # Convolution layer followed by ReLU
            h_conv[i] = F.relu(self[i](x))
            # Max pooling over the whole height of the convolution output (one value per filter)
            h_pool[i] = F.max_pooling_2d(h_conv[i], (self.max_sentence_len+1-filter_size))
        # Concatenate the pooled results of all filter sizes
        concat = F.concat(h_pool, axis=2)
        # Hidden layer with tanh, then dropout on its output
        # (Chainer v1 API: Chainer v2 removed the train argument in favor of chainer.using_config('train', ...))
        h_l1 = F.dropout(F.tanh(self[self.cnv_num+0](concat)), ratio=0.5, train=train)
        # Project the dropout result onto the output layer
        y = self[self.cnv_num+1](h_l1)

        return y
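To make the shape flow of `CNNSC` concrete without Chainer, here is a numpy re-implementation of the same sequence of operations: convolution and ReLU per filter height, max pooling over all positions, concatenation, a tanh hidden layer, dropout with ratio 0.5, and the output layer. It is a sketch for checking shapes, not the article's code. The filter heights `[3, 4, 5]`, all sizes, and the helper names `conv_valid` and `cnnsc_forward` are hypothetical, and dropout is written in the inverted form (surviving units scaled by 1/(1 - ratio)), which is how Chainer's `F.dropout` scales during training.

```python
import numpy as np


def conv_valid(x, w, b):
    """x: (N, H, W), w: (C, h, W), b: (C,) -> (N, C, H - h + 1) valid cross-correlation."""
    n, height, _ = x.shape
    c, h, _ = w.shape
    out = np.empty((n, c, height - h + 1))
    for t in range(height - h + 1):
        # Every window (N, h, W) against every filter (C, h, W)
        out[:, :, t] = np.einsum("nhw,chw->nc", x[:, t:t + h, :], w) + b
    return out


def cnnsc_forward(x, params, train, rng):
    """Same layer order as CNNSC.__call__ (hypothetical numpy sketch)."""
    pooled = []
    for w, b in params["conv"]:
        h_conv = np.maximum(conv_valid(x, w, b), 0.0)  # convolution + ReLU
        pooled.append(h_conv.max(axis=2))              # max over all positions -> (N, C)
    concat = np.concatenate(pooled, axis=1)            # (N, C * number of filter sizes)
    w1, b1 = params["hidden"]
    h1 = np.tanh(concat @ w1 + b1)                     # hidden layer with tanh
    if train:
        # Inverted dropout, ratio 0.5: keep each unit with probability 0.5, scale survivors by 2
        mask = rng.random(h1.shape) >= 0.5
        h1 = h1 * mask / 0.5
    w2, b2 = params["out"]
    return h1 @ w2 + b2                                # (N, n_label) scores


# Hypothetical sizes, much smaller than the article's 59 x 300 input
n, max_len, dim, out_ch, n_label = 2, 10, 8, 4, 2
filter_heights = [3, 4, 5]
hidden = out_ch * len(filter_heights)
rng = np.random.default_rng(0)
params = {
    "conv": [(rng.standard_normal((out_ch, h, dim)) * 0.1, np.zeros(out_ch)) for h in filter_heights],
    "hidden": (rng.standard_normal((hidden, hidden)) * 0.1, np.zeros(hidden)),
    "out": (rng.standard_normal((hidden, n_label)) * 0.1, np.zeros(n_label)),
}
x = rng.standard_normal((n, max_len, dim))
y = cnnsc_forward(x, params, train=False, rng=rng)
print(y.shape)  # (2, 2)
```

One difference from the Chainer code: `F.concat(h_pool, axis=2)` followed by `L.Linear` flattens the pooled features in a different order than `np.concatenate(..., axis=1)` does here. This only permutes the rows of the hidden layer's weight matrix, so both versions can represent the same functions.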

Experimental result

In the experiment, the dataset was split into training data and test data, and the model was trained for 50 epochs. The accuracy on the test data at the 50th epoch was `accuracy = 0.799437701702`.
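The article does not state the train/test ratio or the loss function, so the following is only a sketch of how the logged numbers are typically computed: a shuffled index split with a hypothetical `test_ratio`, mean softmax cross-entropy for "mean loss" (an assumption, though it is the default loss of Chainer's `L.Classifier`), and accuracy as the fraction of argmax predictions that match the label. All function names are hypothetical.

```python
import numpy as np


def train_test_split_indices(n_samples, test_ratio, seed=0):
    """Shuffle indices and split off a test portion (test_ratio is an assumption)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(n_samples * test_ratio)
    return perm[n_test:], perm[:n_test]


def mean_softmax_cross_entropy(scores, labels):
    """Mean cross-entropy of softmax(scores) against integer labels (assumed loss)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    return (scores.argmax(axis=1) == labels).mean()


train_idx, test_idx = train_test_split_indices(10, test_ratio=0.2)
scores = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 0.5], [0.2, 0.1]])
labels = np.array([0, 1, 1, 0])
print(len(train_idx), len(test_idx), accuracy(scores, labels))  # 8 2 0.75
```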

When classifying documents with the simpler CNN model from this article ([Chainer] Document classification by convolutional neural network), the accuracy was `accuracy = 0.775624996424`, so the accuracy improved slightly.
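In numbers, the improvement over the simpler CNN is the difference between the two final test accuracies:

```latex
0.799437701702 - 0.775624996424 = 0.023812705278 \approx 2.4\ \text{percentage points}
```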

input file name: dataset/mr_input.dat
loading word2vec model...
height (max length of sentences): 59
width (size of wordembedding vecteor ): 300
epoch 1 / 50
train mean loss=0.568159639835, accuracy=0.707838237286
 test mean loss=0.449375987053, accuracy=0.788191199303
epoch 2 / 50
train mean loss=0.422049582005, accuracy=0.806962668896
 test mean loss=0.4778624475, accuracy=0.777881920338
epoch 3 / 50
train mean loss=0.329617649317, accuracy=0.859808206558
 test mean loss=0.458206892014, accuracy=0.792877197266
epoch 4 / 50
train mean loss=0.240891501307, accuracy=0.90389829874
 test mean loss=0.642955899239, accuracy=0.769447028637
 ...
epoch 47 / 50
train mean loss=0.000715514877811, accuracy=0.999791562557
 test mean loss=0.910120248795, accuracy=0.799437701702
epoch 48 / 50
train mean loss=0.000716249051038, accuracy=0.999791562557
 test mean loss=0.904825389385, accuracy=0.801312088966
epoch 49 / 50
train mean loss=0.000753249507397, accuracy=0.999791562557
 test mean loss=0.900236129761, accuracy=0.799437701702
epoch 50 / 50
train mean loss=0.000729961204343, accuracy=0.999791562557
 test mean loss=0.892229259014, accuracy=0.799437701702

In conclusion

The following article also introduces an implementation of text classification using CNNs.

- Text classification using convolutional neural networks (CNN) and Spatial Pyramid Pooling (SPP-net)

Reference URL
