This is a continuation of document classification using the Microsoft Cognitive Toolkit (CNTK).
In Part 2, the document data prepared in Part 1 is used to classify documents with CNTK. It is assumed that CNTK and the NVIDIA GPU CUDA toolkit are installed.
In Natural Language : Doc2Vec Part1 - livedoor NEWS Corpus, we prepared the training and validation data.
In Part 2, we will create a Doc2Vec model and classify documents.
Doc2Vec
Doc2Vec [1][2][3] is an extension of Word2Vec. The Doc2Vec implemented here is a simple model that averages the embedding-layer outputs of all words contained in one document and classifies which category the document belongs to.
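The averaged-embedding classifier described above can be sketched with NumPy. This is a minimal illustration, not the trained model: the vocabulary size, embedding width, and weights below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: vocabulary, embedding width, and the 9 livedoor categories.
vocab_size, embed_dim, num_classes = 10000, 100, 9

# Embedding matrix and output-layer weights (randomly initialized here).
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
W = rng.normal(scale=0.1, size=(embed_dim, num_classes))
b = np.zeros(num_classes)

def classify(word_ids):
    """Average the embeddings of all words in a document, then score each category."""
    doc_vec = E[word_ids].mean(axis=0)   # (embed_dim,)
    logits = doc_vec @ W + b             # (num_classes,)
    exp = np.exp(logits - logits.max())  # softmax, shifted for numerical stability
    return exp / exp.sum()

probs = classify([1, 42, 7, 512])  # category probabilities for a toy document
```

Because the document vector is just the mean of the word vectors, the model ignores word order; it is essentially a bag-of-words classifier over learned embeddings.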
Each parameter was initialized with CNTK's default settings, which in most cases use the Glorot uniform distribution [4].
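For reference, Glorot (Xavier) uniform initialization draws weights from a uniform distribution whose bounds depend on the layer's fan-in and fan-out. A small sketch:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    # Glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    # which keeps activation variance roughly constant across layers.
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_uniform(100, 9)  # e.g. an embedding-to-category output layer
```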
Word2Vec adopted Sampled Softmax [5] to speed up the output layer when predicting words, but since this document classification task has only 9 categories, the standard Softmax function with cross-entropy error was used.
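A minimal sketch of the softmax plus cross-entropy loss used here (NumPy, for illustration only):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, label):
    # Negative log-likelihood of the correct category.
    return -np.log(probs[label])

# With 9 equally likely categories, the loss starts around ln(9) ≈ 2.197.
loss = cross_entropy(softmax(np.zeros(9)), 0)
```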
Adam [6] was used as the optimization algorithm. Adam's learning rate was set to 0.01, the hyperparameter $β_1$ to 0.9, and $β_2$ to CNTK's default value.
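For reference, one Adam parameter update can be sketched as follows. The $β_2$ and $ε$ values below are the commonly used Adam defaults, assumed here for illustration; the article itself keeps CNTK's default for $β_2$.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# First step on a single weight with gradient 0.5 moves it by roughly lr.
p, m, v = adam_step(np.array([1.0]), np.array([0.5]), np.zeros(1), np.zeros(1), t=1)
```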
Model training ran for 10 epochs of mini-batch learning.
- CPU: Intel(R) Core(TM) i7-6700K 4.00GHz
- GPU: NVIDIA GeForce GTX 1060 6GB
- Windows 10 Pro 1909
- CUDA 10.0
- cuDNN 7.6
- Python 3.6.6
- cntk-gpu 2.7
- pandas 0.25.0
The training program is available on GitHub.
doc2vec_training.py
Training loss and error
The figure below visualizes the logs of the loss function and error rate during training. The left graph shows the loss function and the right graph the error rate; the horizontal axis is the number of epochs and the vertical axis the respective value.
Validation accuracy and confusion matrix
Evaluating performance on the validation data held out during data preparation in Part 1 gave the following result.
Accuracy 90.00%
The figure below visualizes the confusion matrix for the validation data. Columns are the predictions and rows the correct answers.
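The accuracy reported above can be read off the diagonal of such a confusion matrix. A minimal sketch of building one (the labels below are toy values, not the actual validation results):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows: correct label, columns: predicted label (matching the figure's layout).
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
accuracy = np.trace(cm) / cm.sum()  # fraction of samples on the diagonal
```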
Using back propagation of gradients, I investigated which words in a document are important when classifying it.
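For the averaged-embedding model, gradient-based word importance has a convenient closed form: the gradient of the target-class logit with respect to each word's one-hot input is that word's embedding dotted with the output weights, divided by the document length. The sketch below illustrates this idea with placeholder weights; it is not the article's training code.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_classes = 1000, 50, 9
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # embedding matrix
W = rng.normal(scale=0.1, size=(embed_dim, num_classes)) # output weights

def word_saliency(word_ids, target_class):
    # d(logit_c)/d(one-hot of word w) = (E[w] @ W[:, c]) / n for the mean-pooled
    # model, so each word's contribution to the class score can be read directly.
    n = len(word_ids)
    return {w: abs(E[w] @ W[:, target_class]) / n for w in word_ids}

scores = word_saliency([3, 14, 159], target_class=2)
top_words = sorted(scores, key=scores.get, reverse=True)  # most important first
```

Ranking the words of each validation document by this magnitude produces lists like those below.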
dokujo-tsushin
1 single woman
2 Lady Girls
3 Saori Abe
4 woman
5 never get married
6 age
7 me
8 married
9 Values
10 copies
Words related to women are emphasized in Dokujo Tsushin articles.
it-life-hack
1 smartphone
2 services
3 services
4 apps
5 google
6 google
7 google
8 google
9 google
10 google
Words about IT are emphasized in IT Lifehack articles.
sports-watch
1 training
2 number of places
3 clubs
4 clubs
5 home
6 Top
7 Vision
8 Yoshida
9 Yoshida
10 Yoshida
Sports Watch articles emphasize sports-related words.
Natural Language : Doc2Vec Part1 - livedoor NEWS Corpus
Natural Language : Word2Vec Part2 - Skip-gram model