Models using convolutional neural networks in natural language processing

Introduction

Recently, natural language processing with convolutional neural networks (CNNs) has been attracting attention. Compared with RNNs, CNNs are easier to parallelize, and convolution operations run fast on GPUs, so CNN-based models have a considerable advantage in processing speed.

This article surveys models that use convolutional neural networks in natural language processing. We hope it helps you get a bird's-eye view of how research on CNN-based natural language processing has progressed.

Sentence classification (sentiment analysis, topic classification, question type classification)

Convolutional Neural Networks for Sentence Classification (2014/08)

A paper proposing a CNN for sentence classification tasks such as sentiment analysis and question type classification.

Specifically, a sentence is represented as a sequence of word vectors, and a CNN extracts features from that sequence and classifies it. The paper reports that using pre-trained word vectors (word2vec trained on Google News) improved performance. An interesting point is the two-channel variant: both channels hold the word vectors, but one is updated during training and the other is kept fixed, with the aim of improving performance. Evaluated on seven sentence classification tasks, including sentiment analysis and question type classification, the model beat the previous best results on four of them.
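As a rough illustration of the feature extraction described above, the following plain-Python sketch slides one filter over the word vectors of a sentence, sums the response over the channels, applies a nonlinearity, and keeps the maximum (max-over-time pooling). The function name `conv_feature`, the choice of ReLU, and all toy values are assumptions for illustration; they are not taken from the paper's code.

```python
def conv_feature(channels, filt, bias=0.0):
    """Apply one filter of width h to every channel, then max-pool over time.

    channels: list of sentences, each a list of n word vectors of dimension d
              (for example a fixed and a fine-tuned copy of the embeddings).
    filt:     list of h weight vectors of dimension d.
    """
    h = len(filt)
    n = len(channels[0])
    feature_map = []
    for i in range(n - h + 1):          # every window of h consecutive words
        total = bias
        for sent in channels:           # responses of all channels are summed
            for j in range(h):
                total += sum(x * w for x, w in zip(sent[i + j], filt[j]))
        feature_map.append(max(0.0, total))  # ReLU (assumed nonlinearity)
    return max(feature_map)             # max-over-time pooling


# Toy 4-word sentence with 2-dimensional word vectors (made-up values).
static = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
tuned = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]  # filter width h = 2

feature = conv_feature([static, tuned], bigram_filter)
print(feature)
```

In a full model, many filters of several widths would each produce one such feature, and the resulting feature vector would be passed to a softmax classifier.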

The author's Theano implementation and a TensorFlow implementation by Denny Britz (Google Brain): https://github.com/yoonkim/CNN_sentence https://github.com/dennybritz/cnn-text-classification-tf

A brief explanation and Chainer implementation in Japanese by ichiroex: [Chainer] Document classification by convolutional neural network

Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts (2014/08)

A paper proposing a CNN (CharSCNN) for sentiment analysis of movie reviews and tweets.

Sentiment analysis of short texts such as tweets is difficult because contextual information is limited. To address this, the authors build character-level vector representations in addition to the word-level representations commonly used in sentiment analysis, and use both to obtain a sentence-level representation, which improved performance. Experiments on a movie review dataset (SSTb) and a Twitter dataset (STS) gave better results than previous methods.
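The idea of combining word-level and character-level information can be sketched as follows: a small convolution over the character embeddings of a word is max-pooled into a fixed-size vector, which is then concatenated with the word embedding. This is a minimal sketch under stated assumptions; the helper names, the embedding tables, and the filter values are hypothetical, and padding of short words is omitted.

```python
def window_scores(vectors, filt):
    """Dot product of a width-k filter with every window of k consecutive vectors."""
    k = len(filt)
    return [
        sum(x * w for j in range(k) for x, w in zip(vectors[i + j], filt[j]))
        for i in range(len(vectors) - k + 1)
    ]


def word_representation(word, word_emb, char_emb, char_filters):
    """Concatenate the word embedding with max-pooled character-level features."""
    char_vecs = [char_emb[c] for c in word]
    # One feature per filter: the strongest response over all character windows.
    char_part = [max(window_scores(char_vecs, f)) for f in char_filters]
    return word_emb[word] + char_part  # list concatenation


# Toy embedding tables (made-up values).
char_emb = {"l": [1.0, 0.0], "o": [0.0, 1.0]}
word_emb = {"lol": [0.2, 0.4]}
char_filters = [
    [[1.0, 0.0], [0.0, 1.0]],  # responds to the character bigram "lo"
    [[0.0, 1.0], [1.0, 0.0]],  # responds to the character bigram "ol"
]

vec = word_representation("lol", word_emb, char_emb, char_filters)
print(vec)
```

Because the character-level part is computed from the spelling, it can still carry information for words whose word-level embedding is poor, which is common in tweets.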

Theano implementation by hogefugabar: https://github.com/hogefugabar/CharSCNN-theano

I also wrote a commentary article: Sentiment analysis of tweets by deep learning

#TAGSPACE: Semantic Embeddings from Hashtags (2014/10)

A paper proposing a CNN that learns representations of short texts, using the hashtags attached to social media posts as supervision.

Specifically, the CNN outputs a score for each pair of input text and hashtag, and the text representation is learned in the process of ranking the hashtags. On hashtag prediction and document recommendation tasks, the model outperformed the baseline methods.
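The scoring and ranking step can be sketched as an inner product between a text vector and each hashtag vector. In the sketch below, `text_vec` stands in for the CNN output, and the pairwise margin loss is a simplified stand-in for the paper's ranking objective; the function names, the margin value, and the toy embeddings are all assumptions.

```python
def score(text_vec, tag_vec):
    """Score of a (text, hashtag) pair as an inner product of their embeddings."""
    return sum(a * b for a, b in zip(text_vec, tag_vec))


def rank_hashtags(text_vec, tag_embeddings):
    """Return hashtags sorted from highest to lowest score."""
    return sorted(
        tag_embeddings,
        key=lambda tag: score(text_vec, tag_embeddings[tag]),
        reverse=True,
    )


def margin_ranking_loss(text_vec, pos_vec, neg_vec, margin=0.1):
    """Zero when the correct hashtag outscores the wrong one by at least the margin."""
    return max(0.0, margin - score(text_vec, pos_vec) + score(text_vec, neg_vec))


# Toy vectors (made-up values); text_vec would come from the CNN.
text_vec = [1.0, 0.0]
tag_embeddings = {"#nlp": [0.9, 0.1], "#cats": [0.0, 1.0], "#ml": [0.5, 0.5]}

ranking = rank_hashtags(text_vec, tag_embeddings)
good = margin_ranking_loss(text_vec, tag_embeddings["#nlp"], tag_embeddings["#cats"])
bad = margin_ranking_loss(text_vec, tag_embeddings["#cats"], tag_embeddings["#nlp"])
print(ranking, good, bad)
```

Minimizing such a loss pushes the text vector toward the embeddings of its own hashtags, which is how the text representation is learned as a by-product of ranking.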

Effective Use of Word Order for Text Categorization with Convolutional Neural Networks (2014/12)

A paper proposing a CNN for text classification that takes word order into account.

Document classification covers a variety of tasks, and in some of them, such as sentiment analysis, high performance cannot be achieved without taking word order into account. To address this, the paper proposes a CNN that classifies documents while considering word order. Whereas most CNN methods take word embeddings as input, this work feeds high-dimensional one-hot vectors directly into the network, so that it learns embeddings of small text regions. A comparison with state-of-the-art methods on three datasets for sentiment analysis (including IMDB) and topic classification showed the effectiveness of the proposed method.
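To make the one-hot input concrete, here is a minimal sketch that builds a region vector by concatenating the one-hot vectors of consecutive words, so that word order within the region is preserved. The function names, the tiny vocabulary, and the region size are assumptions for illustration only.

```python
def one_hot(index, size):
    """One-hot vector of the given size with a 1 at position index."""
    vec = [0] * size
    vec[index] = 1
    return vec


def seq_regions(tokens, vocab, region_size):
    """Concatenate the one-hot vectors of each run of region_size consecutive words."""
    ids = [vocab[t] for t in tokens]
    regions = []
    for i in range(len(ids) - region_size + 1):
        region = []
        for idx in ids[i:i + region_size]:
            region.extend(one_hot(idx, len(vocab)))
        regions.append(region)
    return regions


vocab = {"i": 0, "love": 1, "it": 2}  # toy vocabulary
regions = seq_regions(["i", "love", "it"], vocab, region_size=2)
print(regions)
```

With this encoding, "love it" and "it love" yield different region vectors, whereas a bag-of-words vector would treat them as identical. A convolution layer applied to such region vectors effectively learns an embedding for each small text region.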

Implementation by the author: http://riejohnson.com/cnn_download.html

Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding (2015/04)

A paper proposing a semi-supervised learning framework that uses CNNs for text classification. Conventional models feed pre-trained word embeddings into the convolution layer, whereas this work learns embeddings of small text regions from unlabeled data and uses them as part of the input to the convolution layer of a supervised CNN. Experiments on sentiment analysis (IMDB, Elec) and topic classification (RCV1) showed higher performance than previous studies.

Implementation by the author: http://riejohnson.com/cnn_download.html

Character-level Convolutional Networks for Text Classification (2015/09)

A paper on using character-level convolutional neural networks for text classification. The training data is augmented by replacing words in the text with synonyms from a thesaurus. The models are compared against traditional methods (bag-of-words, bag-of-ngrams, bag-of-means) and deep learning methods (word-based CNN and LSTM). The authors built eight datasets for the comparison, and the character-level CNN proved effective on some of them.
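Two ingredients of this approach can be sketched in a few lines: turning text into a fixed-length sequence of one-hot character vectors, and augmenting text by synonym replacement. This is a simplified sketch; the alphabet, the maximum length, the `thesaurus` dictionary, and the fixed replacement probability are placeholders assumed for illustration and do not reproduce the paper's exact settings.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits  # placeholder alphabet
CHAR_TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}


def quantize(text, max_len):
    """Encode text as max_len one-hot rows; unknown characters and padding are all zeros."""
    rows = []
    for ch in text.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    padding = max_len - len(rows)
    rows.extend([0] * len(ALPHABET) for _ in range(padding))
    return rows


def augment(text, thesaurus, rng, replace_prob=0.5):
    """Replace words that have synonyms with a randomly chosen synonym."""
    out = []
    for word in text.split():
        synonyms = thesaurus.get(word)
        if synonyms and rng.random() < replace_prob:
            out.append(rng.choice(synonyms))
        else:
            out.append(word)
    return " ".join(out)


encoded = quantize("ab!", max_len=5)
thesaurus = {"good": ["great"]}  # hypothetical thesaurus
augmented = augment("good movie", thesaurus, random.Random(0), replace_prob=1.0)
print(len(encoded), augmented)
```

The encoded matrix is what the convolution layers consume, so the model needs no word segmentation or vocabulary of words.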

Lua implementation by the author: https://github.com/zhangxiangxiao/Crepe

A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification (2015/10)

CNN models give good results in sentence classification, but deciding the architecture and setting the hyperparameters requires expertise. Because it is unclear how these choices affect the results, the authors examine them empirically with a one-layer CNN. Finally, the paper gives practical advice on how to set the model architecture and hyperparameters when classifying sentences with a CNN.

Sequence labeling (part-of-speech tagging, named entity recognition, chunking)

Natural Language Processing (almost) from Scratch (2011/03)

A paper proposing a neural network that can learn part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. When simply trained on the labeled data, its performance fell below the benchmark systems, but the paper showed that good word vectors, obtained by pre-training a language model on unlabeled data, contribute to improved performance. It also showed that performance can be improved further by sharing parameters among the models for each task and training them jointly (multitask learning).

Summary slides in Japanese: Natural Language Processing (Almost) from Scratch (6th Deep Learning Study Group Material; Sakaki)

Learning Character-level Representations for Part-of-Speech Tagging (2014/07)

A paper on part-of-speech tagging with a CNN (CharWNN). Specifically, a vector representation of each word is built by combining word-level and character-level embeddings, and the network takes these vectors as input and outputs part-of-speech scores. Experiments on English and Portuguese datasets (WSJ and Mac-Morpho) achieved state-of-the-art results.

Language model

Language Modeling with Gated Convolutional Networks (2016/12)

This paper reports that a CNN achieved accuracy equal to or better than an LSTM on the language modeling task. The output of the convolution is passed through a gating mechanism similar to that of a GRU so that past information is not lost. On the Google Billion Word dataset, the accuracy is comparable to that of an LSTM, while computational efficiency is reported to be about 20 times better.
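The gating described above can be sketched as a gated linear unit: one convolution produces a linear output, a second convolution produces a gate, and the two are multiplied element-wise after a sigmoid, h = (X*W + b) ⊗ σ(X*V + c). The plain-Python sketch below uses a single output channel and left zero-padding so that each position only sees earlier tokens; the function names and toy values are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def causal_conv(seq, filt, bias=0.0):
    """Width-k convolution where position t only sees seq[t-k+1 .. t].

    seq:  list of T input vectors of dimension d.
    filt: list of k weight vectors of dimension d.
    """
    k = len(filt)
    d = len(seq[0])
    padded = [[0.0] * d for _ in range(k - 1)] + seq  # left zero-padding
    return [
        bias + sum(x * w for j in range(k) for x, w in zip(padded[t + j], filt[j]))
        for t in range(len(seq))
    ]


def gated_linear_unit(seq, w_filter, v_filter, b=0.0, c=0.0):
    """Element-wise product of a linear convolution and a sigmoid gate."""
    linear = causal_conv(seq, w_filter, b)
    gate = causal_conv(seq, v_filter, c)
    return [a * sigmoid(g) for a, g in zip(linear, gate)]


seq = [[1.0], [2.0], [3.0]]   # toy sequence of 1-dimensional inputs
w_filter = [[1.0], [1.0]]     # linear path, width k = 2
v_filter = [[0.0], [0.0]]     # gate path; zero weights give a gate of 0.5

out = gated_linear_unit(seq, w_filter, v_filter)
print(out)
```

Because no position depends on the output of the previous position, all time steps can be computed in parallel, which is where the speed advantage over recurrent models comes from.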

TensorFlow implementation: Language-Modeling-GatedCNN

In conclusion

Up-to-date summaries of papers on machine learning, natural language processing, and computer vision are posted on the following Twitter account. It delivers content that readers of this article are likely to find interesting, so please follow it: arXivTimes
