This time, I performed sentiment analysis (sentiment classification) on labeled tweet data.

**2015/10/19: An additional experiment was added.**
**2015/12/19: The SCNN source code has been released: hogefugabar/CharSCNN-theano.**
**2015/12/27: Not only SCNN but also the CharSCNN implementation has been released: hogefugabar/CharSCNN-theano.**
I used the CharSCNN algorithm described in the paper Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts. For simplicity, I actually used its variant called SCNN. SCNN takes a sentence as input in the form of a sequence of one-hot representations of its words; CharSCNN additionally feeds in one-hot representations of the characters of each word. If my understanding is correct, SCNN has an architecture close to the following.
From [UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification](http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval079.pdf)
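To make the input concrete, here is a minimal sketch of how I think of the encoding (my own illustration with a hypothetical vocabulary, not code from the paper or the repository):

```python
# A sentence becomes a sequence of word indices (the positions of the one-hot
# vectors); for CharSCNN each word additionally becomes character indices.
word_vocab = {"<unk>": 0, "i": 1, "love": 2, "this": 3, "movie": 4}   # hypothetical vocabulary
char_vocab = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

def encode_sentence(sentence):
    words = sentence.lower().split()
    word_ids = [word_vocab.get(w, word_vocab["<unk>"]) for w in words]
    char_ids = [[char_vocab.get(c, 0) for c in w] for w in words]
    return word_ids, char_ids

print(encode_sentence("I love this movie"))
# ([1, 2, 3, 4], [[9], [12, 15, 22, 5], [20, 8, 9, 19], [13, 15, 22, 9, 5]])
```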
I found satwantrana/CharSCNN on GitHub and tried to use it as-is, but several parts of the code were off, so I fixed it myself. **2015/12/19: The source code has been released at hogefugabar/CharSCNN-theano; please refer to it.**
I implemented it as Word Embeddings → Convolution → Max Pooling → Fully-Connected → Fully-Connected → Softmax, and also used Dropout, RMSprop, and so on.
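As a rough illustration of that forward pass, here is a plain NumPy sketch with made-up dimensions (not the actual Theano implementation; biases, Dropout, and RMSprop are omitted):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions, just for illustration.
vocab_size, emb_dim, window, n_filters, hidden, n_classes = 5000, 50, 3, 100, 50, 2

rng = np.random.RandomState(0)
E  = rng.randn(vocab_size, emb_dim) * 0.01          # word embeddings
Wc = rng.randn(n_filters, window * emb_dim) * 0.01  # convolution filters
W1 = rng.randn(hidden, n_filters) * 0.01            # fully-connected 1
W2 = rng.randn(n_classes, hidden) * 0.01            # fully-connected 2

def forward(word_ids):
    x = E[word_ids]                                              # (n_words, emb_dim)
    # convolution over sliding windows of words
    windows = [x[i:i + window].ravel() for i in range(len(word_ids) - window + 1)]
    conv = np.tanh(np.dot(np.array(windows), Wc.T))              # (n_windows, n_filters)
    pooled = conv.max(axis=0)                                    # max pooling over time
    h = np.tanh(np.dot(W1, pooled))                              # fully-connected
    return softmax(np.dot(W2, h))                                # class probabilities

print(forward([10, 42, 7, 300, 25]))                             # e.g. [p_negative, p_positive]
```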
I used the 20,000 tweets in tweets_clean.txt from satwantrana/CharSCNN: 18,000 tweets for training and 2,000 for testing. Each tweet is labeled 0/1 (negative/positive), so this is binary classification.
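The split itself is nothing special; roughly like this (the per-line "label<TAB>tweet" format of tweets_clean.txt is my assumption here):

```python
import random

# Read the labeled tweets and split them 18,000 / 2,000.
with open("tweets_clean.txt") as f:
    data = [line.rstrip("\n").split("\t", 1) for line in f]  # [(label, tweet), ...]

random.seed(0)
random.shuffle(data)
train, test = data[:18000], data[18000:20000]
print(len(train), len(test))  # 18000 2000
```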
The results graph shows the average over 10 seeds, smoothed with a moving average.
Overfitting seemed to start around 2 epochs (18,000 × 2 iterations).
The best classification accuracy was about 0.8. The original paper reports 0.82 to 0.85, so I think the differences in dataset and hyperparameters account for the gap.
It is said that results improve if the initial word-embedding layer is initialized with weights pre-trained by Word2Vec, so I tried that as well.
Since a Word2Vec example is included with Chainer, I used embeddings pre-trained with Skip-gram with Negative Sampling. Pre-training with Chainer and then running the Theano program, lol. Thank you, cPickle.
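The hand-off between Chainer and Theano is just a pickled NumPy array, roughly like this (the variable names are hypothetical, not the actual code):

```python
import cPickle  # Python 2, as used with Theano at the time
import numpy as np
import theano

# Chainer side (sketch; `model.embed.W` is a hypothetical attribute name):
# w2v_weights = model.embed.W                # (vocab_size, emb_dim) NumPy array
# with open("w2v_weights.pkl", "wb") as f:
#     cPickle.dump(w2v_weights, f, protocol=cPickle.HIGHEST_PROTOCOL)

# Theano side: load the pickled matrix and use it as the initial value of
# the word-embedding shared variable instead of a random initialization.
with open("w2v_weights.pkl", "rb") as f:
    w2v_weights = cPickle.load(f)

E = theano.shared(np.asarray(w2v_weights, dtype=theano.config.floatX), name="E")
```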

Hmm, with pre-training the accuracy rises faster at the start, but the final result is better without pre-training... Maybe it would change if I trained a little longer?