SNLI is a dataset for classifying the relationship between two given sentences, a premise and a hypothesis (entailment, contradiction, or neutral). To solve this task, I implemented a model in Keras that combines pretrained Word2Vec embeddings [^6] with an LSTM.
Details of the SNLI [^1] dataset are described in the article "How to read the SNLI dataset" [^2], which may be a useful reference.
The dataset can be downloaded from The Stanford Natural Language Inference (SNLI) Corpus page:
```shell
wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip
```
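The archive unpacks to a `snli_1.0/` directory containing train/dev/test splits in JSONL format, one JSON object per line, with the fields `sentence1` (premise), `sentence2` (hypothesis), and `gold_label`. A minimal parsing sketch (the sample line below is a toy record I made up, abbreviated to just those fields):

```python
import json

# Map SNLI gold labels to class indices; examples with gold_label "-"
# (no annotator consensus) are conventionally discarded.
LABELS = {"entailment": 0, "contradiction": 1, "neutral": 2}

def parse_snli_line(line):
    """Return (premise, hypothesis, label_index), or None for unusable lines."""
    record = json.loads(line)
    label = record["gold_label"]
    if label not in LABELS:
        return None
    return record["sentence1"], record["sentence2"], LABELS[label]

# Toy example in the SNLI JSONL format (abbreviated to the fields used here).
sample = ('{"gold_label": "entailment", '
          '"sentence1": "A man is sleeping.", '
          '"sentence2": "A person rests."}')
print(parse_snli_line(sample))
```

In practice you would apply the same function to every line of `snli_1.0/snli_1.0_train.jsonl` and its dev/test counterparts.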
The Word2Vec weights trained on Google News can be downloaded via the link "GoogleNews-vectors-negative300.bin.gz" on the following page [^6]:
https://code.google.com/archive/p/word2vec/
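The downloaded vectors can be loaded with gensim's `KeyedVectors.load_word2vec_format(..., binary=True)` and copied into an embedding matrix that initializes a Keras `Embedding` layer. The sketch below substitutes a tiny toy dictionary for the real 300-dimensional GoogleNews vectors so it is self-contained; the `word_index` mapping is a hypothetical stand-in for whatever tokenizer you use:

```python
import numpy as np

# In practice:
#   from gensim.models import KeyedVectors
#   w2v = KeyedVectors.load_word2vec_format(
#       "GoogleNews-vectors-negative300.bin.gz", binary=True)
# Here a toy dict with the same `in` / `[]` interface stands in for it.
embed_dim = 4
w2v = {"man": np.ones(embed_dim), "person": np.full(embed_dim, 0.5)}

# word_index: token -> integer id, as produced by e.g. a Keras Tokenizer
# (index 0 is reserved for padding).
word_index = {"man": 1, "person": 2, "sleeping": 3}

# Rows for out-of-vocabulary words stay zero; this matrix becomes the
# initial weights of the Embedding layer.
embedding_matrix = np.zeros((len(word_index) + 1, embed_dim))
for word, idx in word_index.items():
    if word in w2v:
        embedding_matrix[idx] = w2v[word]

print(embedding_matrix.shape)  # (4, 4)
```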
The code [^5] below trains a model that combines the pretrained Word2Vec embeddings with an LSTM on the SNLI dataset. The article [^3] is a helpful reference:
https://gist.github.com/namakemono/4e2a37375edf7a5e849b0a499897dcbe
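As a rough sketch of the kind of architecture involved (a minimal outline, not the gist itself): a frozen `Embedding` layer initialized from the Word2Vec matrix, a single LSTM shared between premise and hypothesis, and a softmax over the three labels. The sizes `vocab_size` and `max_len` are placeholder values:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim, max_len = 1000, 300, 20  # placeholder sizes

premise = keras.Input(shape=(max_len,), dtype="int32")
hypothesis = keras.Input(shape=(max_len,), dtype="int32")

# In practice, initialize with the GoogleNews embedding matrix, e.g.
# layers.Embedding(..., weights=[embedding_matrix], trainable=False).
embed = layers.Embedding(vocab_size, embed_dim, trainable=False)
encoder = layers.LSTM(300)  # one 300D LSTM shared by both sentences

p = encoder(embed(premise))
h = encoder(embed(hypothesis))
merged = layers.concatenate([p, h])
hidden = layers.Dense(300, activation="relu")(merged)
# Three classes: entailment / contradiction / neutral.
output = layers.Dense(3, activation="softmax")(hidden)

model = keras.Model([premise, hypothesis], output)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training is then a matter of calling `model.fit` on padded premise/hypothesis id sequences and one-hot labels.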
The accuracy is roughly on par with the existing 300D LSTM baseline [^11]. From here, performance can be improved further with Decomposable Attention [^7][^8] or ESIM [^9][^10].
References