- Stanford Natural Language Inference (SNLI) [^1]
- An annotated corpus for learning natural language inference
- Each example is a pair of sentences, a premise and a hypothesis, with a manually assigned label
  - neutral: neither entailment nor contradiction can be determined
  - contradiction: the hypothesis contradicts the premise
  - entailment: the hypothesis follows from the premise
  - -: no label

Text | Judgments | Hypothesis |
---|---|---|
A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping |
An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
A black race car starts up in front of a crowd of people. | contradiction | A man is driving down a lonely road. |
A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
A smiling costumed woman is holding an umbrella. | neutral | A happy woman in a fairy costume holds an umbrella. |

- Number of examples: about 570,000 in total
  - Training: about 550,000
  - Validation: 10,000
  - Test: 10,000
- Each record also includes parse trees for both sentences, in the following format.

{
"annotator_labels": ["neutral"],
"captionID": "3416050480.jpg#4",
"gold_label": "neutral",
"pairID": "3416050480.jpg#4r1n",
"sentence1": "A person on a horse jumps over a broken down airplane.",
"sentence1_binary_parse": "( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )",
"sentence1_parse": "(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))",
"sentence2": "A person is training his horse for a competition.",
"sentence2_binary_parse": "( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )",
"sentence2_parse": "(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))"
}
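
The `sentence1_binary_parse` field in the record above is an unlabeled binary bracketing, so the tokenized sentence can be recovered by dropping the parentheses. The following is a minimal sketch, assuming the brackets and tokens are whitespace-separated as in the record shown; the helper name `tokens_from_binary_parse` is hypothetical and not part of any SNLI tooling.

```python
def tokens_from_binary_parse(parse):
    """Return the leaf tokens of an SNLI binary parse string as a list."""
    # Brackets appear as standalone whitespace-separated symbols;
    # every other symbol is treated as a token of the sentence.
    return [tok for tok in parse.split() if tok not in ("(", ")")]


# The sentence1_binary_parse value from the record above.
example = (
    "( ( ( A person ) ( on ( a horse ) ) ) "
    "( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )"
)
tokens = tokens_from_binary_parse(example)
print(tokens)
```

This is handy when a model needs pre-tokenized input, since the result follows the same tokenization as the parse trees.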
The corpus can be downloaded from the Stanford Natural Language Inference (SNLI) Corpus page.
wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip
The data is provided in JSON Lines format (.jsonl) and in tab-separated format (.txt).
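import csv
import pandas as pd
# QUOTE_NONE keeps double quotes inside sentences from being treated as field quoting
df = pd.read_csv("snli_1.0/snli_1.0_train.txt", sep="\t", quoting=csv.QUOTE_NONE)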
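
The `.jsonl` files can also be read with the standard library alone, one JSON object per line. The sketch below assumes the field names shown in the record earlier (`gold_label`, `sentence1`, `sentence2`) and skips pairs whose gold label is `-`. `read_snli_jsonl` is a hypothetical helper, and the second sample record is a made-up placeholder used only to show the filtering.

```python
import io
import json


def read_snli_jsonl(lines):
    """Yield (premise, hypothesis, label) tuples from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record["gold_label"] == "-":
            continue  # skip pairs without a gold label
        yield record["sentence1"], record["sentence2"], record["gold_label"]


# In-memory stand-in for a real file; with the unzipped corpus one would
# presumably pass open("snli_1.0/snli_1.0_train.jsonl", encoding="utf-8") instead.
sample = io.StringIO(
    '{"gold_label": "neutral", '
    '"sentence1": "A person on a horse jumps over a broken down airplane.", '
    '"sentence2": "A person is training his horse for a competition."}\n'
    # Placeholder record (not from the corpus) to demonstrate the "-" filter.
    '{"gold_label": "-", "sentence1": "placeholder premise", '
    '"sentence2": "placeholder hypothesis"}\n'
)
pairs = list(read_snli_jsonl(sample))
print(pairs)
```

Because the helper is a generator over lines, it works the same way on an open file handle without loading the whole training set into memory.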
References