Something I noticed while testing: in the API demo, take the analysis result for "My daughter-in-law and daughter went on a trip."
At first glance the demo seemed to show every correct relation, but in reality the token "Daughter" is returned as follows:
{
"id" : 2,
"form" : "Daughter",
"kana" : "Musume",
"lemma" : "Daughter",
"pos" : "noun",
"dependency_labels" : [ {
"token_id" : 0,
"label" : "conj"
}, {
"token_id" : 3,
"label" : "case"
} ],
"attributes" : { }
}
That is what the API returns for this token. In the figure below, it corresponds to the red line.
Because the JSON field is named "dependency_labels" (plural), it is clear from the response that a token can have several dependency relations. Looking only at the demo, however, you could easily assume that each token has just one, so this is worth keeping in mind. I also felt that the demo does not fully convey the strengths of the API.
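Since the plural is easy to miss, here is a minimal Python sketch that reads the "Daughter" token shown above (trimmed to the fields it uses) and lists every dependency relation it carries. It assumes only the JSON shape shown in this post; the helper name `dependencies` is my own.

```python
import json

# The "Daughter" token as returned by the API, trimmed to the fields used here.
TOKEN_JSON = """
{
  "id": 2,
  "form": "Daughter",
  "dependency_labels": [
    {"token_id": 0, "label": "conj"},
    {"token_id": 3, "label": "case"}
  ]
}
"""

def dependencies(token: dict) -> list[tuple[int, str]]:
    """Return every (token_id, label) pair; a token may have zero, one or many."""
    # Tokens without dependents omit the key, so default to an empty list.
    return [(d["token_id"], d["label"]) for d in token.get("dependency_labels", [])]

token = json.loads(TOKEN_JSON)
deps = dependencies(token)
for token_id, label in deps:
    print(f'{token["form"]} -> token {token_id} ({label})')
```

In the responses shown here, particles such as token 3 have no "dependency_labels" key at all, which is why the sketch uses `.get` with an empty default instead of indexing directly.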
Each API call accepts only one "document" as its processing target (documents being, for example, the individual logs in a call center). When processing a large number of documents, I would like to send several at once rather than one at a time, so this point is also worth noting. (For example, concatenating the call logs of different customers and parsing them as one input is not appropriate.) I hope the next version will be able to process multiple documents per call.
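Under this one-document-per-call limit, each document needs its own request. A minimal Python sketch that only builds the request bodies: it assumes the text goes in a "sentence" field (the parameter name from the spec quoted further down) and leaves out the endpoint, authentication and any other fields, which this post does not cover. `build_requests` and the sample logs are hypothetical.

```python
import json

def build_requests(documents: list[str]) -> list[str]:
    """Return one JSON request body per document (never one body for all of them)."""
    # Assumption: the text to analyze goes in a "sentence" field; other fields omitted.
    return [json.dumps({"sentence": doc}, ensure_ascii=False) for doc in documents]

# Hypothetical call-center logs from two different customers.
logs = [
    "Customer A: I would like to change my plan.",
    "Customer B: My invoice has not arrived.",
]

bodies = build_requests(logs)
for body in bodies:
    # Each body would be sent in a separate API call.
    print(body)
```

Sending N documents therefore costs N calls, which is the overhead a batch interface would remove.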
When I send the two sentences "My daughter-in-law and my daughter went on a trip. I and my son ate grilled meat." in a single request, the following response is returned:
{
"result" : [ {
"chunk_info" : {
"id" : 0,
"head" : 1,
"dep" : "P",
"chunk_head" : 0,
"chunk_func" : 1,
"links" : [ ]
},
"tokens" : [ {
"id" : 0,
"form" : "Daughter-in-law",
"kana" : "Yome",
"lemma" : "Daughter-in-law",
"pos" : "noun",
"features" : [ ],
"common_noun_semantic" : [ 49, 76, 88 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"dependency_labels" : [ {
"token_id" : 1,
"label" : "cc"
} ],
"attributes" : { }
}, {
"id" : 1,
"form" : "To",
"kana" : "To",
"lemma" : "To",
"pos" : "Case particles",
"features" : [ "Continuous use" ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
} ]
}, {
"chunk_info" : {
"id" : 1,
"head" : 7,
"dep" : "D",
"chunk_head" : 0,
"chunk_func" : 1,
"links" : [ {
"link" : 0,
"label" : "other"
} ]
},
"tokens" : [ {
"id" : 2,
"form" : "Daughter",
"kana" : "Musume",
"lemma" : "Daughter",
"pos" : "noun",
"features" : [ ],
"common_noun_semantic" : [ 49, 59, 88 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"dependency_labels" : [ {
"token_id" : 0,
"label" : "conj"
}, {
"token_id" : 3,
"label" : "case"
} ],
"attributes" : { }
}, {
"id" : 3,
"form" : "Ha",
"kana" : "Ha",
"lemma" : "Ha",
"pos" : "Conjunctive particles",
"features" : [ ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
} ]
}, {
"chunk_info" : {
"id" : 2,
"head" : 3,
"dep" : "D",
"chunk_head" : 0,
"chunk_func" : 1,
"links" : [ ]
},
"tokens" : [ {
"id" : 4,
"form" : "Travel",
"kana" : "Ryoko",
"lemma" : "Travel",
"pos" : "noun",
"features" : [ "motion" ],
"common_noun_semantic" : [ 1658, 1659, 1660 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ 18 ],
"dependency_labels" : [ {
"token_id" : 5,
"label" : "case"
} ],
"attributes" : { }
}, {
"id" : 5,
"form" : "Ni",
"kana" : "Ni",
"lemma" : "Ni",
"pos" : "Case particles",
"features" : [ "Continuous use" ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
} ]
}, {
"chunk_info" : {
"id" : 3,
"head" : 7,
"dep" : "P",
"chunk_head" : 0,
"chunk_func" : 2,
"links" : [ {
"link" : 2,
"label" : "purpose"
} ],
"predicate" : [ "past" ]
},
"tokens" : [ {
"id" : 6,
"form" : "Go",
"kana" : "I",
"lemma" : "go",
"pos" : "Verb stem",
"features" : [ "IKU" ],
"common_noun_semantic" : [ 2053, 2132 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ 15, 20, 29, 32, 5 ],
"dependency_labels" : [ {
"token_id" : 4,
"label" : "nmod"
}, {
"token_id" : 7,
"label" : "aux"
}, {
"token_id" : 8,
"label" : "aux"
}, {
"token_id" : 9,
"label" : "punct"
} ],
"attributes" : { }
}, {
"id" : 7,
"form" : "Tsu",
"kana" : "Tsu",
"lemma" : "Tsu",
"pos" : "Verb conjugation ending",
"features" : [ ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
}, {
"id" : 8,
"form" : "Ta",
"kana" : "Ta",
"lemma" : "Ta",
"pos" : "Verb suffix",
"features" : [ "stop" ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
}, {
"id" : 9,
"form" : "。",
"kana" : "",
"lemma" : "。",
"pos" : "Kuten",
"features" : [ ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
} ]
}, {
"chunk_info" : {
"id" : 4,
"head" : 7,
"dep" : "D",
"chunk_head" : 0,
"chunk_func" : 1,
"links" : [ ]
},
"tokens" : [ {
"id" : 10,
"form" : "I",
"kana" : "Watashi",
"lemma" : "I",
"pos" : "noun",
"features" : [ "Pronoun" ],
"common_noun_semantic" : [ 37, 8 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"dependency_labels" : [ {
"token_id" : 11,
"label" : "cc"
} ],
"attributes" : { }
}, {
"id" : 11,
"form" : "To",
"kana" : "To",
"lemma" : "To",
"pos" : "Case particles",
"features" : [ "Continuous use" ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
} ]
}, {
"chunk_info" : {
"id" : 5,
"head" : 7,
"dep" : "D",
"chunk_head" : 0,
"chunk_func" : 1,
"links" : [ ]
},
"tokens" : [ {
"id" : 12,
"form" : "son",
"kana" : "Musuko",
"lemma" : "son",
"pos" : "noun",
"features" : [ ],
"common_noun_semantic" : [ 48, 58, 87 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"dependency_labels" : [ {
"token_id" : 13,
"label" : "case"
} ],
"attributes" : { }
}, {
"id" : 13,
"form" : "Ha",
"kana" : "Ha",
"lemma" : "Ha",
"pos" : "Conjunctive particles",
"features" : [ ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
} ]
}, {
"chunk_info" : {
"id" : 6,
"head" : 7,
"dep" : "D",
"chunk_head" : 0,
"chunk_func" : 1,
"links" : [ ]
},
"tokens" : [ {
"id" : 14,
"form" : "Grilled meat",
"kana" : "Yakiniku",
"lemma" : "Grilled meat",
"pos" : "noun",
"features" : [ ],
"common_noun_semantic" : [ 843, 852 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"dependency_labels" : [ {
"token_id" : 15,
"label" : "case"
} ],
"attributes" : { }
}, {
"id" : 15,
"form" : "Wo",
"kana" : "Wo",
"lemma" : "Wo",
"pos" : "Case particles",
"features" : [ "Continuous use" ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
} ]
}, {
"chunk_info" : {
"id" : 7,
"head" : -1,
"dep" : "O",
"chunk_head" : 0,
"chunk_func" : 1,
"links" : [ {
"link" : 1,
"label" : "agent"
}, {
"link" : 3,
"label" : "manner"
}, {
"link" : 4,
"label" : "coagent"
}, {
"link" : 5,
"label" : "agent"
}, {
"link" : 6,
"label" : "object"
} ],
"predicate" : [ "past" ]
},
"tokens" : [ {
"id" : 16,
"form" : "eat",
"kana" : "Tabe",
"lemma" : "eat",
"pos" : "Verb stem",
"features" : [ "A" ],
"common_noun_semantic" : [ 1581, 1590 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ 2, 23 ],
"dependency_labels" : [ {
"token_id" : 2,
"label" : "nsubj"
}, {
"token_id" : 6,
"label" : "advcl"
}, {
"token_id" : 10,
"label" : "nmod"
}, {
"token_id" : 12,
"label" : "nsubj"
}, {
"token_id" : 14,
"label" : "dobj"
}, {
"token_id" : 17,
"label" : "aux"
}, {
"token_id" : 18,
"label" : "punct"
} ],
"attributes" : { }
}, {
"id" : 17,
"form" : "Ta",
"kana" : "Ta",
"lemma" : "Ta",
"pos" : "Verb suffix",
"features" : [ "stop" ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
}, {
"id" : 18,
"form" : "。",
"kana" : "",
"lemma" : "。",
"pos" : "Kuten",
"features" : [ ],
"common_noun_semantic" : [ ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ ],
"attributes" : { }
} ]
} ],
"status" : 0,
"message" : ""
}
Looking at "Go" (token 6) here, it lists four dependents: 4, 7, 8 and 9. It should actually list six, including 0 and 2, but the API returns only four.
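This can be checked mechanically. A minimal Python sketch that builds a table from each token id to the ids in its "dependency_labels", then compares "Go" (token 6) against the six dependents that the single-sentence result below contains. The response is trimmed to the chunk containing "Go"; the helper name `dependents` is my own.

```python
def dependents(result: list[dict]) -> dict[int, set[int]]:
    """Map each token id to the set of token ids listed in its dependency_labels."""
    table: dict[int, set[int]] = {}
    for chunk in result:               # "result" is a list of chunks
        for token in chunk["tokens"]:  # each chunk holds its own tokens
            labels = token.get("dependency_labels", [])
            table[token["id"]] = {d["token_id"] for d in labels}
    return table

# Chunk 3 of the two-sentence response above, trimmed to the fields used here.
result = [
    {
        "chunk_info": {"id": 3, "head": 7},
        "tokens": [
            {"id": 6, "dependency_labels": [  # token 6 is "Go"
                {"token_id": 4, "label": "nmod"},
                {"token_id": 7, "label": "aux"},
                {"token_id": 8, "label": "aux"},
                {"token_id": 9, "label": "punct"},
            ]},
            {"id": 7},
            {"id": 8},
            {"id": 9},
        ],
    }
]

expected = {0, 2, 4, 7, 8, 9}  # dependents of "Go" when the sentence is sent alone
actual = dependents(result)[6]
missing = expected - actual
print(sorted(actual), sorted(missing))  # [4, 7, 8, 9] [0, 2]
```

Running the same table over the full response is a quick way to spot which tokens lost dependents when several sentences are sent together.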
Expected result
Sending a single sentence, instead of several sentences at once, gives the expected result:
{
"id" : 6,
"form" : "Go",
"kana" : "I",
"lemma" : "go",
"pos" : "Verb stem",
"features" : [ "IKU" ],
"common_noun_semantic" : [ 2053, 2132 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ 15, 20, 29, 32, 5 ],
"dependency_labels" : [ {
"token_id" : 0,
"label" : "nmod"
}, {
"token_id" : 2,
"label" : "nsubj"
}, {
"token_id" : 4,
"label" : "nmod"
}, {
"token_id" : 7,
"label" : "aux"
}, {
"token_id" : 8,
"label" : "aux"
}, {
"token_id" : 9,
"label" : "punct"
} ],
"attributes" : { }
}
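Is the parsing unreliable when multiple sentences are sent together, or is there a problem with how I am calling the API? This needs investigation. For comparison, here is the same token as returned when both sentences are sent together:
{
"id" : 6,
"form" : "Go",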
"kana" : "I",
"lemma" : "go",
"pos" : "Verb stem",
"features" : [ "IKU" ],
"common_noun_semantic" : [ 2053, 2132 ],
"proper_noun_semantic" : [ ],
"declinable_word_semantic" : [ 15, 20, 29, 32, 5 ],
"dependency_labels" : [ {
"token_id" : 4,
"label" : "nmod"
}, {
"token_id" : 7,
"label" : "aux"
}, {
"token_id" : 8,
"label" : "aux"
}, {
"token_id" : 9,
"label" : "punct"
} ],
"attributes" : { }
}
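Since a single sentence parses as expected, one workaround is to split the text into sentences first and make one call per sentence. A minimal Python sketch: `parse` stands in for whatever function actually calls the API (hypothetical here), and the Japanese text is the original of the example, reconstructed from the token sequence above. The splitter is deliberately naive and cuts after every 。, ！ or ？.

```python
import re
from typing import Callable

def split_sentences(text: str) -> list[str]:
    """Naive split after 。, ！ or ？, keeping each mark with its sentence."""
    return [s for s in re.split(r"(?<=[。！？])", text) if s.strip()]

def parse_each(text: str, parse: Callable[[str], dict]) -> list[dict]:
    """Call `parse` once per sentence instead of once for the whole text."""
    return [parse(sentence) for sentence in split_sentences(text)]

# "My daughter-in-law and my daughter went on a trip. I and my son ate grilled meat."
text = "嫁と娘は旅行に行った。私と息子は焼肉を食べた。"
print(split_sentences(text))  # ['嫁と娘は旅行に行った。', '私と息子は焼肉を食べた。']

# Hypothetical stand-in for a real API client: echoes the sentence it receives.
results = parse_each(text, lambda s: {"sentence": s})
```

Because each call sees only one sentence, the results cannot contain cross-sentence relations; the trade-off is that this splitter breaks on text where 。 does not end a sentence.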
The spec describes the parameter as "sentence: sentence to be parsed", which can be read as meaning a single sentence. However, splitting text into sentences is itself a natural language processing task, so if only a single sentence is supported, I think that is a shortcoming of the API specification. In text mining, unlike in academic research, it is rare to process only one sentence at a time.
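For example, Stanford NLP also returns sentence boundaries as annotations. (Sentence splitting must still handle input such as "I went to see Morning Musume's live performance" correctly: the group's name is written モーニング娘。 in Japanese, ending with the sentence-final mark "。".)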
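As a sketch of that pitfall: a splitter that cuts after every 。 turns "I went to see Morning Musume's live performance" (in Japanese, roughly モーニング娘。のライブを見に行った。) into two fragments, because the group's name ends in 。. The minimal Python sketch below protects a hand-maintained list of such names before splitting. The list and function names are hypothetical, and a fixed list is only a stopgap compared with a proper sentence splitter.

```python
import re

# Hypothetical hand-maintained list of names that contain a sentence-final mark.
PROTECTED = ["モーニング娘。"]
SPLIT = re.compile(r"(?<=[。！？])")  # zero-width split point after each mark

def naive_split(text: str) -> list[str]:
    return [s for s in SPLIT.split(text) if s.strip()]

def protected_split(text: str, protected: list[str] = PROTECTED) -> list[str]:
    """Split like naive_split, but never inside one of the protected names."""
    placeholders = {}
    for i, name in enumerate(protected):
        key = f"\ue000{i}\ue001"  # private-use characters, unlikely in real text
        placeholders[key] = name
        text = text.replace(name, key)
    parts = naive_split(text)
    for key, name in placeholders.items():
        parts = [p.replace(key, name) for p in parts]
    return parts

text = "モーニング娘。のライブを見に行った。"
print(naive_split(text))      # ['モーニング娘。', 'のライブを見に行った。']
print(protected_split(text))  # ['モーニング娘。のライブを見に行った。']
```

A list like this never keeps up with new names, which is a good argument for the API handling sentence splitting itself.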
If I were a professional user, I would contact support. If I were responsible for delivering a system built on it, I would push for a fix. If I were the development lead, I would fix it immediately ^_^;;