Background
The [Qiita x COTOHA API giveaway campaign] caught my interest, so I decided to give the API a quick try.
MeCab and KNP are well-known libraries for natural language processing, but I am embarrassed to admit that I first learned about the COTOHA API through this event.
Motivation
It may be a bit of a high hurdle for me personally, but I started with the anaphora resolution (coreference) API. Anaphora resolution is the process of working out which people and objects the pronouns ("I", "you", "he", etc.) and demonstratives ("that", "this") in a text refer to. I chose it because I wanted to use it as a first step to clarify who did what, and then do further processing if I had time to spare.
The event is in mid-March, it seems...
Will I make it in time?
Environment
Development
import json
import sys
import time

import requests

# --- Get these 4 parameters from the COTOHA API Portal ---
PUBLISH_URL = "--- get your parameter ---"
CLIENT_ID = "--- get your parameter ---"
CLIENT_SECRET = "--- get your parameter ---"
BASE_URL = "--- get your parameter ---"


def getToken():
    """Request an access token from the Access Token Publish URL."""
    header = {"Content-Type": "application/json"}
    contents = {
        "grantType": "client_credentials",
        "clientId": CLIENT_ID,
        "clientSecret": CLIENT_SECRET
    }
    raw_res = requests.post(PUBLISH_URL, headers=header, json=contents)
    response = raw_res.json()
    return response["access_token"]


def coreference(token, sentence):
    """Send the text to the coreference API and return the parsed JSON."""
    header = {
        "Authorization": "Bearer {}".format(token),
        "Content-Type": "application/json"
    }
    contents = {
        "document": sentence
    }
    # Join with exactly one slash, whether or not BASE_URL ends with "/".
    raw_res = requests.post(
        BASE_URL.rstrip("/") + "/nlp/v1/coreference",
        headers=header,
        json=contents)
    response = raw_res.json()
    return response


if __name__ == "__main__":
    # Expect the text to analyze as a single command-line argument.
    if len(sys.argv) != 2:
        sys.exit("usage: python main.py TEXT")
    message = sys.argv[1]
    token = getToken()
    time.sleep(0.5)
    print(coreference(token, message))
The flow has two steps: get a token, then call whichever API you want to use.
The same thing with curl instead of Python looks like this.
$ curl -X POST -H "Content-Type:application/json" -d '{
  "grantType": "client_credentials",
  "clientId": "[clientid]",
  "clientSecret": "[clientsecret]"
}' [Access Token Publish URL]
$ curl -H "Content-Type:application/json;charset=UTF-8" -H "Authorization:Bearer [access_token]" -X POST -d '{
  "document": "[Enter the text you want to analyze here]"
}' "[Developer API Base URL]/nlp/v1/coreference"
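The four portal parameters are placeholders in the script, and the endpoint is built by joining the base URL and the API path. As a minimal sketch, the settings could be read from environment variables instead of being written into the source. The variable names (`COTOHA_PUBLISH_URL` and so on) and the helper names `load_settings` and `endpoint` are my own assumptions for illustration, not something the portal defines.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_NAMES = {
    "PUBLISH_URL": "COTOHA_PUBLISH_URL",
    "CLIENT_ID": "COTOHA_CLIENT_ID",
    "CLIENT_SECRET": "COTOHA_CLIENT_SECRET",
    "BASE_URL": "COTOHA_BASE_URL",
}


def load_settings(environ=os.environ):
    """Return the four portal parameters, failing early if one is missing."""
    missing = [env for env in ENV_NAMES.values() if not environ.get(env)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: environ[env] for key, env in ENV_NAMES.items()}


def endpoint(base_url, path="nlp/v1/coreference"):
    """Join base URL and API path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

With this, `getToken()` and `coreference()` could take their values from `load_settings()` rather than from module-level constants, so the client secret never ends up in a published script.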
Results
Let's analyze using two sentences.
- Yamada bought cup noodles. He was eating deliciously.
- Yamada gave Saito cup noodles. He was eating deliciously.
$ python main.py "Yamada bought cup noodles. He was eating deliciously."
{'result': {'coreference': [{'representative_id': 0, 'referents': [{'referent_id': 0, 'sentence_id': 0, 'token_id_from': 0, 'token_id_to': 0, 'form': 'Yamada'}, {'referent_id': 1, 'sentence_id': 0, 'token_id_from': 10, 'token_id_to': 10, 'form': 'he'}]}], 'tokens': [['Yamada', 'Kun', 'Is', 'cup', 'noodles', 'To', 'Buy', 'Tsu', 'Ta', '。', 'he', 'Is', 'Delicious', 'so', 'To', 'eat', 'hand', 'I', 'Ta', '。']]}, 'status': 0, 'message': 'OK'}
$ python main.py "Yamada gave Saito cup noodles. He was eating deliciously."
{'result': {'coreference': [{'representative_id': 0, 'referents': [{'referent_id': 0, 'sentence_id': 0, 'token_id_from': 0, 'token_id_to': 0, 'form': 'Yamada'}, {'referent_id': 1, 'sentence_id': 0, 'token_id_from': 13, 'token_id_to': 13, 'form': 'he'}]}], 'tokens': [['Yamada', 'You', 'Is', 'Saito', 'You', 'To', 'cup', 'noodles', 'To', 'Watari', 'Shi', 'Ta', '。', 'he', 'Is', '美味Shi', 'so', 'To', 'eat', 'hand', 'I', 'Ta', '。']]}, 'status': 0, 'message': 'OK'}
Discussion
For `Yamada bought cup noodles. He was eating deliciously.`, the result contains
'coreference': [{'representative_id': 0, 'referents': [{'referent_id': 0, 'sentence_id': 0, 'token_id_from': 0, 'token_id_to': 0, 'form': 'Yamada'}, {'referent_id': 1, 'sentence_id': 0, 'token_id_from': 10, 'token_id_to': 10, 'form': 'he'}]}]
so you can see that `Yamada` and `he` are linked.
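Reading the link out of the raw dictionary by eye gets tedious, so here is a small sketch that lists the surface forms in each coreference group. The helper name `coreference_chains` is hypothetical, and the sample is a trimmed copy of the first response shown above; only the keys visible in that output are used.

```python
def coreference_chains(response):
    """Return one list of surface forms per coreference group."""
    chains = []
    for chain in response.get("result", {}).get("coreference", []):
        chains.append([ref["form"] for ref in chain["referents"]])
    return chains


# Trimmed copy of the first response above (tokens omitted).
sample = {
    "result": {
        "coreference": [
            {
                "representative_id": 0,
                "referents": [
                    {"referent_id": 0, "sentence_id": 0,
                     "token_id_from": 0, "token_id_to": 0, "form": "Yamada"},
                    {"referent_id": 1, "sentence_id": 0,
                     "token_id_from": 10, "token_id_to": 10, "form": "he"},
                ],
            }
        ]
    },
    "status": 0,
    "message": "OK",
}

for forms in coreference_chains(sample):
    print(" - ".join(forms))  # prints: Yamada - he
```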
However, for `Yamada gave Saito cup noodles. He was eating deliciously.`, the result contains
'coreference': [{'representative_id': 0, 'referents': [{'referent_id': 0, 'sentence_id': 0, 'token_id_from': 0, 'token_id_to': 0, 'form': 'Yamada'}, {'referent_id': 1, 'sentence_id': 0, 'token_id_from': 13, 'token_id_to': 13, 'form': 'he'}]}]
so `Yamada` and `he` are linked, not `Saito` and `he`.
In other words, even though Yamada handed over the cup noodles, Yamada is the one who ate them.
That is Gouda-style thinking :scream:
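To make the misattribution concrete, the sketch below substitutes each pronoun with the representative mention of its group, which is the "replace pronouns with the people they refer to" step described in the motivation. It rests on two assumptions of mine that the output above is consistent with but does not prove: that `representative_id` is the `referent_id` of the mention to substitute in, and that `token_id_from`/`token_id_to` index into `tokens[sentence_id]`. The function name `resolve` is hypothetical, and the tokens are copied as they appear in the second output above.

```python
def resolve(response):
    """Return the token lists with each referent replaced by its representative."""
    result = response["result"]
    # Work on a copy so the original response is left untouched.
    tokens = [list(sentence) for sentence in result["tokens"]]
    for chain in result["coreference"]:
        referents = chain["referents"]
        # Assumption: representative_id matches the referent_id of one mention.
        rep = next(r for r in referents
                   if r["referent_id"] == chain["representative_id"])
        for ref in referents:
            if ref is rep:
                continue
            sentence = tokens[ref["sentence_id"]]
            start, end = ref["token_id_from"], ref["token_id_to"]
            sentence[start] = rep["form"]
            # Blank out the rest of a multi-token mention; indices stay stable.
            for i in range(start + 1, end + 1):
                sentence[i] = None
    return [[t for t in sentence if t is not None] for sentence in tokens]


sample = {
    "result": {
        "coreference": [{
            "representative_id": 0,
            "referents": [
                {"referent_id": 0, "sentence_id": 0,
                 "token_id_from": 0, "token_id_to": 0, "form": "Yamada"},
                {"referent_id": 1, "sentence_id": 0,
                 "token_id_from": 13, "token_id_to": 13, "form": "he"},
            ],
        }],
        "tokens": [["Yamada", "You", "Is", "Saito", "You", "To", "cup",
                    "noodles", "To", "Watari", "Shi", "Ta", "。", "he", "Is",
                    "美味Shi", "so", "To", "eat", "hand", "I", "Ta", "。"]],
    },
    "status": 0,
    "message": "OK",
}

print(resolve(sample)[0][13])  # prints: Yamada
```

The token that used to be `he` now reads `Yamada`, which is exactly the wrong eater discussed above.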
Comparison with KNP
Let's see what happens with KNP. http://lotus.kuee.kyoto-u.ac.jp/~ryohei/zero_anaphora/index.cgi
Here the result is `Saito-he`, so it seems to be recognized correctly.
However, when I delete the pronoun and analyze `Yamada gave Saito cup noodles. Was eating deliciously.`, the omitted subject is resolved to Yamada (`Yamada-he`), so it is not recognized correctly.
Conclusion
The kind of anaphora analyzed this time is called direct anaphora, where the pronoun is written explicitly, but in real text it is often left out. Besides direct anaphora there are indirect anaphora, external anaphora (exophora), zero anaphora, and so on: http://adsmedia.hatenablog.com/entry/2017/02/20/084846
The genome was fully sequenced in the 2000s, yet how it all works is still not understood. The types of building blocks are known, but the combinations are countless.
Natural language is similar: a sentence can be broken down to the type of each word, but there is still a lot left to investigate, anaphora resolution included. One possible approach, although it may be difficult, is to collect source texts and apply machine learning to fill in the words that are not written in the sentence. Something like: among the various characters, there is an 80% chance that Saito is the one eating the cup noodles. :robot:
Postscript
Next, I would like to use the parsing API for analysis or to build something with it.