I'm a student at a university in Tokyo. It's been a year since I entered university and seven months since I started programming. I usually build apps with my friends and take part in machine learning competitions. Yesterday I found out about an event called [Qiita x COTOHA API present project]. I want a Mac, so I'm writing my first article with the aim of winning a prize. :fist: (Today is the deadline... :angel_tone2:)
University students have to write a lot of reports. It's really painful. :frowning2:
One way to make this easier is to copy someone else's report. However, if you copy it word for word, you will be found out. So I wondered whether I could change the wording while keeping the content of the report.
This time I'll use the following text (part of a report I wrote last month).
Admittedly, it's not a text anyone would want to copy... and it uses "again" far too often... :sweat:
Using a thesaurus and GCP, I rewrote the original wording little by little. Here is the result.
The wording has changed in some places. The third rule is followed, but the organization name "National Institute of Population and Social Security Research" was rewritten as well. This approach cannot keep person names or organization names intact.
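The thesaurus step itself isn't shown here. As a rough illustration only, a dictionary-based substitution might look like the sketch below. The `THESAURUS` table and the `paraphrase` function are hypothetical, the GCP part is not reproduced, and the sketch splits on spaces, so Japanese text would first need a tokenizer.

```python
# Hypothetical toy thesaurus: each word maps to one replacement with a similar meaning.
THESAURUS = {
    "report": "paper",
    "again": "also",
    "research": "study",
}


def paraphrase(text: str, thesaurus: dict[str, str]) -> str:
    """Replace every space-separated word that appears in the thesaurus.

    Trailing or leading punctuation is ignored for the lookup and kept in the output.
    """
    out = []
    for token in text.split():
        core = token.strip(".,!?")
        if core in thesaurus:
            token = token.replace(core, thesaurus[core], 1)
        out.append(token)
    return " ".join(out)


print(paraphrase("I wrote the report again.", THESAURUS))
# prints: I wrote the paper also.
```

A blind lookup like this has no idea which words belong to a proper noun, so a word inside an organization name gets swapped just like any other word.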
This is where the COTOHA API comes in. The official description reads: "COTOHA API is a service that provides various natural language processing and speech processing APIs such as parsing, coreference resolution, keyword extraction, speech recognition, and summarization. Using Japanese dictionaries built on the NTT Group's 40 years of research, and technology that classifies the meanings of words and phrases into more than 3,000 categories, you can easily use advanced analysis through an API." Of these APIs, this time I use the named entity extraction API to identify person names and organization names.
You can register easily from here. After registering, check your API BASE URL, CLIENT ID, and CLIENT secret here.
get_token.py
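import json

import requests

# Request an access token (client credentials grant); replace with your own credentials
data = {
    "grantType": "client_credentials",
    "clientId": "Your CLIENT ID",
    "clientSecret": "Your CLIENT secret"
}
str_json = json.dumps(data)
url = "https://api.ce-cotoha.com/v1/oauth/accesstokens"
headers = {
    "Content-Type": "application/json"
}
result = requests.post(url, headers=headers, data=str_json)
# The JSON response body contains the access token
print(result.text)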
Now you can see the access token.
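Rather than copying the token out of the printed response by hand, you can parse it. Here is a minimal sketch, assuming the token sits under an `access_token` key in the JSON body (check this against the response you printed). `extract_access_token` is a hypothetical helper name, and the sample body is made up.

```python
import json


def extract_access_token(response_text: str) -> str:
    """Return the access token from the token endpoint's JSON response.

    Assumes the token is stored under the "access_token" key.
    """
    payload = json.loads(response_text)
    return payload["access_token"]


# Made-up response body for illustration (not a real token)
sample = '{"access_token": "dummy-token", "token_type": "bearer"}'
print(extract_access_token(sample))  # prints: dummy-token
```

With `requests`, `result.json()["access_token"]` does the same thing on the response object directly.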
get_koyu.py
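import json

import requests


def get_koyu_(text, token):
    """Send text to the COTOHA named entity extraction API and return the entities."""
    data = {
        "sentence": text,
        "type": "default"
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + token
    }
    str_json = json.dumps(data)
    url = "https://api.ce-cotoha.com/api/dev/nlp/v1/ne"
    rr = requests.post(url, headers=headers, data=str_json)
    # The list of named entities is stored under the "result" key
    result = json.loads(rr.text)["result"]
    return result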
The code above extracts the named entities in a sentence. The result looks like this:
[{'begin_pos': 3, 'end_pos': 8, 'form': 'Ministry of Health and Labor', 'std_form': 'Ministry of Health and Labor', 'class': 'ORG', 'extended_class': '', 'source': 'basic'},
 {'begin_pos': 42, 'end_pos': 47, 'form': '2017', 'std_form': '2017', 'class': 'DAT', 'extended_class': '', 'source': 'basic'},
 {'begin_pos': 84, 'end_pos': 88, 'form': '50 years later', 'std_form': '50 years later', 'class': 'DAT', 'extended_class': '', 'source': 'basic'},
 {'begin_pos': 156, 'end_pos': 170, 'form': 'National Institute of Population and Social Security Research', 'std_form': 'National Institute of Population and Social Security Research', 'class': 'ORG', 'extended_class': '', 'source': 'basic'},
 ...
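The `begin_pos` and `end_pos` fields are character offsets into the input sentence (the `form` values above are translated, so their English lengths won't match the offsets). The sketch below assumes `end_pos` is exclusive, like the end of a Python slice; please verify that on your own output. Under that assumption, the hypothetical helper `entity_spans` cuts each entity's exact text out of the input. The sentence and entity list are made up.

```python
def entity_spans(text: str, entities: list[dict]) -> list[tuple[int, int, str]]:
    """Return (begin, end, substring) for each entity.

    Assumes begin_pos is inclusive and end_pos is exclusive, like a Python slice.
    """
    return [
        (e["begin_pos"], e["end_pos"], text[e["begin_pos"]:e["end_pos"]])
        for e in entities
    ]


# Hypothetical sentence and entity list in the same shape as the API output
text = "In 2017 the ministry said"
entities = [{"begin_pos": 3, "end_pos": 7, "form": "2017", "class": "DAT"}]
print(entity_spans(text, entities))  # prints: [(3, 7, '2017')]
```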
Each entity comes with a class (in the output above, ORG marks organization names and DAT marks dates), so I use these classes to pick out place names, person names, and organization names.
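Picking out the entities to protect then comes down to filtering on `class`. In this sketch, `ORG` appears in the sample output, while `PSN` (person name) and `LOC` (place name) are my assumption for the other class codes, so check them against the COTOHA documentation. `protected_entities` is a hypothetical helper.

```python
# ORG appears in the sample output; PSN and LOC are assumed codes for person and place names.
PROTECTED_CLASSES = {"ORG", "PSN", "LOC"}


def protected_entities(entities: list[dict]) -> list[dict]:
    """Keep only entities whose class marks them as a name that must not change."""
    return [e for e in entities if e.get("class") in PROTECTED_CLASSES]


# Two entries shaped like the API output (other keys omitted)
sample = [
    {"begin_pos": 3, "end_pos": 8, "form": "Ministry of Health and Labor", "class": "ORG"},
    {"begin_pos": 42, "end_pos": 47, "form": "2017", "class": "DAT"},
]
print([e["form"] for e in protected_entities(sample)])
# prints: ['Ministry of Health and Labor']
```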
Next, I changed the rewriting code so that these named entities are left unchanged.
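One common way to leave entities untouched, sketched here as an assumption rather than the author's actual code, is to swap each protected entity for a placeholder, paraphrase the masked text, and then put the entities back. `mask_entities`, `unmask`, `paraphrase_keeping_entities`, `toy_paraphrase`, and the `__ENT0__`-style placeholders are all hypothetical. The sketch also assumes `end_pos` is exclusive.

```python
from collections.abc import Callable


def mask_entities(text: str, entities: list[dict]) -> tuple[str, dict[str, str]]:
    """Replace each entity span with a placeholder such as __ENT0__.

    Spans are replaced from the end of the text backwards so that the
    offsets of earlier entities stay valid. Assumes end_pos is exclusive.
    """
    mapping = {}
    ordered = sorted(entities, key=lambda e: e["begin_pos"])
    for i in reversed(range(len(ordered))):
        e = ordered[i]
        key = f"__ENT{i}__"
        mapping[key] = text[e["begin_pos"]:e["end_pos"]]
        text = text[:e["begin_pos"]] + key + text[e["end_pos"]:]
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Put the original entity strings back in place of their placeholders."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text


def paraphrase_keeping_entities(
    text: str, entities: list[dict], paraphrase: Callable[[str], str]
) -> str:
    """Mask the entities, paraphrase the rest, then restore the entities."""
    masked, mapping = mask_entities(text, entities)
    return unmask(paraphrase(masked), mapping)


def toy_paraphrase(s: str) -> str:
    """Toy rewriter: without masking it would turn 'Research' into 'Study'."""
    return s.replace("Research", "Study").replace("report", "paper")


text = "The report cites the National Institute of Research."
entities = [{"begin_pos": 21, "end_pos": 51, "class": "ORG"}]
print(paraphrase_keeping_entities(text, entities, toy_paraphrase))
# prints: The paper cites the National Institute of Research.
```

This relies on the paraphrasing step leaving the placeholders alone. If your rewriter might alter them, pick placeholder strings it is unlikely to touch.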
Before rewriting
After rewriting
The words inside quotation marks ("") and the organization name stay the same, and only the wording around them has changed. Some results are a little off, but it's good enough for now.
The finished app is here.