My first time doing natural language processing. Exciting. This article is an entry for [[Qiita x COTOHA API present plan] Let's analyze text with the COTOHA API!](https://zine.qiita.com/event/collaboration-cotoha-api/?utm_source=qiita&utm_medium=banner). ~~I really want the prizes!~~ I made it in time to post.
First, a quick look at what we are going to build. It ends up like this ↓
```
python3 bubuduke.py "Hetaxo"
"Don't be good"
```
In other words, a Kyoto dialect translator. Bubuzuke is delicious! Yay!
- Try out the COTOHA API (my first time using it)
- What is natural language processing?
- Try natural language processing on some text
Follow that flow and, before you know it, you have a handy tool like the one above. Very easy.
Register for free from here. Submit your email address to create an account, then log in. You should see a screen like this. (End of the promotion.)
That's all we need from this site; later we will only use the Client ID and secret shown here.
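If you want to confirm that the Client ID and secret actually work before touching any library, here is a minimal sketch that requests an access token directly with `requests`. The field names `grantType` / `clientId` / `clientSecret` follow my reading of the COTOHA reference, so treat the exact request shape as an assumption and check the docs for your plan.

```python
# Minimal sketch (assumption: the token endpoint accepts a JSON body with
# grantType / clientId / clientSecret, per the COTOHA reference).
import requests

CLIENT_ID = "Client ID"          # from the COTOHA account page
CLIENT_SECRET = "Client secret"  # from the COTOHA account page
TOKEN_URL = "https://api.ce-cotoha.com/v1/oauth/accesstokens"

res = requests.post(
    TOKEN_URL,
    json={
        "grantType": "client_credentials",
        "clientId": CLIENT_ID,
        "clientSecret": CLIENT_SECRET,
    },
)
print(res.json())  # should contain an "access_token" field if the credentials are valid
```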
Put very simply, it means processing the words humans ordinarily use (= natural language). "Isn't that obvious?" you might think. The difficulty is that **natural language, and Japanese in particular, is not well-defined**.
"Not well-defined" means that the definition does not assign a unique interpretation or value. You can think of multiple interpretations for a sentence here.
Consider a simple example: a restroom sign that says "Please do not flush anything other than toilet paper." Nobody reads that and concludes it is forbidden to flush poop. But if you transcribe the sentence literally into pseudocode, you get something like this.
if to flow== "Toilet Paper" then
You can shed
Huh? By that logic, it seems I'm not allowed to flush poop.
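To make the "not well-defined" point a bit more concrete, here is a toy Python sketch (the function names and rules are made up purely for illustration) contrasting the literal rule on the sign with the interpretation every human actually makes:

```python
# Toy illustration only; the names and rules are invented for this example.

def can_flush_literal(thing):
    # Literal reading of the sign: only toilet paper may be flushed.
    return thing == "toilet paper"

def can_flush_intended(thing):
    # What humans actually understand: don't flush foreign objects.
    return thing in ("toilet paper", "poop")

print(can_flush_literal("poop"))   # False -- the literal rule forbids it
print(can_flush_intended("poop"))  # True  -- the intended meaning allows it
```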
Now for the Kyoto dialect I decided to tackle this time. For example, something like this.
The one above is the famous bubuzuke. Bubuzuke is simply the Kyoto word for ochazuke (rice with tea poured over it). And yet, being offered bubuzuke actually means "it is time for you to go home." Impossible to guess.
Let me give you another example.
Sorry, that tangent ran long. There are plenty of other Kyoto expressions I could complain about, but I'll stop here; if you are interested, you can find many more. In short, **Kyoto dialect is insidious and the very height of language that is not well-defined.**
Natural language processing, then, means having a machine process the words and syntax of this kind of natural language. So far I have mostly just been explaining natural language itself. From here on, we will implement a bot that processes plain Japanese and replies in the style of that insidious Kyoto dialect.
Ignore the bot-building part for now and just do the natural language processing. ~~Honestly this is the essence, so you don't need to read anything else.~~ The essence starts here, but COTOHA is so good that it will be over quickly.
To start, let's accept some input and process it lightly. This is a demo that takes a sentence and returns only its nouns. I referred to this masterpiece. The library is so powerful that you can do this without knowing anything. First, install it.
```
pip install git+https://github.com/obilixilido/cotoha-nlp.git
```
samplecode1.py
```python
from cotoha_nlp.parse import Parser

# Client ID and secret come from the COTOHA account page.
parser = Parser("Client ID",
                "Client secret",
                "https://api.ce-cotoha.com/api/dev/nlp",
                "https://api.ce-cotoha.com/v1/oauth/accesstokens")

# Parse the input sentence and print only the nouns.
# Note: the COTOHA API returns part-of-speech tags in Japanese, so a noun is "名詞".
s = parser.parse(input())
print(" ".join([token.form for token in s.tokens if token.pos in ["名詞"]]))
```
I'll post the link again at the end, but the sample code is also on GitHub, so feel free to take a look there.
Let's run the code. Type `python` followed by the file name, then enter a string, and the result of processing that string comes back.
```
python samplecode1.py
In spring, the dawn. Yes, let's go to Kyoto.
```

Then this comes back:

```
>> spring dawn Kyoto
```
How about that? With just this much code, I got decent natural language processing working. That's incredible. ~~I still don't understand any of it.~~
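If you are curious what the library is doing under the hood, here is a rough sketch of calling the parse endpoint directly with `requests`. The endpoint path, the `sentence` request field, and the `result` → `tokens` → `form` / `pos` response layout follow my reading of the COTOHA reference, so treat them as assumptions rather than gospel.

```python
# Rough sketch of what the wrapper does: call the COTOHA parse endpoint directly.
# Assumptions: endpoint path, request body, and response layout as I read them
# in the COTOHA reference.
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # obtained as in the earlier token sketch
PARSE_URL = "https://api.ce-cotoha.com/api/dev/nlp/v1/parse"

res = requests.post(
    PARSE_URL,
    headers={
        "Content-Type": "application/json;charset=UTF-8",
        "Authorization": "Bearer " + ACCESS_TOKEN,
    },
    json={"sentence": "春はあけぼの。そうだ京都、行こう。"},  # the same example sentence, in Japanese
)
body = res.json()

# Flatten chunk -> token and keep only the nouns ("名詞"), as samplecode1.py does.
nouns = [
    token["form"]
    for chunk in body.get("result", [])
    for token in chunk.get("tokens", [])
    if token.get("pos") == "名詞"
]
print(" ".join(nouns))
```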
Next, let's implement the Kyoto dialect bot. The flow from plain Japanese input: extract the nouns, scrape the site introduced earlier, and if a literal Japanese phrase on the page matches an extracted noun, return the corresponding Kyoto dialect line.
Since we'll be scraping, add a couple more libraries.
```
pip3 install requests
pip3 install beautifulsoup4
```
bubuduke.py
```python
from cotoha_nlp.parse import Parser
import requests
from bs4 import BeautifulSoup
import re

# Client ID and secret come from the COTOHA account page.
parser = Parser("Client ID",
                "Client secret",
                "https://api.ce-cotoha.com/api/dev/nlp",
                "https://api.ce-cotoha.com/v1/oauth/accesstokens")

# input
s = parser.parse(input())

# get nouns (COTOHA returns part-of-speech tags in Japanese, so a noun is "名詞")
nouns = [token.form for token in s.tokens if token.pos in ["名詞"]]

# web scraping
r = requests.get('https://iirou.com/kazoekata/')
soup = BeautifulSoup(r.content, "html.parser")
block = soup.find_all("p")

# output
for noun in nouns:
    for tag in block:
        if noun in str(tag):
            # cut out the Kyoto dialect phrase inside the <strong> tag
            output = re.findall('<strong>.*</strong>', str(tag))
            if not output:  # skip paragraphs that match the noun but have no <strong>
                continue
            out = output[0]
            out = out.replace("<strong>", "")
            out = out.replace("</strong>", "")
            print(out)
```
Run it right away.
```
python bubuduke.py "Annoying"
```

The insidious Kyoto dialect comes back!

```
>> "Young lady, you're good at playing the piano."
```
That wraps up the natural language processing part. Next time will just be bot-building, so the genre changes and this article ends here.
I want to turn this into a LINE bot. I also want to improve the accuracy: right now it only handles exact matches, so very few words are covered, and I'd like it to pick up words that do not match exactly. I'll write about that next time.
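On picking up words that do not match exactly: one possible approach (just a sketch of an idea, not something this article implements) is fuzzy matching with the standard library's `difflib`:

```python
# Sketch of fuzzy matching: instead of an exact substring check, accept page
# keywords that are merely close to the extracted noun. The keyword list is
# a made-up stand-in for whatever gets scraped from the page.
from difflib import get_close_matches

page_keywords = ["bubuzuke", "piano", "perfume"]
noun = "pianos"  # a noun from the user's input that doesn't match exactly

# cutoff (0.0-1.0) controls how similar a candidate must be to count as a match
print(get_close_matches(noun, page_keywords, n=1, cutoff=0.6))  # ['piano']
```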
Repository of the code used this time. Reference: as always, the official reference is the best resource for any language, library, or framework.
Thank you for reading to the end. ~~I really want the prizes,~~ so please LGTM. If LGTM isn't your thing, a "like" for reference works too.