I wanted to implement a Markov chain chatbot in Python. When I looked into which morphological analyzer to use, I found that Janome has no dependencies on other libraries and can be installed with a single pip command (pip install janome). Let's try it!
Environment: Python 3.8.5, Janome 0.4.1
from janome.tokenizer import Tokenizer

t = Tokenizer()
# Janome analyzes Japanese text, so the sample sentence is kept in Japanese
s = "草わかば色鉛筆の赤き粉の散るがいとしく寝て削るなり"
for token in t.tokenize(s):
    print(token)
Use the Tokenizer class. First,
t = Tokenizer()
creates a Tokenizer instance. Then
for token in t.tokenize(s):
    print(token)
passes the sentence you want to analyze to the tokenize method and prints each token it returns. The output has one line per token: the surface form first, then its grammatical features (名詞 = noun, 動詞 = verb, 形容詞 = adjective, 助詞 = particle).
python analysis.py
草	名詞,一般,*,*,*,*,草,クサ,クサ
わかば	名詞,固有名詞,組織,*,*,*,わかば,ワカバ,ワカバ
色鉛筆	名詞,一般,*,*,*,*,色鉛筆,イロエンピツ,イロエンピツ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
赤き	形容詞,自立,*,*,形容詞・アウオ段,体言接続,赤い,アカキ,アカキ
粉	名詞,一般,*,*,*,*,粉,コナ,コナ
の	助詞,格助詞,一般,*,*,*,の,ノ,ノ
散る	動詞,自立,*,*,五段・ラ行,基本形,散る,チル,チル
が	助詞,接続助詞,*,*,*,*,が,ガ,ガ
いとしく	形容詞,自立,*,*,形容詞・イ段,連用テ接続,いとしい,イトシク,イトシク
寝	動詞,自立,*,*,一段,連用形,寝る,ネ,ネ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
削る	動詞,自立,*,*,五段・ラ行,基本形,削る,ケズル,ケズル
なり	助詞,接続助詞,*,*,*,*,なり,ナリ,ナリ
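To make the layout of one printed line easier to read, here is a minimal sketch that splits such a line into named fields. It assumes the format Janome prints for a token: the surface form, a tab, then nine comma-separated features. The helper `parse_token_line` and the `FIELDS` tuple are hypothetical names introduced for illustration; the field names are chosen to mirror the attributes of Janome's Token, and the sample line is the one for the verb 散る (scatter).

```python
# Illustrative only: split one printed token line into named fields.
# parse_token_line and FIELDS are hypothetical helpers, not part of Janome.
FIELDS = ("infl_type", "infl_form", "base_form", "reading", "phonetic")


def parse_token_line(line):
    """Split 'surface<TAB>feature1,...,feature9' into a dict of named fields."""
    surface, features = line.split("\t", 1)
    parts = features.split(",")
    # The first four features together form the part-of-speech hierarchy.
    record = {"surface": surface, "part_of_speech": ",".join(parts[:4])}
    # The remaining five are inflection type/form, base form, reading, phonetic.
    record.update(zip(FIELDS, parts[4:]))
    return record


# Sample line in the assumed format (surface form, tab, nine features).
line = "散る\t動詞,自立,*,*,五段・ラ行,基本形,散る,チル,チル"
record = parse_token_line(line)
for name, value in record.items():
    print(f"{name}: {value}")
```

This is only meant to show which position holds which piece of information; in real code there is no need to parse the printed string.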
You can also access each attribute of a token individually. Here I print the surface form, the base form, and the part of speech.
from janome.tokenizer import Tokenizer

t = Tokenizer()
s = "寝て削るなり"
for token in t.tokenize(s):
    print("==========")
    print(token.surface + " (surface form)")
    print(token.base_form + " (base form)")
    print(token.part_of_speech + " (part of speech)")
Execution result
python analysis.py
==========
寝 (surface form)
寝る (base form)
動詞,自立,*,* (part of speech)
==========
て (surface form)
て (base form)
助詞,接続助詞,*,* (part of speech)
==========
削る (surface form)
削る (base form)
動詞,自立,*,* (part of speech)
==========
なり (surface form)
なり (base form)
助詞,接続助詞,*,* (part of speech)
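Since part_of_speech is a single comma-separated string, its first item is the broad word class. A minimal sketch of using it to keep only certain classes is shown below. To stay runnable without Janome installed, it uses a hypothetical stand-in tuple `Tok` holding the four tokens of 寝て削るなり; the helper names `top_level_pos` and `keep_by_pos` are also assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class Tok(NamedTuple):
    """Hypothetical stand-in exposing the two Token attributes used here."""
    surface: str
    part_of_speech: str


def top_level_pos(token) -> str:
    # part_of_speech looks like "動詞,自立,*,*"; the first item is the broad class.
    return token.part_of_speech.split(",")[0]


def keep_by_pos(tokens: Iterable, wanted=("名詞", "動詞", "形容詞")) -> List[str]:
    # Keep the surface forms of nouns, verbs, and adjectives by default.
    return [tok.surface for tok in tokens if top_level_pos(tok) in wanted]


tokens = [
    Tok("寝", "動詞,自立,*,*"),      # verb
    Tok("て", "助詞,接続助詞,*,*"),  # particle
    Tok("削る", "動詞,自立,*,*"),    # verb
    Tok("なり", "助詞,接続助詞,*,*"),  # particle
]
print(keep_by_pos(tokens))  # ['寝', '削る']
```

With Janome installed, the result of t.tokenize(s) could be passed to keep_by_pos in place of the stand-in list, since the function only reads the surface and part_of_speech attributes.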
We will implement sentence generation in the next post: Markov Chain Chatbot with Python + Janome (2) Introduction to Markov Chain