This post shows how to use **Janome**, a Japanese morphological analyzer. Janome is a morphological analysis engine written in pure Python. It splits Japanese text into morphemes and identifies each one's part of speech, which also gives you word segmentation (wakati-gaki, i.e. splitting the text into words).
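```python
!pip install janome  # notebook (Jupyter/Colab) syntax; in a terminal run: pip install janome

from janome.tokenizer import Tokenizer

s = Tokenizer()  # create a Tokenizer instance

# The tongue twister "There are two chickens in the garden", written two ways
t = 'にわにわにわにわとりがいる'  # all hiragana, spelled as pronounced
tt = '庭には二羽鶏がいる'          # same sentence written with kanji

for token in s.tokenize(t):
    print(token)
for token in s.tokenize(tt):
    print(token)
```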
Output (Janome prints Japanese; here the part-of-speech labels are translated and the surface forms romanized). Each line is the surface form, then the part of speech (four levels), conjugation type, conjugation form, base form, reading and pronunciation:

```
ni        Particle, case particle, general, *, *, *, ni, ni, ni
wani      Noun, general, *, *, *, *, wani, wani, wani
wani      Noun, general, *, *, *, *, wani, wani, wani
wa        Particle, sentence-final particle, *, *, *, *, wa, wa, wa
niwatori  Noun, general, *, *, *, *, niwatori, niwatori, niwatori
ga        Particle, case particle, general, *, *, *, ga, ga, ga
iru       Verb, independent, *, *, ichidan, base form, iru, iru, iru

niwa      Noun, general, *, *, *, *, niwa, niwa, niwa
ni        Particle, case particle, general, *, *, *, ni, ni, ni
ha        Particle, binding particle, *, *, *, *, ha, ha, wa
ni        Noun, number, *, *, *, *, ni, ni, ni
wa        Noun, suffix, counter, *, *, *, wa, wa, wa
niwatori  Noun, general, *, *, *, *, niwatori, niwatori, niwatori
ga        Particle, case particle, general, *, *, *, ga, ga, ga
iru       Verb, independent, *, *, ichidan, base form, iru, iru, iru
```

Written entirely in hiragana, the sentence is segmented wrongly: the repeated "niwa" sounds come out as wani ("crocodile") twice. With kanji, every word is recognized: niwa ("garden"), ni, wa (topic particle), ni ("two"), wa (counter for birds), niwatori ("chicken"), ga and iru ("there is").
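The printed lines are easier to work with once they are split into named fields. The sketch below is pure Python and does not call Janome. It assumes the printed format of a Janome token with its default IPADIC-based dictionary: the surface form, a tab, then nine comma-separated fields. `Morpheme`, `parse_token_line` and `wakati_from_lines` are hypothetical helper names, and the sample lines are modeled on the second sentence's output above.

```python
from typing import Iterable, NamedTuple, Tuple


class Morpheme(NamedTuple):
    """One printed janome token split into named fields (hypothetical helper)."""
    surface: str          # text as it appears in the sentence
    pos: Tuple[str, ...]  # four part-of-speech levels, '*' = unspecified
    infl_type: str        # conjugation type, e.g. 一段 (ichidan)
    infl_form: str        # conjugation form, e.g. 基本形 (base form)
    base_form: str        # dictionary form
    reading: str          # reading in katakana
    phonetic: str         # pronunciation in katakana


def parse_token_line(line: str) -> Morpheme:
    """Parse 'surface<TAB>f1,...,f9' as produced by print(token)."""
    surface, features = line.rstrip('\n').split('\t', 1)
    # Naive split: a base form containing an ASCII comma would break this.
    fields = features.split(',')
    if len(fields) != 9:
        raise ValueError(f'expected 9 feature fields, got {len(fields)}: {line!r}')
    return Morpheme(surface, tuple(fields[:4]), *fields[4:])


def wakati_from_lines(lines: Iterable[str]) -> str:
    """Join surface forms with spaces (word segmentation, wakati-gaki)."""
    return ' '.join(parse_token_line(line).surface for line in lines)


# Sample lines modeled on the output for 庭には二羽鶏がいる
sample = [
    '庭\t名詞,一般,*,*,*,*,庭,ニワ,ニワ',              # niwa: noun, "garden"
    'に\t助詞,格助詞,一般,*,*,*,に,ニ,ニ',             # ni: case particle
    'は\t助詞,係助詞,*,*,*,*,は,ハ,ワ',                # wa: binding (topic) particle
    '二\t名詞,数,*,*,*,*,二,ニ,ニ',                    # ni: numeral "two"
    '羽\t名詞,接尾,助数詞,*,*,*,羽,ワ,ワ',             # wa: counter for birds
    '鶏\t名詞,一般,*,*,*,*,鶏,ニワトリ,ニワトリ',      # niwatori: "chicken"
    'が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ',             # ga: case particle
    'いる\t動詞,自立,*,*,一段,基本形,いる,イル,イル',  # iru: verb "to be"
]

if __name__ == '__main__':
    for m in map(parse_token_line, sample):
        print(m.surface, m.pos[0], m.reading, m.phonetic)
    print(wakati_from_lines(sample))  # → 庭 に は 二 羽 鶏 が いる
```

In real code you would normally read `token.surface`, `token.part_of_speech`, `token.reading` and so on directly from the `Token` objects rather than reparse printed strings; Janome's `tokenize()` also accepts `wakati=True` to return only the surface strings.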