■ [Google Colaboratory] Morphological analysis with janome

This article introduces **janome**, a morphological analysis engine for Japanese written in Python. It splits Japanese text into morphemes and assigns each one a part of speech, and it can also be used for word segmentation (splitting a sentence into words).

Installing janome


!pip install janome

How to use


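from janome.tokenizer import Tokenizer

tokenizer = Tokenizer()  # create a Tokenizer instance
text_kana = 'にわにわにわにわとりがいる'  # hiragana-only, phonetic spelling of the sentence below
text_kanji = '庭には二羽ニワトリがいる'  # "There are two chickens in the yard"

# Printing a token shows its surface form followed by its features
for token in tokenizer.tokenize(text_kana):
    print(token)
for token in tokenizer.tokenize(text_kanji):
    print(token)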

[Output] ===================================================
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
わに	名詞,一般,*,*,*,*,わに,ワニ,ワニ
わに	名詞,一般,*,*,*,*,わに,ワニ,ワニ
わ	助詞,終助詞,*,*,*,*,わ,ワ,ワ
にわとり	名詞,一般,*,*,*,*,にわとり,ニワトリ,ニワトリ
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
いる	動詞,自立,*,*,一段,基本形,いる,イル,イル

庭	名詞,一般,*,*,*,*,庭,ニワ,ニワ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
二	名詞,数,*,*,*,*,二,ニ,ニ
羽	名詞,接尾,助数詞,*,*,*,羽,ワ,ワ
ニワトリ	名詞,一般,*,*,*,*,ニワトリ,ニワトリ,ニワトリ
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
いる	動詞,自立,*,*,一段,基本形,いる,イル,イル
[end] ======================================================
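
When a token is printed, janome writes the surface form, a tab, and then nine comma-separated features: four part-of-speech levels, the inflection type, the inflection form, the base form, the reading, and the pronunciation. As a minimal sketch of how to read such a line, the snippet below splits it into named fields using only the standard library. The field names in FIELDS and the helper parse_line are labels made up for this example, not janome identifiers.

```python
# Illustrative labels for the nine features (assumed names, not part of janome)
FIELDS = (
    "pos", "pos_detail1", "pos_detail2", "pos_detail3",
    "infl_type", "infl_form", "base_form", "reading", "phonetic",
)


def parse_line(line):
    """Split one printed token line into its surface form and named features."""
    surface, features = line.split("\t", 1)  # surface and features are tab-separated
    values = features.split(",")             # features are comma-separated
    return {"surface": surface, **dict(zip(FIELDS, values))}


row = parse_line("羽\t名詞,接尾,助数詞,*,*,*,羽,ワ,ワ")
print(row["surface"], row["pos"], row["reading"])  # prints: 羽 名詞 ワ
```

In real code there is usually no need to parse the printed string: the objects returned by tokenize() expose the same information as attributes such as surface, part_of_speech, base_form, and reading.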

Reference

Comparison of morphological analysis tools (NLP2018)