Morphological analysis is often used to preprocess data for NLP, so this post summarizes how to set it up and use it with MeCab.
MeCab is an open-source Japanese morphological analysis engine.
It was developed by Taku Kudo, a software engineer at Google and one of the developers of Google Japanese Input. The name comes from the developer's favorite food, mekabu (和布蕪, the part of wakame seaweed just above the root).
Install MeCab itself.
$ brew install mecab
Install the MeCab dictionary.
$ brew install mecab-ipadic
Check that MeCab is installed.
$ mecab --version
mecab of 0.996
Let's try morphological analysis. Run mecab and type a Japanese sentence; here the input is 試しに形態素解析をしてみる。 ("Let's try morphological analysis.").
$ mecab
試しに形態素解析をしてみる。
試し	名詞,一般,*,*,*,*,試し,タメシ,タメシ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
形態素	名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ
解析	名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
みる	動詞,非自立,*,*,一段,基本形,みる,ミル,ミル
。	記号,句点,*,*,*,*,。,。,。
EOS
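Each output line is the surface form, a tab, and then a comma-separated feature list. With the IPA dictionary the fields are: part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. The following is a minimal sketch of reading that layout in Python; `Token` and `parse_mecab_output` are hypothetical names introduced here for illustration, and the sample string is hard-coded so the snippet runs without MeCab.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    surface: str    # the word as it appears in the text
    pos: str        # part of speech (1st feature field)
    base_form: str  # dictionary form (7th feature field in the IPADIC layout)
    reading: str    # katakana reading (8th feature field), "*" if absent


def parse_mecab_output(text: str) -> List[Token]:
    """Turn MeCab's default text output into a list of Token records."""
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        # Unknown words may carry fewer fields, so pad up to nine with "*".
        features += ["*"] * (9 - len(features))
        tokens.append(Token(surface, features[0], features[6], features[7]))
    return tokens


# Two lines in the same shape as the output above (assumed IPADIC layout).
SAMPLE = (
    "形態素\t名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ\n"
    "解析\t名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ\n"
    "EOS\n"
)

for token in parse_mecab_output(SAMPLE):
    print(token.surface, token.pos, token.reading)
```

Other dictionaries may order or count the feature fields differently, so the indices above should be checked against the dictionary in use.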
The word **形態素解析** (morphological analysis) has been split into **形態素** (morpheme) and **解析** (analysis). To fix this, install **mecab-ipadic-NEologd**, a dictionary that is frequently updated with new words. First, clone the dictionary data from GitHub.
$ git clone --depth 1 git@github.com:neologd/mecab-ipadic-neologd.git
Move into the cloned repository, run the install script, and answer yes at the confirmation prompt.
$ cd mecab-ipadic-neologd
$ ./bin/install-mecab-ipadic-neologd -n
yes
Specify the dictionary with the -d option and try morphological analysis again.
$ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd/
試しに形態素解析をしてみる。
試しに	副詞,一般,*,*,*,*,試しに,タメシニ,タメシニ
形態素解析	名詞,固有名詞,一般,*,*,*,形態素解析,ケイタイソカイセキ,ケイタイソカイセキ
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
みる	動詞,非自立,*,*,一段,基本形,みる,ミル,ミル
。	記号,句点,*,*,*,*,。,。,。
EOS
This time, **形態素解析** (morphological analysis) is correctly treated as a single word.
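The dictionary path passed to -d depends on where MeCab is installed; for example, Homebrew on Apple Silicon uses /opt/homebrew rather than /usr/local. The sketch below asks `mecab-config --dicdir` for the dictionary directory instead of hard-coding it. It assumes that NEologd was installed into its default location under that directory with the name mecab-ipadic-neologd; `mecab_dicdir` and `neologd_path` are hypothetical helper names.

```python
import os
import shutil
import subprocess
from typing import Optional


def mecab_dicdir() -> Optional[str]:
    """Return MeCab's dictionary directory, or None if mecab-config is missing."""
    if shutil.which("mecab-config") is None:
        return None
    result = subprocess.run(
        ["mecab-config", "--dicdir"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def neologd_path(dicdir: str) -> str:
    """Build the NEologd path, assuming the default directory name."""
    return os.path.join(dicdir, "mecab-ipadic-neologd")


dicdir = mecab_dicdir()
if dicdir is None:
    print("mecab-config not found; specify the dictionary path by hand")
else:
    # The printed path can be passed to mecab with the -d option.
    print(neologd_path(dicdir))
```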
Install the MeCab binding for Python.
$ pip3 install mecab-python3
Then write some code and run it.
import MeCab
# Use the NEologd dictionary installed above
mecab = MeCab.Tagger('-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')
print(mecab.parse('試しに形態素解析をしてみる。'))
試しに	副詞,一般,*,*,*,*,試しに,タメシニ,タメシニ
形態素解析	名詞,固有名詞,一般,*,*,*,形態素解析,ケイタイソカイセキ,ケイタイソカイセキ
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
みる	動詞,非自立,*,*,一段,基本形,みる,ミル,ミル
。	記号,句点,*,*,*,*,。,。,。
EOS
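As one possible way to use this result for NLP preprocessing, the sketch below keeps only content words and replaces each with its base form. It assumes the IPADIC feature layout (part of speech first, base form seventh); `content_words` and `CONTENT_POS` are hypothetical names, and the hard-coded `PARSED` string stands in for what `mecab.parse(...)` returns so the snippet runs without MeCab installed.

```python
from typing import List, Set

# Parts of speech to keep: noun, verb, adjective, adverb (an arbitrary choice).
CONTENT_POS: Set[str] = {"名詞", "動詞", "形容詞", "副詞"}


def content_words(parsed: str, keep_pos: Set[str] = CONTENT_POS) -> List[str]:
    """Extract base forms of content words from MeCab's text output."""
    words = []
    for line in parsed.splitlines():
        if line == "EOS" or "\t" not in line:
            continue  # skip the end marker and anything that is not a token line
        surface, feature_str = line.split("\t", 1)
        features = feature_str.split(",")
        if features[0] not in keep_pos:
            continue
        # Fall back to the surface form when the base form is missing or "*".
        has_base = len(features) > 6 and features[6] != "*"
        words.append(features[6] if has_base else surface)
    return words


PARSED = (
    "試しに\t副詞,一般,*,*,*,*,試しに,タメシニ,タメシニ\n"
    "形態素解析\t名詞,固有名詞,一般,*,*,*,形態素解析,ケイタイソカイセキ,ケイタイソカイセキ\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "し\t動詞,自立,*,*,サ変・スル,連用形,する,シ,シ\n"
    "て\t助詞,接続助詞,*,*,*,*,て,テ,テ\n"
    "みる\t動詞,非自立,*,*,一段,基本形,みる,ミル,ミル\n"
    "。\t記号,句点,*,*,*,*,。,。,。\n"
    "EOS\n"
)

print(content_words(PARSED))
# → ['試しに', '形態素解析', 'する', 'みる']
```

Which parts of speech to keep depends on the task; particles and symbols are dropped here only as an example.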