I consulted several articles on installing MeCab for Python 3. I am impatient, so I like pages whose commands simply work when copied in order from the top. "Mendokusai" ("what a hassle") is my catchphrase, and "if it doesn't exist, make it" is my motto. (The latter has nothing to do with this article.)
CentOS7
The quickest way to get MeCab was to clone it from GitHub.
# git clone https://github.com/taku910/mecab.git
# cd mecab/mecab
# ./configure --enable-utf8-only
# make
# make check
# make install
You can also download MeCab from the page below, but that is more of a hassle, and the folder contains several different MeCab packages.
Reference: MeCab https://drive.google.com/drive/folders/0B4y35FiV1wh7fjQ5SkJETEJEYzlqcUY4WUlpZmR4dDlJMWI5ZUlXN2xZN2s2b0pqT3hMbTQ
MeCab is unusable without a dictionary, so install one right away. The IPA dictionary is in the same repository, next to the mecab directory you built above.
# cd ../mecab-ipadic
# ./configure --with-charset=utf8
# make
# make install
After installation, you can run it on the console, so let's try it.
# mecab
MeCab is free software
MeCab noun,Proper noun,Organization,*,*,*,*
Is a particle,Particle,*,*,*,*,Is,C,Wow
Free noun,General,*,*,*,*,free,free,free
Software noun,General,*,*,*,*,software,software,software
Auxiliary verb,*,*,*,Special Death,Uninflected word,is,death,death
EOS
It worked. It is a relief to see Japanese displayed without any problems.
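Each line that mecab prints is the surface form, a tab, and then comma-separated features, and the sentence ends with a line reading EOS. As a minimal sketch, assuming that default layout, the console output can be turned into Python tuples as follows. The helper name `parse_mecab_output` and the sample string are hypothetical, made up for illustration.

```python
from typing import List, Tuple


def parse_mecab_output(output: str) -> List[Tuple[str, List[str]]]:
    """Split MeCab's default output into (surface, features) pairs.

    Assumes the default layout: "surface<TAB>feature,feature,..." per line,
    with a line containing only "EOS" marking the end of a sentence.
    """
    tokens = []
    for line in output.splitlines():
        if line == "EOS" or not line.strip():
            continue  # skip the sentence terminator and blank lines
        surface, _, feature_str = line.partition("\t")
        tokens.append((surface, feature_str.split(",")))
    return tokens


# Hypothetical sample shaped like the console output above
sample = "MeCab\tnoun,Proper noun,Organization,*,*,*,*\nEOS\n"
for surface, features in parse_mecab_output(sample):
    print(surface, features[0])
```

The first feature is the part of speech, so this is usually enough to filter tokens by word class.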
This is where pip comes in.
# pip install mecab-python3
Some sites open with this command and nothing else, but it should not work unless MeCab itself is already installed. Needless to say, I believed pip could do everything, so when I saw this command I walked straight into the pip trap, thinking "this is easier!"
With MeCab already installed as above, it installs without any problems.
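Before going further, it can be worth confirming that the module pip installed is visible to the interpreter you are actually running. Here is a small sketch using only the standard library. The function name `is_importable` is my own and is not part of mecab-python3.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the interpreter can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# mecab-python3 is imported under the module name "MeCab"
print("MeCab importable:", is_importable("MeCab"))
```

This only checks that the module can be located. The shared library is loaded when `import MeCab` actually runs, so an import can still fail at that point.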
Now let's write a Python file, test.py.
# test.py
# coding: UTF-8
import MeCab

# -Ochasen selects the ChaSen-compatible output format
m = MeCab.Tagger("-Ochasen")
print(m.parse("Make it yourself because it's annoying"))
Let's run it.
# python test.py
Mendokusai Mendokusai Mendokusai adjective-Independent adjectives and uninflected words
From Kara to particles-Connection particle
Self Jibun Self Noun-General
De de de particle-Case particles-General
Make Tsukuru Make Verb-Independent five-stage, la line basic form
EOS
You can change the output format by changing the argument passed to MeCab.Tagger.
Possible values include -Ochasen, -Owakati, -Oyomi, and mecabrc, among others.
# test2.py
# coding: UTF-8
import MeCab

text = "Make it yourself because it's annoying"

# Parse the same sentence with each output format in turn
for option in ("-Ochasen", "-Owakati", "-Oyomi", "mecabrc"):
    m = MeCab.Tagger(option)
    print(m.parse(text))
I was curious, so I printed the output of each one.
# python test2.py
Mendokusai Mendokusai Mendokusai adjective-Independent adjectives and uninflected words
From Kara to particles-Connection particle
Self Jibun Self Noun-General
De de de particle-Case particles-General
Make Tsukuru Make Verb-Independent five-stage, la line basic form
EOS
Make it yourself from annoyance
Mendoku Saikara Jibun Detsukuru
Annoying adjectives,Independence,*,*,Adjective, Auoudan,Uninflected word,Troublesome,Annoying,Annoying
From particles,Connection particle,*,*,*,*,From,Kara,Kara
My noun,General,*,*,*,*,myself,Jibun,Jibun
Particles,Case particles,General,*,*,*,so,De,De
Verbs to make,Independence,*,*,Five steps, La line,Uninflected word,create,Tsukuru,Tsukuru
EOS
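Of these formats, -Owakati is the easiest to consume from code, because it prints the tokens on one line separated by spaces. Here is a minimal sketch, assuming that layout. `split_wakati` and `wakati_tokens` are hypothetical helper names, and the latter needs a working MeCab installation.

```python
from typing import List


def split_wakati(output: str) -> List[str]:
    """Turn one line of -Owakati output into a list of tokens."""
    # str.split() with no argument also drops the trailing space and newline
    return output.split()


def wakati_tokens(text: str) -> List[str]:
    """Tokenize text with MeCab; requires MeCab and mecab-python3."""
    import MeCab  # imported here so split_wakati works without MeCab

    tagger = MeCab.Tagger("-Owakati")
    return split_wakati(tagger.parse(text))


# Hypothetical -Owakati-style line, used so the demo runs without MeCab
print(split_wakati("token1 token2 token3 \n"))
```

Once MeCab is installed, calling `wakati_tokens` on a sentence should return its tokens as a plain list, which is a convenient input for word counting.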
What to do if you get an error saying that libmecab.so.2 cannot be found:
ImportError: libmecab.so.2: cannot open shared object file: No such file or directory
Fix (run as root):
# vi /etc/ld.so.conf.d/lib.conf
/usr/local/lib #<-- Add this line (create the file if it does not exist).
# ldconfig #<-- Rebuild the shared library cache.
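To check whether the library can now be found, one option is the standard library's ctypes.util.find_library. This is only a rough sketch: the function name `library_visible` is hypothetical, and find_library's search is not identical to what the dynamic linker does at import time, so treat the result as a hint rather than proof.

```python
import ctypes.util


def library_visible(name: str) -> bool:
    """Return True if ctypes can locate a shared library such as libmecab."""
    # find_library takes the name without the "lib" prefix or ".so" suffix
    return ctypes.util.find_library(name) is not None


print("libmecab found:", library_visible("mecab"))
```

If this prints False after running ldconfig, it is worth checking that libmecab.so.2 really is under /usr/local/lib.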
Reference: Extraction of important words from Wikipedia by TF / IDF using Mecab Python http://yut.hatenablog.com/entry/20130215/1360884220
Reference: Make the morphological analysis engine MeCab available in Python 3 (March 2016 version) http://qiita.com/grachro/items/4fbc9bf8174c5abb7bdd#_reference-f17313e8bc66cbbff3ef