Taking on text mining with Python (Python 3 series). The steps are as follows.
① Morphological analysis (this article) ② Visualization with Word Cloud (next time)
Last time I tried to use MeCab on Windows, got stuck installing the Python bindings, and gave up, so this time I am starting over on Linux.
(Review) To use MeCab from Python, the following need to be installed:
・ MeCab itself
・ A dictionary
・ The Python bindings
The Windows version bundles a dictionary with MeCab itself, but on Linux the dictionary has to be installed separately. It is available as a package, though, so both can be installed together.
Just install them with apt. For the dictionary, choose the UTF-8 version of the IPA dictionary (recommended).
sudo apt-get install mecab mecab-ipadic-utf8
As usual, check that it works with "すもももももももものうち" ("sumomo mo momo mo momo no uchi").
$ mecab
すもももももももものうち
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
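In this default output format, MeCab prints one token per line: the surface form, a tab, and then a comma-separated feature list whose first field is the part of speech. The following is a minimal sketch of splitting such output in plain Python, assuming that layout. `parse_mecab_output` is a hypothetical helper name used only for illustration and is not part of MeCab.

```python
def parse_mecab_output(text):
    """Split MeCab's default output into (surface, features) pairs."""
    tokens = []
    for line in text.splitlines():
        # "EOS" marks the end of a sentence; skip it and any blank lines.
        if line == "EOS" or not line:
            continue
        surface, _, feature_str = line.partition("\t")
        tokens.append((surface, feature_str.split(",")))
    return tokens


# Two lines copied from the kind of output shown above.
sample = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "EOS\n"
)
for surface, features in parse_mecab_output(sample):
    # features[0] is the top-level part of speech.
    print(surface, features[0])
```

Because the sample string is hard-coded, this runs without MeCab installed. Piping real `mecab` output into the function should work the same way as long as the output format has not been changed.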
The Python bindings can also simply be installed with apt.
sudo apt-get install python-mecab
Let's analyze "すもも..." from Python.
mecab_sample.py
# coding: utf-8
import sys
import MeCab
mecab = MeCab.Tagger("-Ochasen")
print(mecab.parse("すもももももももものうち"))
$ python3 mecab_sample.py
Traceback (most recent call last):
  File "mecab_sample.py", line 3, in <module>
    import MeCab
ImportError: No module named 'MeCab'
It says there is no MeCab module... Let's try running it with Python 2.x.
$ python mecab_sample.py
すもも	スモモ	すもも	名詞-一般
も	モ	も	助詞-係助詞
もも	モモ	もも	名詞-一般
も	モ	も	助詞-係助詞
もも	モモ	もも	名詞-一般
の	ノ	の	助詞-連体化
うち	ウチ	うち	名詞-非自立-副詞可能
EOS
This works fine. Apparently the package installed with apt only supports the Python 2.x series. To use MeCab from Python 3, it seems you have to fetch the source and build it with setup.py, as with the Windows version. That source also assumes Python 2 and apparently needs a patch to run on Python 3, so it is not straightforward.
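A quick way to see which interpreter can actually import the bindings is to ask the import system directly instead of waiting for an ImportError. The sketch below assumes Python 3.4 or later, where `importlib.util.find_spec` is available. `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


# Report the result for the interpreter that is running this script.
print("Python", sys.version_info.major, "MeCab importable:", module_available("MeCab"))
```

Run with `python3`, this reports whether the Python 3 side has the MeCab module, without raising an exception when it is missing.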
Ugh, what a hassle... I found an article saying that it is enough to install a Python 3 library with pip, so I'll try that.
$ pip3 install mecab-python3
Collecting mecab-python3
Using cached mecab-python3-0.7.tar.gz
Complete output from command python setup.py egg_info:
/bin/sh: 1: mecab-config: not found
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-gsw8fi5f/mecab-python3/setup.py", line 41, in <module>
    include_dirs=cmd2("mecab-config --inc-dir"),
  File "/tmp/pip-build-gsw8fi5f/mecab-python3/setup.py", line 21, in cmd2
    return cmd1(strings).split()
  File "/tmp/pip-build-gsw8fi5f/mecab-python3/setup.py", line 18, in cmd1
    return os.popen(strings).readlines()[0][:-1]
IndexError: list index out of range
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-gsw8fi5f/mecab-python3/
The error occurs because mecab-config is missing, just as on Windows. I did not specify libmecab-dev when I first installed MeCab because I did not need it then, so it seems it was not installed. Install it with apt.
sudo apt-get install libmecab-dev
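Since the setup.py in the traceback above shells out to `mecab-config`, it can help to confirm that the command is on PATH before rerunning pip. The following is a minimal sketch using the standard library's `shutil.which`. `find_command` is a hypothetical helper name.

```python
import shutil


def find_command(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_command("mecab-config")
if path is None:
    # Without this command, the build step shown in the traceback cannot succeed.
    print("mecab-config not found; install libmecab-dev first")
else:
    print("mecab-config found at", path)
```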
Then install the bindings for the Python 3 series with pip.
sudo pip3 install mecab-python3
Then run the sample in Python3.
$ python3 mecab_sample.py
すもも	スモモ	すもも	名詞-一般
も	モ	も	助詞-係助詞
もも	モモ	もも	名詞-一般
も	モ	も	助詞-係助詞
もも	モモ	もも	名詞-一般
の	ノ	の	助詞-連体化
うち	ウチ	うち	名詞-非自立-副詞可能
EOS
It finally works.
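Looking ahead to the Word Cloud step, the ChaSen-style output is easy to filter by part of speech. The sketch below assumes the tab-separated `-Ochasen` layout (surface form, reading, base form, part of speech, then conjugation fields) and keeps only nouns. `extract_nouns` is a hypothetical helper name, and the sample string is hard-coded so that the example runs without MeCab.

```python
def extract_nouns(chasen_text):
    """Collect surface forms whose part of speech starts with 名詞 (noun)."""
    nouns = []
    for line in chasen_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:  # skips "EOS" and blank lines
            continue
        surface, pos = fields[0], fields[3]
        if pos.startswith("名詞"):
            nouns.append(surface)
    return nouns


# A few lines in the -Ochasen layout, hard-coded for illustration.
sample = (
    "すもも\tスモモ\tすもも\t名詞-一般\t\t\n"
    "も\tモ\tも\t助詞-係助詞\t\t\n"
    "もも\tモモ\tもも\t名詞-一般\t\t\n"
    "EOS\n"
)
print(extract_nouns(sample))
```

With the bindings installed, the string returned by `mecab.parse(...)` in mecab_sample.py could presumably be passed to `extract_nouns` in place of the sample.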
- Morphological analysis engine MeCab can be used with Python3 (March 2016 version)
- [Python][Mecab] How to install mecab in ubuntu environment
- How to use MeCab on Ubuntu 14.04 and Python 3