In a machine learning environment (Ubuntu 16.04 LTS) launched on GCE, the first thing I installed was morphological analysis software for natural language processing. The installation took a lot of time, so I am leaving these notes as a memorandum.
Janome can be installed with pip install alone, so it is omitted here.
Install MeCab and its dictionary (UTF-8 version)
sudo apt-get install mecab mecab-ipadic-utf8
Without the following packages, mecab-python3 will not install properly
sudo apt-get install libmecab-dev
sudo apt-get install build-essential
Finally, install the library for calling MeCab from Python 3.x
pip install mecab-python3
At first some required packages were missing and JUMAN++ would not install properly. I had heard that JUMAN++ is a more capable morphological analyzer than MeCab, so I definitely wanted to install it. After some research, the following procedure worked.
Install the required packages (this takes quite a while)
sudo apt install checkinstall auto-apt ccache
sudo auto-apt update
sudo apt install google-perftools libgoogle-perftools-dev libboost-dev
Download and extract JUMAN++
wget http://lotus.kuee.kyoto-u.ac.jp/nl-resource/jumanpp/jumanpp-1.01.tar.xz
tar xJvf jumanpp-1.01.tar.xz
Then build and install JUMAN++ from the extracted directory
cd jumanpp-1.01
auto-apt run ./configure CC="ccache gcc" CFLAGS="-O3" CXX="ccache g++" CXXFLAGS="-O3"
make
sudo checkinstall
If the version is printed as follows, JUMAN++ has been installed successfully.
jumanpp -v
JUMAN++ 1.01
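The same check can be scripted so that it covers both command-line tools at once. The following is a minimal sketch, not part of the original procedure: the helper name `tool_version` is hypothetical, and it assumes that both `mecab` and `jumanpp` accept `-v` and print their version on the first line.

```python
import shutil
import subprocess


def tool_version(command, flag="-v"):
    """Return the first output line of `command flag`, or None if unavailable."""
    path = shutil.which(command)
    if path is None:
        return None  # the binary is not on PATH
    try:
        result = subprocess.run(
            [path, flag],
            stdin=subprocess.DEVNULL,   # never wait for input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # some tools print the version to stderr
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # Assumed command names: the two analyzers installed in this memo.
    for name in ("mecab", "jumanpp"):
        version = tool_version(name)
        print(name, "->", version if version is not None else "not found")
```

A result of "not found" usually means the install step did not finish or the binary is outside PATH.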
Next, continue the installation so that JUMAN++ can be used from Python
Install JUMAN, KNP, and PyKNP in that order, following "Using JUMAN++ from Python".
However, those steps alone do not seem to register pyknp as a Python library, so finish by running the following
pip install ./pyknp-0.3
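To confirm that the bindings are actually visible to the interpreter, a short check like the one below may help. It is a sketch under assumptions: `is_importable` is a hypothetical helper, and the module names are the ones used by the import statements in this memo (`MeCab`, `pyknp`, `janome`).

```python
import importlib.util


def is_importable(module_name):
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names taken from the import statements used in this memo.
    for module_name in ("MeCab", "pyknp", "janome"):
        status = "OK" if is_importable(module_name) else "missing"
        print(module_name, status)
```

If a module shows up as missing, the pip that installed it may belong to a different Python than the one running the script.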
Try analyzing 外国人参政権 ("foreigners' right to vote") with MeCab, JUMAN++, and Janome
import MeCab
mecab = MeCab.Tagger("-Ochasen")  # -Ochasen selects ChaSen-compatible output
print(mecab.parse("外国人参政権"))
外国	ガイコク	外国	名詞-一般
人参	ニンジン	人参	名詞-一般
政権	セイケン	政権	名詞-一般
EOS
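For further processing, the printed text is easier to handle once it is split into fields. The sketch below is one possible way to do that, not something from the original memo: `Morpheme` and `parse_chasen` are hypothetical names, and the sample rows are an assumption about what the raw tab-separated output looks like for the three-token split shown above.

```python
from collections import namedtuple

# Hypothetical container for the first four ChaSen columns.
Morpheme = namedtuple("Morpheme", "surface reading base pos")


def parse_chasen(text):
    """Turn MeCab -Ochasen output into a list of Morpheme tuples."""
    morphemes = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip lines that do not look like ChaSen rows
        morphemes.append(Morpheme(*fields[:4]))
    return morphemes


# Assumed raw output for the three-token result (tab-separated).
sample = (
    "外国\tガイコク\t外国\t名詞-一般\t\t\n"
    "人参\tニンジン\t人参\t名詞-一般\t\t\n"
    "政権\tセイケン\t政権\t名詞-一般\t\t\n"
    "EOS\n"
)

for m in parse_chasen(sample):
    print(m.surface, m.reading, m.pos)
```

In real use, the `sample` string would be replaced by the return value of `mecab.parse(...)`.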
from pyknp import Jumanpp
jumanpp = Jumanpp()
r = jumanpp.analysis("外国人参政権")
for m in r.mrph_list():
    print(m.midasi)  # midasi is the surface form of each morpheme
外国
人
参政
権
from janome.tokenizer import Tokenizer
t = Tokenizer()
tokens = t.tokenize('外国人参政権')
for token in tokens:
    print(token)
外国	名詞,一般,*,*,*,*,外国,ガイコク,ガイコク
人参	名詞,一般,*,*,*,*,人参,ニンジン,ニンジン
政権	名詞,一般,*,*,*,*,政権,セイケン,セイケン
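MeCab and Janome both mis-split the phrase into "foreign country / carrot / regime", whereas JUMAN++ gives "foreign country / person / suffrage / right". On this example, JUMAN++ clearly comes out ahead.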
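The difference between the three results can also be located programmatically by comparing token boundaries. The sketch below hard-codes the three segmentations reported above in the original Japanese (外国 = foreign country, 人参 = carrot, 政権 = regime, 人 = person, 参政 = suffrage, 権 = right). The helper `boundaries` is a hypothetical name, and treating the JUMAN++ split as the reference is an assumption made only for illustration.

```python
def boundaries(tokens):
    """Return the set of character offsets at which a token ends."""
    offsets, position = set(), 0
    for token in tokens:
        position += len(token)
        offsets.add(position)
    return offsets


# Segmentations copied by hand from the three results above.
results = {
    "MeCab": ["外国", "人参", "政権"],
    "JUMAN++": ["外国", "人", "参政", "権"],
    "Janome": ["外国", "人参", "政権"],
}

# Assumption for illustration: use the JUMAN++ split as the reference.
reference = boundaries(results["JUMAN++"])

for name, tokens in results.items():
    # Every segmentation must cover exactly the same input text.
    assert "".join(tokens) == "外国人参政権"
    found = boundaries(tokens)
    missing = sorted(reference - found)  # cuts in the reference but not here
    extra = sorted(found - reference)    # cuts here but not in the reference
    print(name, "/".join(tokens), "missing:", missing, "extra:", extra)
```

Offsets are counted in characters from the start of the phrase, so a cut reported as "extra" marks the exact position where an analyzer joined or split characters differently from the reference.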
Text mining with Python ① Morphological analysis (re: Linux version)
[How to install JUMAN++ on Ubuntu 16.04 LTS](http://qiita.com/SUZUKI_Masaya/items/29c81d037cdf7d37b900)
[How to install software on Ubuntu using auto-apt, checkinstall, ccache](http://qiita.com/SUZUKI_Masaya/items/bd03f39e20a1a8f7f4f6#%E5%BF%85%E8%A6%81%E3%81%AA%E3%83%91%E3%83%83%E3%82%B1%E3%83%BC%E3%82%B8%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB)