What I wanted to do

In a machine learning environment (Ubuntu 16.04 LTS) launched on GCE, the first thing I did was install morphological analysis software for natural language processing. The installation took a lot of time, so I am leaving these notes as a memo.

Installed software and libraries

MeCab 0.996
JUMAN++ 1.01
janome 0.2.8

janome can be installed with pip install alone, so its installation is omitted here.

Install MeCab

Install MeCab and its dictionary (UTF-8 version)

sudo apt-get install mecab mecab-ipadic-utf8

Without the following packages, mecab-python3 will not install properly

sudo apt-get install libmecab-dev
sudo apt-get install build-essential
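
Before running pip, it can help to confirm that the build prerequisites are actually on the PATH. The following is a minimal sketch, assuming that libmecab-dev provides the mecab-config command and that build-essential provides gcc and g++. The helper name missing_tools and the REQUIRED list are made up for this illustration.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the commands from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed prerequisites for compiling mecab-python3:
# mecab-config from libmecab-dev, gcc and g++ from build-essential.
REQUIRED = ["mecab-config", "gcc", "g++"]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All assumed build tools were found.")
```

If anything is reported as missing, installing the two apt packages above again is the first thing I would try.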

Finally, install the library for calling MeCab from Python 3.x

pip install mecab-python3

Install JUMAN++

JUMAN++ depends on several packages and is difficult to install properly.
I had heard that its morphological analysis is more capable than MeCab's, so I definitely wanted to install it. After researching various things, the following procedure worked.

First, to use JUMAN++ itself

Install the required packages. This takes quite a while.

sudo apt install checkinstall auto-apt ccache
sudo auto-apt update
sudo apt install google-perftools libgoogle-perftools-dev libboost-dev

Download and extract JUMAN++

wget http://lotus.kuee.kyoto-u.ac.jp/nl-resource/jumanpp/jumanpp-1.01.tar.xz
tar xJvf jumanpp-1.01.tar.xz

Then move into the extracted directory, build JUMAN++, and install it

cd jumanpp-1.01
auto-apt run ./configure CC="ccache gcc" CFLAGS="-O3" CXX="ccache g++" CXXFLAGS="-O3"
make
sudo checkinstall

If the version is printed as follows, JUMAN++ has been installed successfully.

jumanpp -v

JUMAN++ 1.01
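
The same check can be scripted, which is handy when rebuilding the environment. This is only a sketch: tool_version is a hypothetical helper, and it assumes nothing about jumanpp beyond the -v flag used above.

```python
import shutil
import subprocess
from typing import List, Optional


def tool_version(command: List[str]) -> Optional[str]:
    """Run a version command and return the first line it prints.

    Returns None when the executable is not on PATH or prints nothing.
    """
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print their version to stderr
        universal_newlines=True,   # decode the output as text
        timeout=30,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = tool_version(["jumanpp", "-v"])
    print(version if version else "jumanpp was not found on PATH")
```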

To use JUMAN++ from Python

The installation continues so that JUMAN++ can be used from Python.

Install in the order JUMAN → KNP → PyKNP, following "Use JUMAN++ from Python" (listed in the references).

However, those steps alone did not seem to register pyknp as a Python library, so finally run the following to finish

pip install ./pyknp-0.3
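
To confirm that the Python bindings are really visible to the interpreter, a small import check can be run at this point. This is a sketch under the assumption that the import names are MeCab (from mecab-python3), pyknp, and janome, which are the names used in the examples later in this memo. The helper name importable is made up.

```python
import importlib.util
from typing import Dict, Iterable


def importable(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Import names used by the analysis examples in this memo.
MODULES = ["MeCab", "pyknp", "janome"]

if __name__ == "__main__":
    for name, found in importable(MODULES).items():
        print("{:8s} {}".format(name, "OK" if found else "NOT FOUND"))
```

A module reported as NOT FOUND usually means pip installed it into a different Python than the one running the script.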

Try morphological analysis

Analyze the phrase 外国人参政権 ("foreigners' right to vote") with MeCab, JUMAN++, and janome

For MeCab

import MeCab

# -Ochasen selects the ChaSen-style output format
mecab = MeCab.Tagger("-Ochasen")
print(mecab.parse("外国人参政権"))

外国	ガイコク	外国	名詞-一般
人参	ニンジン	人参	名詞-一般
政権	セイケン	政権	名詞-一般
EOS
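
The -Ochasen text is easier to work with once it is split into fields. Below is a minimal sketch, assuming the usual tab-separated layout of surface form, reading, base form, and part of speech. ChasenToken and parse_chasen are hypothetical names, and SAMPLE is a hand-written stand-in for what mecab.parse() returns.

```python
from typing import List, NamedTuple


class ChasenToken(NamedTuple):
    surface: str
    reading: str
    base_form: str
    pos: str


def parse_chasen(output: str) -> List[ChasenToken]:
    """Split tab-separated -Ochasen output into tokens, stopping at EOS."""
    tokens = []
    for line in output.splitlines():
        if line == "EOS":
            break
        if not line.strip():
            continue
        fields = line.split("\t")
        fields += [""] * (4 - len(fields))  # pad short lines defensively
        tokens.append(ChasenToken(*fields[:4]))
    return tokens


# Hand-written sample in the assumed layout (not captured from a live run).
SAMPLE = (
    "外国\tガイコク\t外国\t名詞-一般\t\t\n"
    "人参\tニンジン\t人参\t名詞-一般\t\t\n"
    "政権\tセイケン\t政権\t名詞-一般\t\t\n"
    "EOS\n"
)

print([token.surface for token in parse_chasen(SAMPLE)])  # ['外国', '人参', '政権']
```

With MeCab installed, parse_chasen(mecab.parse(text)) would take the place of the sample string.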

For JUMAN++

from pyknp import Jumanpp

jumanpp = Jumanpp()
r = jumanpp.analysis("外国人参政権")
# midasi is the surface form of each morpheme
for m in r.mrph_list():
    print(m.midasi)

外国
人
参政
権

For janome

from janome.tokenizer import Tokenizer

t = Tokenizer()
tokens = t.tokenize('外国人参政権')
# Printing a token shows its surface form followed by its features
for token in tokens:
    print(token)

外国	名詞,一般,*,*,*,*,外国,ガイコク,ガイコク
人参	名詞,一般,*,*,*,*,人参,ニンジン,ニンジン
政権	名詞,一般,*,*,*,*,政権,セイケン,セイケン

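In the end, JUMAN++ gives the best result. MeCab and janome both split the phrase as 外国 / 人参 / 政権 ("foreign country / carrot / regime"), while JUMAN++ splits it correctly as 外国 / 人 / 参政 / 権 ("foreign country / person / suffrage / right").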
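
The three snippets can also be wrapped so that the segmentations are collected side by side. This is a sketch rather than a polished tool: the function names are my own, the MeCab wrapper uses the -Owakati output format to get space-separated surface forms, and an analyzer that is missing or fails is simply recorded as None.

```python
from typing import Callable, Dict, List, Optional


def mecab_surfaces(text: str) -> List[str]:
    import MeCab  # provided by mecab-python3
    # -Owakati prints only the surface forms, separated by spaces
    return MeCab.Tagger("-Owakati").parse(text).split()


def jumanpp_surfaces(text: str) -> List[str]:
    from pyknp import Jumanpp
    return [m.midasi for m in Jumanpp().analysis(text).mrph_list()]


def janome_surfaces(text: str) -> List[str]:
    from janome.tokenizer import Tokenizer
    return [token.surface for token in Tokenizer().tokenize(text)]


TOKENIZERS = {
    "MeCab": mecab_surfaces,
    "JUMAN++": jumanpp_surfaces,
    "janome": janome_surfaces,
}


def compare_segmentations(
    text: str, tokenizers: Dict[str, Callable[[str], List[str]]]
) -> Dict[str, Optional[List[str]]]:
    """Run every tokenizer on `text`; store None for any that is unavailable."""
    results = {}
    for name, tokenize in tokenizers.items():
        try:
            results[name] = tokenize(text)
        except Exception:  # missing library, missing binary, or analyzer error
            results[name] = None
    return results
```

On a machine where all three analyzers are installed, compare_segmentations("外国人参政権", TOKENIZERS) should return the same splits as the individual runs above.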

References

Text mining with Python ① Morphological analysis (re: Linux version)

[How to install JUMAN++ on Ubuntu 16.04 LTS](http://qiita.com/SUZUKI_Masaya/items/29c81d037cdf7d37b900)

[How to install software on Ubuntu using auto-apt, checkinstall, ccache](http://qiita.com/SUZUKI_Masaya/items/bd03f39e20a1a8f7f4f6#%E5%BF%85%E8%A6%81%E3%81%AA%E3%83%91%E3%83%83%E3%82%B1%E3%83%BC%E3%82%B8%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB)

Use JUMAN++ from Python

Perform morphological analysis in the machine learning environment launched by GCE