Install the following packages. The OS is Ubuntu 16.04.
- python (3.5.0): language with many natural language processing libraries
- pyenv: Python version management tool
- MeCab (0.996): morphological analysis engine
- CaboCha (0.69): dependency analysis engine
- gensim (0.12.4): library that provides popular models such as LDA and word2vec
python3, pyenv
First, install python.
$ sudo apt-get install python
This will probably install only python2.7, so next install pyenv, which manages Python versions.
$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
To use pyenv, add the following script to a shell config file like .zshenv.
export PYENV_ROOT="$HOME/.pyenv"
export PATH=$PATH:$PYENV_ROOT/bin
eval "$(pyenv init -)"
Addendum (2017-12-11): I swapped the order of the export lines. PYENV_ROOT must be defined first, because the PATH definition refers to it.
I use zsh, and when I called python from a shell script saved as a file, it ran python2.7. I had written all these settings in .zshrc, but on closer inspection .zshrc seems to apply only to interactive shells (when a person types commands), not to shell scripts. .zshenv, by contrast, is read every time zsh starts. Put all environment variables in .zshenv.
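To see which interpreter a script will actually get, you can resolve python through PATH from Python itself. The following is only a sketch: it assumes pyenv's default layout, where the shims live under $PYENV_ROOT/shims (falling back to ~/.pyenv), and the function names are made up for illustration.

```python
import os
import shutil


def pyenv_shims_dir(env=None):
    """Return the shims directory, assuming pyenv's default layout."""
    env = os.environ if env is None else env
    root = env.get("PYENV_ROOT") or os.path.join(os.path.expanduser("~"), ".pyenv")
    return os.path.join(root, "shims")


def is_pyenv_managed(interpreter_path, env=None):
    """True if the resolved interpreter sits in the pyenv shims directory."""
    if not interpreter_path:
        return False
    shims = os.path.normpath(pyenv_shims_dir(env))
    return os.path.normpath(os.path.dirname(interpreter_path)) == shims


if __name__ == "__main__":
    resolved = shutil.which("python")  # first "python" found on PATH, or None
    print("python resolves to:", resolved)
    print("managed by pyenv:", is_pyenv_managed(resolved))
```

Running this from a script file started by zsh, rather than from the interactive prompt, shows whether the .zshenv settings are being picked up.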
Let's use pyenv. Check the list of python versions that can be installed.
$ pyenv install -l
After confirming that 3.5.0 is in the list, install python 3.5.0, switch the global version to it, and run rehash to refresh pyenv's shims. If the final version check shows 3.5.0, the setup succeeded.
$ pyenv install 3.5.0
$ pyenv global 3.5.0
$ pyenv rehash
$ python --version
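The same check can be done from inside Python, which is handy at the top of a script that must not run on python2.7. A minimal sketch: the helper name and the 3.5 threshold are illustrative choices based on the version used in this article.

```python
import sys


def meets_minimum(version_info, minimum=(3, 5)):
    """Compare the (major, minor) part of a version tuple with a minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("running:", sys.version.split()[0])
    if not meets_minimum(sys.version_info):
        print("This interpreter is older than 3.5; check the pyenv settings.")
```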
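Then install pip, Python's package management tool. It will be used several times in the later steps. Note that the apt package provides pip for the system Python, while a pyenv-built Python 3.5.0 normally bundles its own pip, so run pip --version to confirm which Python the pip on your PATH belongs to.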
$ sudo apt-get install python-pip
Reference URLs
Super fast setup guide for Zsh beginners http://qiita.com/uasi/items/c4288dd835a65eb9d709
Minimum memo when using Python on Mac (pyenv edition) http://qiita.com/zaburo/items/dd1a8323633035614efc
pyenv + virtualenv (CentOS7) http://qiita.com/saitou1978/items/e82421e29e118bd397cc
If you want to use easy_install or pip with Python on Ubuntu http://tech.g.hatena.ne.jp/rx7/20101129/p1
MeCab
Install MeCab and other required packages.
$ sudo apt-get install mecab mecab-ipadic libmecab-dev
Installing mecab-ipadic sets the character encoding to utf-8. Without libmecab-dev, later steps fail with an error saying that mecab-config cannot be found. Dictionaries that can be used with MeCab include ipadic and juman, but this time we will use mecab-ipadic-neologd. This dictionary stands out for containing many proper nouns, symbols, emoticons, and so on. Let's install it with the following commands.
$ git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git <Path to save location>
$ cd <Saved location>/mecab-ipadic-neologd
$ ./bin/install-mecab-ipadic-neologd -h
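I think it is best to save it in the same location as the existing dictionaries. You can find the location of the dictionary currently in use with mecab -D. Note that the -h option above only prints the installer's options; according to the mecab-ipadic-neologd README, running the script with -n performs the actual install. To use the new dictionary, pass its location with -d as follows.
$ mecab -d <save location>/mecab-ipadic-neologd/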
Next, install the binding so that MeCab can be used from python. Use the following command.
$ pip install mecab-python3
If the following commands run without errors, the installation succeeded.
$ python
>>> import MeCab
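Once the import works, a short script can confirm that the tagger really produces output. This is a sketch rather than part of the original setup: it relies on MeCab.Tagger and its parse method from mecab-python3, and on MeCab's default output format (one "surface<TAB>features" line per token, terminated by EOS). The helper names and the dic_dir argument are placeholders.

```python
def split_mecab_output(output):
    """Turn MeCab's default text output into (surface, features) pairs."""
    tokens = []
    for line in output.splitlines():
        if line == "EOS" or not line:
            continue
        surface, _, features = line.partition("\t")
        tokens.append((surface, features.split(",")))
    return tokens


def analyze(text, dic_dir=None):
    """Run MeCab on text; dic_dir is an optional dictionary directory."""
    import MeCab  # imported lazily so the helper above works without MeCab

    args = "-d {}".format(dic_dir) if dic_dir else ""
    tagger = MeCab.Tagger(args)
    return split_mecab_output(tagger.parse(text))


if __name__ == "__main__":
    try:
        for surface, features in analyze("すもももももももものうち"):
            print(surface, features[0])
    except (ImportError, RuntimeError) as err:
        print("MeCab is not usable here:", err)
```

To try the neologd dictionary, pass its directory as dic_dir, which corresponds to the -d option of the mecab command.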
Reference URL
mecab-ipadic-NEologd : Neologism dictionary for MeCab https://github.com/neologd/mecab-ipadic-neologd/blob/master/README.ja.md
CaboCha
I first tried installing it the way I had done before, with the following commands.
$ sudo apt-get install subversion
$ pip install 'svn+http://cabocha.googlecode.com/svn/trunk/python@r99'
This failed with an error saying the package could not be found. I tried various other methods, but in the end I installed it the way the official website describes. First comes CRF++, the library that cabocha requires. wget did not seem to work for it, so I downloaded it from the link below.
CRF++ https://drive.google.com/folderview?id=0B4y35FiV1wh7fngteFhHQUN2Y1B5eUJBNHZUemJYQV9VWlBUb3JlX0xBdWVZTWtSbVBneU0&usp=drive_web#list
I downloaded cabocha itself with wget. The linked page shows version 0.67, but let's use the latest, 0.69.
$ tar zvxf CRF++-0.58.tar.gz
$ cd CRF++-0.58
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
$ wget http://cabocha.googlecode.com/files/cabocha-0.69.tar.bz2
$ tar xjvf cabocha-0.69.tar.bz2
$ cd cabocha-0.69
$ ./configure --with-charset=UTF8 --with-posset=IPA
$ make
$ sudo make install
$ sudo ldconfig
$ cabocha
Next, install the binding for python3. Out of the box it does not support python3, so modify setup.py a little. setup.py is under cabocha-0.69/python.
setup.py
# (omitted)
def cmd2(str):
    # return string.split (cmd1(str))  # delete this line
    return cmd1(str).split()  # insert this line
# (omitted)
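The reason for this edit: Python 2's string module had a split function, but Python 3 removed it and kept only the str.split method. The sketch below shows the Python 3 form on a sample string. The flags in the sample are invented and merely stand in for whatever cmd1 returns in the real setup.py.

```python
import string


def split_flags(command_output):
    """Split whitespace-separated output the Python 3 way (str.split)."""
    return command_output.split()


# string.split existed in Python 2 but is absent from Python 3's string module.
has_legacy_split = hasattr(string, "split")

if __name__ == "__main__":
    sample = "-I/usr/local/include -L/usr/local/lib -lcabocha"  # invented sample
    print(split_flags(sample))
    print("string.split available:", has_legacy_split)
```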
After fixing it, install it with the following command.
$ cd cabocha-0.69/python
$ sudo python setup.py build_ext
$ sudo python setup.py install
$ sudo ldconfig
When using cabocha, specify the dictionary as shown in the following command.
$ cabocha -d <save location>/mecab-ipadic-neologd/
If the following commands run without errors, the installation succeeded.
$ python
>>> import CaboCha
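To go one step past the import check, the binding can parse a sentence and print its dependency tree. A hedged sketch: it uses CaboCha.Parser, parse, and toString with CaboCha.FORMAT_TREE from the Python binding, and it assumes the parser accepts the same -d option as the cabocha command. The function names and the dic_dir argument are placeholders.

```python
def parser_args(dic_dir=None):
    """Build the option string handed to CaboCha.Parser."""
    return "-d {}".format(dic_dir) if dic_dir else ""


def dependency_tree(sentence, dic_dir=None):
    """Parse a sentence and return the tree rendering as text."""
    import CaboCha  # lazy import: only needed when actually parsing

    parser = CaboCha.Parser(parser_args(dic_dir))
    tree = parser.parse(sentence)
    return tree.toString(CaboCha.FORMAT_TREE)


if __name__ == "__main__":
    try:
        print(dependency_tree("太郎は花子が読んでいる本を次郎に渡した"))
    except (ImportError, RuntimeError) as err:
        print("CaboCha is not usable here:", err)
```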
Reference URLs
CaboCha official website https://taku910.github.io/cabocha/
Cabocha installation notes http://qiita.com/ShingoOikawa/items/ef4ac2929ec19599a3cf
I wrote a patch to use CaboCha with python3 http://nosada.hatenablog.com/entry/2014/03/14/002954
Specify dictionary with CaboCha (python) http://studylog.hateblo.jp/entry/2016/01/25/134507
gensim
gensim installs easily with the following commands. numpy and scipy are libraries that gensim requires.
$ pip install numpy
$ pip install scipy
$ pip install gensim
As before, check that the installation succeeded with the following commands.
$ python
>>> import numpy
>>> import scipy
>>> import gensim
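If you would rather check everything in one go, the standard library can look up each module without importing it. This is a small sketch using importlib.util.find_spec; the function name and the list of modules are my own choices, taken from the packages installed in this article.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["MeCab", "CaboCha", "numpy", "scipy", "gensim"]
    missing = missing_modules(required)
    if missing:
        print("not installed:", ", ".join(missing))
    else:
        print("all modules found")
```

find_spec only locates a module, so a package that is present but broken would still need the plain import test above.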
Reference URL
gensim: installation https://radimrehurek.com/gensim/install.html
This completes the environment settings. Thank you for your hard work.
Most of this is based on an article I wrote earlier on my own blog.
Upgrade from python2.7 to 3.5 (NLP flavor) http://woody-kawagoe.hatenablog.com/entry/2016/04/18/222535
I got stuck on the same things again and wanted to start writing on Qiita, so I rewrote the article and posted it here.