This article shows how to build and install the morphological analyzer MeCab, its dictionary, and its Python library (binding) from source as a **general user** on a Linux machine where you do not have administrator privileges. It uses MeCab v0.996 and Python 2.7. For the dictionary, it uses the IPA dictionary recommended by MeCab's author.
This article is intended for readers who can use basic Linux commands.
The author is not responsible for any damage caused by following this article. (All responsibility lies with the reader.)
I also do not guarantee the accuracy of its content. If anything needs correcting, please let me know in the comments.
MeCab: Yet Another Part-of-Speech and Morphological Analyzer
From the link above, download the three files listed below. Save them in any directory; here I use ~/src/ directly under the home directory.
mecab-0.996.tar.gz (MeCab itself)
mecab-ipadic-2.7.0-20070801.tar.gz (IPA dictionary)
mecab-python-0.996.tar.gz (Python binding)
The third file is optional. However, many of the natural language processing programs in the book call MeCab from Python, so you may need it to run them. (Bindings for other languages such as Ruby and Java are also available, but I omit them here.)
Next, go to ~/src/ and extract the .tar.gz files you downloaded.
$ cd ~/src
$ tar zxfv mecab-0.996.tar.gz
$ tar zxfv mecab-ipadic-2.7.0-20070801.tar.gz
$ tar zxfv mecab-python-0.996.tar.gz
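If you would rather script this step, Python's standard tarfile module can do the same extraction. The sketch below is only an illustration and not part of the original procedure. extract_all is a hypothetical helper name, and the archive names are the ones used in the commands above.

```python
import os
import tarfile

# Archive names taken from the tar commands above.
ARCHIVES = [
    "mecab-0.996.tar.gz",
    "mecab-ipadic-2.7.0-20070801.tar.gz",
    "mecab-python-0.996.tar.gz",
]


def extract_all(src_dir, archives=ARCHIVES):
    """Unpack each .tar.gz in src_dir into src_dir (like `tar zxf`).

    Returns the distinct top-level directory names found in the archives,
    in order, as a quick sanity check.
    """
    extracted = []
    for name in archives:
        path = os.path.join(src_dir, name)
        with tarfile.open(path, "r:gz") as tar:  # read a gzip-compressed tar
            members = tar.getnames()
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons offer extraction filters; "data" refuses
                # absolute paths and other unsafe members.
                tar.extractall(src_dir, filter="data")
            else:
                tar.extractall(src_dir)
        for member in members:
            top = member.split("/")[0]  # e.g. "mecab-0.996"
            if top not in extracted:
                extracted.append(top)
    return extracted
```

For example, extract_all(os.path.expanduser("~/src")) should leave mecab-0.996/, mecab-ipadic-2.7.0-20070801/ and mecab-python-0.996/ next to the archives.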
With root privileges, MeCab is installed under /usr/local/ by default. As a general user, however, the installation fails because you cannot write to that directory.
In that case, you can install it under a directory of your choice, {local}, by passing the --prefix={local} option to the ./configure script. Replace {local} with your own directory. For example, I created a directory named local in my home directory and installed MeCab and the other components under it. (Simply specifying the home directory itself is also common, but I chose ~/local to keep the directory structure from getting cluttered.)
Note that {local} must be an **absolute path**, that is, a path starting from the root directory, such as /home/{username}/local.
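Configure scripts generated by autoconf typically reject a relative --prefix. In addition, a literal ~ is not always expanded inside an argument such as --prefix=~/local. The small Python sketch below turns something like ~/local into the absolute path to use for {local}. resolve_prefix is a hypothetical helper.

```python
import os


def resolve_prefix(local):
    """Turn a user-supplied install directory into an absolute path
    suitable for ./configure --prefix.

    "~" is expanded to the home directory, and relative paths are
    resolved against the current working directory.
    """
    return os.path.abspath(os.path.expanduser(local))
```

resolve_prefix("~/local") returns something like /home/{username}/local, which you can then use wherever {local} appears.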
Specifically, run the following commands. The --with-charset option sets the character encoding used by MeCab to UTF-8.
$ mkdir {local}
$ cd ~/src/mecab-0.996
$ ./configure --prefix={local} --with-charset=utf8
$ make
$ make install
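The three build steps are the same for each package, so they can be wrapped in one function. The sketch below is a hypothetical helper; build_and_install is not part of MeCab. It uses subprocess.check_call so that a failing step stops the sequence, which is much like checking for errors by hand.

```python
import subprocess


def build_and_install(src_dir, configure_args, dry_run=False):
    """Run ./configure, make and make install inside src_dir.

    With dry_run=True nothing is executed; the command lists are only
    returned so they can be inspected first.
    """
    steps = [
        ["./configure"] + list(configure_args),
        ["make"],
        ["make", "install"],
    ]
    if not dry_run:
        for cmd in steps:
            # check_call raises CalledProcessError on a non-zero exit
            # status, so a failing configure stops the remaining steps.
            subprocess.check_call(cmd, cwd=src_dir)
    return steps
```

For MeCab itself, a call such as build_and_install(os.path.expanduser("~/src/mecab-0.996"), ["--prefix=/home/{username}/local", "--with-charset=utf8"], dry_run=True) first shows the commands. Drop dry_run to actually run them.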
If there are no errors, the installation of MeCab itself is complete.
Next, install the IPA dictionary. **MeCab raises an error if no dictionary is installed**, so do not skip this step.
Run the following commands.
$ cd ~/src/mecab-ipadic-2.7.0-20070801
$ ./configure --with-mecab-config={local}/bin/mecab-config --prefix={local} --with-charset=utf8
$ make
$ make install
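The dictionary's configure differs mainly in the --with-mecab-config option, which points at the mecab-config installed in the previous step. As far as I understand, without it configure would look up mecab-config on PATH, and at this point PATH does not yet include {local}/bin. The sketch below derives all three options from {local}. ipadic_configure_args is a hypothetical name.

```python
import os


def ipadic_configure_args(local, charset="utf8"):
    """Arguments for the IPA dictionary's ./configure, derived from {local}.

    --with-mecab-config points at the mecab-config installed together
    with MeCab under {local}, rather than relying on PATH.
    """
    mecab_config = os.path.join(local, "bin", "mecab-config")
    return [
        "--with-mecab-config=" + mecab_config,
        "--prefix=" + local,
        "--with-charset=" + charset,
    ]
```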
Next, set the environment variables. The following example is for the C shell. Adjust it to match your existing settings.
~/.cshrc
setenv PATH {local}/bin:$PATH
After saving the file and closing the text editor, apply the changes to .cshrc.
$ source ~/.cshrc
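Putting {local}/bin in front of PATH matters because the shell searches the PATH directories from left to right and runs the first match. The following sketch mimics that lookup in Python and can be used to check which mecab a given PATH value would pick up. find_executable is a hypothetical helper, not a library function.

```python
import os


def find_executable(name, path_value):
    """Return the first executable called `name` along path_value, or None.

    Directories are searched left to right, which is why {local}/bin is
    placed first: its mecab takes precedence over any system-wide copy.
    """
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, after source ~/.cshrc, find_executable("mecab", os.environ["PATH"]) should return {local}/bin/mecab.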
This completes the installation of MeCab. To check that it works, start mecab and type a Japanese sentence (here, one meaning roughly "Hello, nice weather today, isn't it?"):
$ mecab
こんにちは、今日はいい天気ですね。
MeCab then prints the following morphological analysis result.
こんにちは	感動詞,*,*,*,*,*,こんにちは,コンニチハ,コンニチワ
、	記号,読点,*,*,*,*,、,、,、
今日	名詞,副詞可能,*,*,*,*,今日,キョウ,キョー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
いい	形容詞,自立,*,*,形容詞・イイ,基本形,いい,イイ,イイ
天気	名詞,一般,*,*,*,*,天気,テンキ,テンキ
です	助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
ね	助詞,終助詞,*,*,*,*,ね,ネ,ネ
。	記号,句点,*,*,*,*,。,。,。
EOS
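Each output line consists of the surface form, a tab, and comma-separated features. With the IPA dictionary, the features are the part of speech, three POS subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. The sketch below is a minimal parser that assumes this default output format. The field names in Morpheme are my own labels, not official ones.

```python
from collections import namedtuple

# Labels for the IPA dictionary's feature fields, in output order.
Morpheme = namedtuple("Morpheme", [
    "surface", "pos", "pos1", "pos2", "pos3",
    "conj_type", "conj_form", "base", "reading", "pronunciation",
])


def parse_mecab_output(text):
    """Parse MeCab's default output ("surface<TAB>f1,f2,...") up to EOS."""
    morphemes = []
    for line in text.splitlines():
        if line == "EOS":
            break
        if not line:
            continue
        surface, feature = line.split("\t", 1)
        fields = feature.split(",")
        # Unknown words can carry fewer fields (no reading/pronunciation);
        # pad with None so every Morpheme has the same shape.
        fields += [None] * (9 - len(fields))
        morphemes.append(Morpheme(surface, *fields[:9]))
    return morphemes
```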
After checking the output, press Ctrl+C to exit. If the output is garbled, there are two likely causes. Either --with-charset=utf8 was not specified correctly when running ./configure for the dictionary, or your shell's character encoding is something other than UTF-8. In the former case, reinstall the dictionary.
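To tell the two causes apart, it can help to look at the raw bytes that mecab emits. The following is a minimal check in Python; looks_like_utf8 is a hypothetical helper. If the bytes do not decode as UTF-8, the dictionary was most likely built with another charset (as far as I know, ipadic's configure defaults to EUC-JP). If they do decode but the terminal still shows garbage, the shell's encoding is the more likely culprit.

```python
def looks_like_utf8(data):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, pipe a UTF-8 encoded sentence into mecab with subprocess and pass the captured stdout bytes to this function.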
Next, install the Python binding. First, change to its directory.
$ cd ~/src/mecab-python-0.996
Then open setup.py in any text editor. **Replace every mecab-config on lines 13, 18, 19, and 20 with {local}/bin/mecab-config.**
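The same edit can be done with a small script instead of an editor. This sketch assumes that mecab-config appears in setup.py only as the command name inside its command strings, as on the lines mentioned above. It replaces every bare occurrence and leaves already-patched paths alone. patch_setup_py and patch_file are hypothetical helpers.

```python
import re


def patch_setup_py(source, local):
    """Replace bare `mecab-config` with {local}/bin/mecab-config.

    The lookbehind skips occurrences preceded by "/", a word character or
    "-", so running the patch twice leaves the text unchanged.
    """
    target = local.rstrip("/") + "/bin/mecab-config"
    # A function replacement avoids any escape processing of `target`.
    return re.sub(r"(?<![/\w-])mecab-config", lambda m: target, source)


def patch_file(path, local):
    """Rewrite setup.py in place (keep a copy if you want the original)."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(patch_setup_py(text, local))
```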
Then build and install the binding by running setup.py with Python.
$ python setup.py build
$ python setup.py install --prefix={local}
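The PYTHONPATH value set in the next step is the site-packages directory under {local} that setup.py install --prefix={local} writes to on a typical Linux layout. The sketch below computes it for a given interpreter version; prefix_site_packages is hypothetical. Note that on some distributions, notably the Red Hat family, compiled extensions such as _MeCab.so may go under lib64 instead, so check where the files actually landed.

```python
import sys


def prefix_site_packages(local, version_info=sys.version_info):
    """{local}/lib/pythonX.Y/site-packages for the given Python version."""
    return "%s/lib/python%d.%d/site-packages" % (
        local.rstrip("/"), version_info[0], version_info[1])
```

For Python 2.7 and {local} = /home/{username}/local, this gives /home/{username}/local/lib/python2.7/site-packages.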
Next, set the environment variables. For the C shell, add the following two lines to ~/.cshrc.
~/.cshrc
setenv LD_LIBRARY_PATH {local}/lib:${LD_LIBRARY_PATH}
setenv PYTHONPATH {local}/lib/python2.7/site-packages:${PYTHONPATH}
If an error such as "PYTHONPATH: Undefined variable." appears at this point, delete the :${PYTHONPATH} part and try again, adjusting the line to your existing environment. The same applies to LD_LIBRARY_PATH if it is not yet set. (Please point out in the comments if there is a better way.)
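The fix for the Undefined variable error comes down to one rule: prepend to the variable if it is set, otherwise set it to the new directory alone. In csh, this can be written with an if/else on $?PYTHONPATH, which is 1 when the variable is set and 0 otherwise. The same logic is shown below in Python, for example to prepare the environment of a subprocess. prepend_env is a hypothetical helper.

```python
import os


def prepend_env(env, name, directory):
    """Prepend directory to a colon-separated variable in the dict env.

    If the variable is unset or empty, it is set to directory alone,
    which avoids the dangling ":" (or the csh "Undefined variable" error).
    """
    current = env.get(name)
    env[name] = directory + os.pathsep + current if current else directory
    return env
```

For instance, env = prepend_env(dict(os.environ), "PYTHONPATH", "/home/{username}/local/lib/python2.7/site-packages") could be passed as env= to subprocess.call(["python", "test.py"], env=env).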
After saving the file and closing the text editor, apply the changes to .cshrc.
$ source ~/.cshrc
The library paths are now in place, and MeCab can be used from Python. To confirm, run ~/src/mecab-python-0.996/test.py.
$ cd ~/src/mecab-python-0.996/
$ python test.py
0.996
太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
この	連体詞,*,*,*,*,*,この,コノ,コノ
本	名詞,一般,*,*,*,*,本,ホン,ホン
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
二	名詞,数,*,*,*,*,二,ニ,ニ
郎	名詞,一般,*,*,*,*,郎,ロウ,ロー
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
見	動詞,自立,*,*,一段,連用形,見る,ミ,ミ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
女性	名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
渡し	動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。	記号,句点,*,*,*,*,。,。,。
EOS
BOS/EOS,*,*,*,*,*,*,*,*
太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
この	連体詞,*,*,*,*,*,この,コノ,コノ
本	名詞,一般,*,*,*,*,本,ホン,ホン
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
二	名詞,数,*,*,*,*,二,ニ,ニ
郎	名詞,一般,*,*,*,*,郎,ロウ,ロー
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
見	動詞,自立,*,*,一段,連用形,見る,ミ,ミ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
女性	名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
渡し	動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。	記号,句点,*,*,*,*,。,。,。
BOS/EOS,*,*,*,*,*,*,*,*
EOS
EOS
filename: {local}/lib/mecab/dic/ipadic/sys.dic
charset: utf8
size: 392126
type: 0
lsize: 1316
rsize: 1316
version: 102
If you get the above output, the installation is complete.
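test.py walks the result with the binding's node API. Tagger.parseToNode() returns a linked list of nodes, each with surface, feature and next attributes, bracketed by BOS/EOS nodes as seen in the output above. The sketch below turns that loop into a reusable generator. iter_morphemes is a hypothetical helper, and it relies only on those three attributes.

```python
def iter_morphemes(tagger, sentence):
    """Yield (surface, feature) pairs using the binding's node API.

    The linked list returned by parseToNode() starts and ends with
    BOS/EOS nodes; they are skipped here.
    """
    node = tagger.parseToNode(sentence)
    while node:
        if not node.feature.startswith("BOS/EOS"):
            yield node.surface, node.feature
        node = node.next
```

With the real binding, run import MeCab and then call iter_morphemes(MeCab.Tagger(), sentence).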
- Installing MeCab (Laboratory) - hase's diary: this article is based on it. If the installation does not work, please refer to that post.