Environment: Mac OS 10.9.4, Python 2.7
CaboCha needs MeCab, and it also depends on CRF++, so install those two first.
CRF++: the latest version at the time of writing is 0.58 (http://crfpp.googlecode.com/svn/trunk/doc/index.html#download).
Download it, unpack it, and build:
```bash
$ cd CRF++-0.58
$ ./configure
$ make
$ make install
$ cd python
$ sudo python setup.py install
```
MeCab: the latest version at the time of writing is 0.996 (https://code.google.com/p/mecab/).
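From the Downloads page, download and unpack the following:
- mecab-0.996.tar.gz (MeCab itself)
- mecab-python-0.996 (the Python binding)
- mecab-ipadic-2.7.0-20070801 (the IPA dictionary)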
```bash
$ cd mecab-0.996
$ ./configure
$ make
$ sudo make install
$ cd ..
$ cd mecab-python-0.996
$ sudo python setup.py install
$ cd ..
$ cd mecab-ipadic-2.7.0-20070801
$ ./configure
$ make
$ sudo make install
```
On Ubuntu, if a build stops with "No such file or directory", the Python headers are probably missing. Install them with `sudo apt-get install python2.7-dev`.

Try running it:
```bash
$ mecab
坂本ですが
Sakamoto? ????,????,*,*,*,*,*
?? ̾??,??ͭ̾??,?ȿ?,*,*,*,*
??But????,????,*,*,*,*,*
EOS
```
The output is garbled. The dictionary's default character encoding does not seem to be UTF-8 (mecab-ipadic defaults to EUC-JP).
Go back to the mecab-ipadic directory, run make clean, and reconfigure it for UTF-8:
```bash
$ make clean
$ ./configure --with-charset=utf8
$ make
$ sudo make install
```
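Here is a small Python illustration of why the output turns into question marks. It is independent of MeCab. It only shows what happens when bytes encoded as EUC-JP (mecab-ipadic's default charset) are read as UTF-8, and it is not a model of MeCab's internals.

```python
# -*- coding: utf-8 -*-
# Illustration only: EUC-JP bytes do not decode as UTF-8.
text = u"坂本ですが"

euc_bytes = text.encode("euc_jp")    # bytes as an EUC-JP dictionary would store them
utf8_bytes = text.encode("utf-8")    # bytes a UTF-8 terminal expects

# Reading EUC-JP bytes as UTF-8: invalid sequences become U+FFFD, and a few
# byte pairs happen to form unrelated characters. The result is mojibake.
garbled = euc_bytes.decode("utf-8", "replace")

print(repr(garbled))
print(garbled == text)                     # False
print(euc_bytes.decode("euc_jp") == text)  # True: the right codec restores it
```

Rebuilding the dictionary with `--with-charset=utf8` removes this mismatch, because the dictionary and the terminal then use the same encoding.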
On Ubuntu, you may get `libmecab.so.2: cannot open shared object file: No such file or directory`. If so, running `sudo ldconfig` (which refreshes the shared-library cache) seems to fix it.
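If you want a quick check from Python after running `ldconfig`, `ctypes.util.find_library` gives a rough answer. On Linux it consults the `ldconfig -p` cache first. This is only a sketch. A library that this function finds is usually loadable, but that is not guaranteed.

```python
# Rough check: can ctypes' library search see libmecab?
import ctypes.util

def find_libmecab():
    """Return the name/path ctypes finds for 'mecab', or None."""
    return ctypes.util.find_library("mecab")

if __name__ == "__main__":
    found = find_libmecab()
    if found is None:
        print("libmecab not found; try `sudo ldconfig`")
    else:
        print("found: " + found)
```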
Try it again:
```bash
$ mecab
坂本ですが
Sakamoto	noun,proper noun,personal name,surname,*,*,Sakamoto,Sakamoto,Sakamoto
desu	auxiliary verb,*,*,*,special (desu),basic form,desu,desu,desu
ga	particle,conjunctive particle,*,*,*,*,ga,ga,ga
EOS
```
Fixed.
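MeCab's default output has one token per line, in the form `surface<TAB>comma-separated features`, and ends with `EOS`. Below is a minimal Python sketch that parses that format. The hypothetical `parse_mecab_output` helper is fed translated sample lines that mirror the output above.

```python
# Parse MeCab's default output: "surface\tf1,f2,..." per line, then "EOS".
def parse_mecab_output(text):
    """Return a list of (surface, [features]) tuples, stopping at EOS."""
    tokens = []
    for line in text.splitlines():
        if line == "EOS":
            break
        if not line.strip():
            continue  # skip blank lines
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature.split(",")))
    return tokens

sample = (
    "Sakamoto\tnoun,proper noun,personal name,surname,*,*,Sakamoto,Sakamoto,Sakamoto\n"
    "desu\tauxiliary verb,*,*,*,special (desu),basic form,desu,desu,desu\n"
    "ga\tparticle,conjunctive particle,*,*,*,*,ga,ga,ga\n"
    "EOS\n"
)

for surface, features in parse_mecab_output(sample):
    print(surface + " -> " + features[0])  # surface and part of speech
```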
By the way, MeCab's settings live in a file called mecabrc.
```bash
$ sudo find / -name "mecabrc"
/usr/local/etc/mecabrc
$ sudo emacs /usr/local/etc/mecabrc
```
By default it looked like this:
```
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir = /usr/local/lib/mecab/dic/ipadic
; userdic = /home/foo/bar/user.dic
; output-format-type = wakati
; input-buffer-size = 8192
; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
```
dicdir appears to be the directory that holds the dictionary data.
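MeCab reads this file on its own. The sketch below only illustrates the format, for example if a script needs to see which `dicdir` is active. It assumes `key = value` lines and treats lines starting with `;` as comments, as the default file does. `parse_mecabrc` is a hypothetical helper and not part of MeCab.

```python
# Minimal reader for mecabrc-style text: "key = value", ";" starts a comment.
def parse_mecabrc(text):
    """Return a dict of the active (uncommented) settings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings

default_rc = """;
; Configuration file of MeCab
;
dicdir = /usr/local/lib/mecab/dic/ipadic
; userdic = /home/foo/bar/user.dic
; output-format-type = wakati
"""

print(parse_mecabrc(default_rc))  # only dicdir is active
```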
CaboCha: the latest version at the time of writing is 0.68 (https://code.google.com/p/cabocha/).
Download cabocha-0.68.tar.bz2 from Downloads and unpack it.
```bash
$ cd cabocha-0.68
$ ./configure
$ make
$ sudo make install
$ cd python
$ sudo python setup.py install
```
Calling MeCab from Python:
```python
# coding: utf-8
import MeCab

mt = MeCab.Tagger("-Ochasen")  # ChaSen-compatible output format
print mt.parse("坂本ですが")
```
```
Sakamoto	Sakamoto	Sakamoto	noun-proper noun-personal name-surname
desu	desu	desu	auxiliary verb	special (desu)	basic form
ga	ga	ga	particle-conjunctive particle
EOS
```
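The `-Ochasen` output is tab-separated. Judging from the output above, its columns are surface, reading, base form, part of speech (subcategories joined with "-"), conjugation type, and conjugation form. Here is a minimal parsing sketch. It assumes that exact column order. The `ChasenToken` and `parse_chasen_line` names are hypothetical, and the sample lines are translated.

```python
from collections import namedtuple

# Assumed column order of one -Ochasen line.
ChasenToken = namedtuple(
    "ChasenToken",
    ["surface", "reading", "base", "pos", "conj_type", "conj_form"])

def parse_chasen_line(line):
    """Split one -Ochasen line; missing, empty or "*" fields become None."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (6 - len(fields))  # pad lines without conjugation info
    values = [f if f not in ("", "*") else None for f in fields[:6]]
    return ChasenToken(*values)

lines = [
    "Sakamoto\tSakamoto\tSakamoto\tnoun-proper noun-personal name-surname",
    "desu\tdesu\tdesu\tauxiliary verb\tspecial (desu)\tbasic form",
    "ga\tga\tga\tparticle-conjunctive particle",
]

for tok in map(parse_chasen_line, lines):
    # The POS column joins subcategories with "-".
    print(tok.surface + ": " + " / ".join(tok.pos.split("-")))
```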
It's annoying that nothing works unless you are very careful about character encodings.
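```python
# coding: utf-8
import MeCab

mt = MeCab.Tagger("mecabrc")
res = mt.parseToNode("坂本ですが")  # the first node is BOS
while res:
    print res.surface  # surface form (empty for BOS/EOS)
    print res.feature  # comma-separated features
    res = res.next     # advance to the next node
```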
```
BOS/EOS,*,*,*,*,*,*,*,*
Sakamoto
noun,proper noun,personal name,surname,*,*,Sakamoto,Sakamoto,Sakamoto
desu
auxiliary verb,*,*,*,special (desu),basic form,desu,desu,desu
ga
particle,conjunctive particle,*,*,*,*,ga,ga,ga
BOS/EOS,*,*,*,*,*,*,*,*
```
Many implementations I found split res.feature on ",". I wonder whether that really is the only way, but it doesn't seem to cause problems, so I'll try it.
```python
# coding: utf-8
import MeCab

mt = MeCab.Tagger("mecabrc")
res = mt.parseToNode("坂本ですが")
while res:
    print res.surface
    arr = res.feature.split(",")  # the part of speech is the first field
    print "Part of speech: " + arr[0]
    res = res.next
```
```
Part of speech: BOS/EOS
Sakamoto
Part of speech: noun
desu
Part of speech: auxiliary verb
ga
Part of speech: particle
Part of speech: BOS/EOS
```
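Splitting on "," does seem to be the usual approach. With mecab-ipadic, the feature string normally has nine fields: part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. Unknown words usually carry fewer fields. The sketch below names the fields and pads short entries with "*". It assumes no field contains a comma, and `parse_feature` and `IpadicFeature` are hypothetical names.

```python
from collections import namedtuple

# The nine comma-separated feature fields of mecab-ipadic entries.
IpadicFeature = namedtuple("IpadicFeature", [
    "pos", "pos_detail1", "pos_detail2", "pos_detail3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation"])

def parse_feature(feature):
    """Turn a node.feature string into an IpadicFeature.

    Missing trailing fields (common for unknown words) are filled with "*".
    """
    n = len(IpadicFeature._fields)
    fields = feature.split(",")
    fields += ["*"] * (n - len(fields))
    return IpadicFeature(*fields[:n])

f = parse_feature("auxiliary verb,*,*,*,special (desu),basic form,desu,desu,desu")
print(f.pos)        # auxiliary verb
print(f.base_form)  # desu

u = parse_feature("noun,proper noun,*,*,*,*,*")  # unknown-word style, 7 fields
print(u.reading)    # *
```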
If you write just `res.next` instead of `res = res.next`, the loop of course never ends. I got caught by this because I'm used to Java.
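One way to avoid forgetting the advance is to wrap the traversal in a generator, so `node = node.next` is written only once. This is a minimal sketch. It assumes the nodes expose `surface`, `feature` and `next`, as the loop above uses them, and that `next` is `None` after the last node. `FakeNode` is a hypothetical stand-in so the example runs without MeCab. With the real binding you would write `for node in iter_nodes(mt.parseToNode(...))`.

```python
def iter_nodes(node, skip_bos_eos=True):
    """Yield node, node.next, ... until None; advancing happens only here."""
    while node is not None:
        if not (skip_bos_eos and node.feature.startswith("BOS/EOS")):
            yield node
        node = node.next

# Hypothetical stand-in for MeCab's node objects, for a runnable demo.
class FakeNode(object):
    def __init__(self, surface, feature, next_node=None):
        self.surface = surface
        self.feature = feature
        self.next = next_node

eos = FakeNode("", "BOS/EOS,*,*,*,*,*,*,*,*")
ga = FakeNode("ga", "particle,conjunctive particle,*,*,*,*,ga,ga,ga", eos)
bos = FakeNode("", "BOS/EOS,*,*,*,*,*,*,*,*", ga)

for n in iter_nodes(bos):
    print(n.surface + " " + n.feature.split(",")[0])  # ga particle
```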
I'll write again once I've actually tried using it.