It is assumed that MeCab is already installed.
pip install igo-python
Next, get mecab-ipadic-neologd and run
./bin/install-mecab-ipadic-neologd
This creates a build directory. Move to
mecab-ipadic-neologd/build/mecab-ipadic-2.7.0-20070801-neologd-20150401
and execute the following command:
java -cp igo-0.4.5.jar net.reduls.igo.bin.BuildDic neologd . "utf-8"
That's it. Let's check that it worked.
Python 2.7.8 (default, Mar 31 2015, 12:51:47)
Type "copyright", "credits" or "license" for more information.
IPython 3.0.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import igo
In [2]: t = igo.tagger.Tagger('neologd')  # path to the dictionary directory created with Java earlier
In [3]: for i in t.parse(u'Apple will release the Apple Watch domestically on April 24th.'):
   ...:     print i.surface
...:
Apple
Is
Apple Watch
To
April 24
To
Domestic
Release
Shi
Masu
。
With the regular MeCab dictionary, Apple Watch is not extracted as a single token, but thanks to mecab-ipadic-neologd it is. This time I ran Python in the directory where the neologd dictionary was created, so the relative path worked, but as I wrote in the comment, in actual use you need to pass the path to the created neologd directory.
This is convenient because you can do morphological analysis flexibly, without MeCab installed where the analysis runs.
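The point about passing the dictionary path can be sketched as a small helper. This is only an illustration, not part of igo-python: `load_tagger` and `surfaces` are hypothetical names, and the only igo calls assumed are `igo.tagger.Tagger(path)` and `parse()` returning objects with a `.surface` attribute, as used in the session above.

```python
import os


def load_tagger(dic_dir):
    """Create an igo Tagger from an explicit dictionary directory.

    dic_dir is the directory produced by BuildDic ('neologd' above).
    """
    dic_dir = os.path.abspath(os.path.expanduser(dic_dir))
    if not os.path.isdir(dic_dir):
        raise IOError("dictionary directory not found: %s" % dic_dir)
    # Imported here so that surfaces() below can be used without igo.
    # Mirrors the session above: `import igo`, then `igo.tagger.Tagger`.
    import igo
    return igo.tagger.Tagger(dic_dir)


def surfaces(tagger, text):
    """Return the surface form of every morpheme in text."""
    return [m.surface for m in tagger.parse(text)]
```

With something like `t = load_tagger('/path/to/neologd')` (a placeholder path), `surfaces(t, text)` gives the same list of surfaces that the loop in the session prints, regardless of the current working directory.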
In the directory where igo-0.4.5.jar is located, run
java -cp igo-0.4.5.jar net.reduls.igo.bin.BuildDic <dictionary output directory> <path to the build directory in mecab-ipadic-neologd> <character encoding>
If you get an error like `Exception in thread "main" java.lang.OutOfMemoryError: Java heap space`, add `-Xmx1024m` to the options. I don't know the details, but the heap seems to be too small, so try specifying its size explicitly.
java -Xmx1024m -cp igo-0.4.5.jar net.reduls.igo.bin.BuildDic neologd . "utf-8"
I followed this advice, but still got the same error with 1024, so I doubled it to 2048, and the error disappeared.
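The "double the heap until it works" approach can be automated with a short script. This is a rough sketch under assumptions: `build_dic_command` and `build_with_retry` are hypothetical helper names, the 4096 MB upper limit is arbitrary, and any non-zero exit status from java is treated as a failure without checking that it really was an OutOfMemoryError.

```python
import subprocess


def build_dic_command(jar, out_dir, src_dir, encoding="utf-8", heap_mb=None):
    """Return the argument list for Igo's BuildDic, optionally with -Xmx."""
    cmd = ["java"]
    if heap_mb is not None:
        cmd.append("-Xmx%dm" % heap_mb)
    cmd += ["-cp", jar, "net.reduls.igo.bin.BuildDic",
            out_dir, src_dir, encoding]
    return cmd


def build_with_retry(jar, out_dir, src_dir, heap_mb=1024, max_heap_mb=4096,
                     run=subprocess.call):
    """Run BuildDic, doubling the heap after each failure.

    Returns the heap size in MB that succeeded, or None if every
    attempt up to max_heap_mb failed.
    """
    while heap_mb <= max_heap_mb:
        cmd = build_dic_command(jar, out_dir, src_dir, heap_mb=heap_mb)
        if run(cmd) == 0:
            return heap_mb
        heap_mb *= 2
    return None
```

Called as `build_with_retry("igo-0.4.5.jar", "neologd", ".")`, this would first try the same command as above with `-Xmx1024m` and then with `-Xmx2048m`.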
I referred to the following article. Thank you very much.