natto-py is a Python package that provides a binding to MeCab through a Foreign Function Interface (http://en.wikipedia.org/wiki/Foreign_function_interface) (FFI). It supports Python 2 and 3, and it has the advantage of not requiring a compiler. It is available on *nix, OS X and Windows.
natto-py works with both Python 2 and Python 3, and it has been verified on specific releases of each.
First, install MeCab 0.996.
Install natto-py via pip as you would a regular Python package.
$ pip install natto-py
The cffi package is also required, but the above command will automatically install cffi if needed.
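After installation, a quick sanity check can confirm that both packages are importable. The following is a minimal sketch that uses only the Python 3 standard library. The helper name `check_modules` is made up for illustration and is not part of natto-py. It only reports whether the Python modules can be found; it says nothing about whether the MeCab library itself can be loaded.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # natto-py provides the top-level module "natto"; cffi provides "cffi".
    for name, found in check_modules(["natto", "cffi"]).items():
        print(name, "found" if found else "MISSING")
```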
Import the MeCab class from natto and create an instance.
from natto import MeCab
nm = MeCab()
print(nm)
<natto.mecab.MeCab model=<cdata 'mecab_model_t *' 0x802016640>,
tagger=<cdata 'mecab_t *' 0x8020a44c0>,
lattice=<cdata 'mecab_lattice_t *' 0x802079600>,
libpath="/opt/mecab/lib/libmecab.so",
options={},
dicts=[<natto.dictionary.DictionaryInfo
dictionary=<cdata 'mecab_dictionary_info_t *' 0x802079480>,
filepath="/opt/mecab/lib/mecab/dic/ipadic/sys.dic",
charset=utf-8,
type=0>],
version=0.996>
As a first step, parse a sentence and print the result to standard output as a string.
text = 'ピンチの時には必ずヒーローが現れる。'  # "A hero always appears in a pinch."
print(nm.parse(text))
ピンチ	名詞,一般,*,*,*,*,ピンチ,ピンチ,ピンチ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
時	名詞,非自立,副詞可能,*,*,*,時,トキ,トキ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
必ず	副詞,助詞類接続,*,*,*,*,必ず,カナラズ,カナラズ
ヒーロー	名詞,一般,*,*,*,*,ヒーロー,ヒーロー,ヒーロー
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
現れる	動詞,自立,*,*,一段,基本形,現れる,アラワレル,アラワレル
。	記号,句点,*,*,*,*,。,。,。
EOS
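If the string result needs further processing, it can be split into one record per morpheme. The sketch below assumes MeCab's default output layout, where each line is the surface form, a tab, and a comma-separated feature list, followed by a final `EOS` line. The function name `parse_mecab_output` is hypothetical, and the sample string is a hand-written excerpt rather than live output, so the code runs without MeCab installed.

```python
def parse_mecab_output(output):
    """Split MeCab's default text output into (surface, features) pairs.

    Assumes each morpheme line is "surface<TAB>comma-separated features"
    and that the end of the sentence is marked by a line reading "EOS".
    """
    morphemes = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature = line.partition("\t")
        morphemes.append((surface, feature.split(",")))
    return morphemes


# Hand-written sample in the default layout (not produced by a live MeCab call)
sample = (
    "ピンチ\t名詞,一般,*,*,*,*,ピンチ,ピンチ,ピンチ\n"
    "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n"
    "EOS"
)
for surface, features in parse_mecab_output(sample):
    # In the IPADIC layout, features[0] is the part of speech
    print(surface, features[0])
```

With a working installation, the value returned by `nm.parse(text)` could be passed to the function in place of `sample`.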
Retrieve the analysis results as MeCabNode objects and output more detailed information about each morpheme.
# The -F / --node-format option specifies the output format for each node
#
# %m    ... surface form of the morpheme
# %f[0] ... part of speech
# %h    ... part-of-speech ID (IPADIC)
# %f[8] ... pronunciation
#
with MeCab('-F%m,%f[0],%h,%f[8]') as nm:
    for n in nm.parse(text, as_nodes=True):
        print(n.feature)
ピンチ,名詞,38,ピンチ
の,助詞,24,ノ
時,名詞,66,トキ
に,助詞,13,ニ
は,助詞,16,ワ
必ず,副詞,35,カナラズ
ヒーロー,名詞,38,ヒーロー
が,助詞,13,ガ
現れる,動詞,31,アラワレル
。,記号,7,。
EOS
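When the set of output fields changes often, the format option can be assembled from a list instead of being typed by hand. This is only a small sketch: `build_node_format` is a hypothetical helper, not something natto-py provides, and it does nothing more than join the placeholders described in the comments above.

```python
def build_node_format(fields, sep=","):
    """Join MeCab format placeholders into a -F (--node-format) option string."""
    return "-F" + sep.join(fields)


# Surface form, part of speech, part-of-speech ID, pronunciation
option = build_node_format(["%m", "%f[0]", "%h", "%f[8]"])
print(option)  # -F%m,%f[0],%h,%f[8]
```

The resulting string is the same one passed to `MeCab(...)` in the example, so it could be supplied as `MeCab(option)`.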
Using Python's with statement is recommended: the reference to the MeCab library is destroyed automatically, whether the context exits normally or an exception occurs.
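The cleanup guarantee comes from Python's context manager protocol, which can be seen without MeCab at all. The sketch below uses a hypothetical stand-in class, `FakeResource`, to show that `__exit__` runs on both the normal and the exceptional path. It illustrates the language feature only and does not reflect how natto-py implements its cleanup internally.

```python
class FakeResource:
    """Hypothetical stand-in for an object that holds a native library handle."""

    def __init__(self):
        self.released = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the block exits normally and when an exception propagates.
        self.released = True
        return False  # do not suppress the exception


# Normal exit
with FakeResource() as ok:
    pass
print(ok.released)  # True

# Exit caused by an exception
try:
    with FakeResource() as failed:
        raise ValueError("simulated failure")
except ValueError:
    pass
print(failed.released)  # True
```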
That's all.