To study NLTK corpus readers, I wrote mecab.py, modeled on chasen.py by mhagiwara.
mecab.py
This assumes that NLTK is installed and that nltk_data has been downloaded.
Place the data under nltk_data/corpora, or create a symbolic link to it there.
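If you go the symbolic link route, it can be scripted. The following is only a minimal sketch: the helper name link_corpus and the corpus name "test" are my own choices for illustration, and it assumes you know which nltk_data directory NLTK searches on your machine.

```python
import os


def link_corpus(source_dir, nltk_data_dir, name="test"):
    """Symlink source_dir as <nltk_data_dir>/corpora/<name> and return the link path.

    link_corpus is a hypothetical helper, not part of NLTK.
    """
    corpora_dir = os.path.join(nltk_data_dir, "corpora")
    os.makedirs(corpora_dir, exist_ok=True)
    link_path = os.path.join(corpora_dir, name)
    # lexists() also catches a dangling link left over from an earlier run.
    if not os.path.lexists(link_path):
        os.symlink(os.path.abspath(source_dir), link_path, target_is_directory=True)
    return link_path
```

For example, link_corpus("/path/to/mecab_output", os.path.expanduser("~/nltk_data")) would make the directory visible as corpora/test; both paths here are placeholders.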
import nltk
# Assumption: mecab.py (which defines MeCabCorpusReader) is on the import path.
from mecab import MeCabCorpusReader

# The data must be stored or linked under nltk_data/corpora.
corpora_path = nltk.data.find('corpora/test')

# fileids: a regular expression, or a list of file names, selecting corpus files.
fileids = r'.*\.mecab'

reader = MeCabCorpusReader(corpora_path, fileids, encoding='utf8')

# Raw file contents
print(reader.raw())

# Flat list of surface forms
print(', '.join(reader.words()))

# Flat list of (word, tag) pairs
for w, t in reader.tagged_words():
    print(w, t)

# Paragraphs > sentences > words
for para in reader.paras():
    for sent in para:
        for word in sent:
            print(word)

# Paragraphs > sentences > (word, tag) pairs
for para in reader.tagged_paras():
    for sent in para:
        for (word, pos) in sent:
            print(word, pos)
corpora/test is a directory containing files that have been analyzed by MeCab and have the extension .mecab.
The contents of a file look like this.
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
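To make the file format concrete, here is a rough sketch of how such a file could be parsed by hand. It is not the code in mecab.py: the function name parse_mecab is made up for illustration, and it assumes MeCab's default output layout, where the surface form and the comma-separated features are separated by a tab and each sentence ends with a line containing only EOS.

```python
# A shortened sample in MeCab's default output format (tab after the surface form).
SAMPLE = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ\n"
    "EOS\n"
)


def parse_mecab(text):
    """Return a list of sentences, each a list of (surface, features) pairs.

    parse_mecab is a hypothetical helper written for this example.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line:
            continue
        if line == "EOS":
            # EOS closes the current sentence; empty sentences are skipped.
            if current:
                sentences.append(current)
            current = []
            continue
        surface, _, features = line.partition("\t")
        current.append((surface, features))
    if current:
        # Keep trailing tokens even if the final EOS is missing.
        sentences.append(current)
    return sentences
```

Calling parse_mecab(SAMPLE) gives one sentence with three (surface, features) pairs; splitting features on "," then gives the individual fields such as the part of speech.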
The output of each method is as follows.
raw()
すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
も	助詞,係助詞,*,*,*,*,も,モ,モ
もも	名詞,一般,*,*,*,*,もも,モモ,モモ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
words()
すもも, も, もも, も, もも, の, うち
tagged_words()
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
paras()
すもも
も
もも
も
もも
の
うち
tagged_paras()
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
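The outputs above suggest how the views relate: tagged_paras() is the most nested one (paragraphs, then sentences, then word and tag pairs), and the others can be seen as flattened or tag-stripped versions of it. The sketch below illustrates that relationship on a small hand-written structure. The helper names are hypothetical, the tags are shortened for readability, and this is not necessarily how mecab.py computes these views internally.

```python
from itertools import chain

# Hand-written stand-in for tagged_paras(): paragraphs > sentences > (word, tag).
# The tags are shortened here for illustration only.
tagged_paras = [
    [
        [("すもも", "名詞,一般"), ("も", "助詞,係助詞")],
        [("もも", "名詞,一般")],
    ],
]


def flatten_tagged(paras):
    """Flatten paragraphs > sentences > pairs into one list of (word, tag) pairs."""
    sentences = chain.from_iterable(paras)
    return list(chain.from_iterable(sentences))


def strip_tags(paras):
    """Keep the nesting but drop the tags, as in paras()."""
    return [[[word for word, _tag in sent] for sent in para] for para in paras]


tagged_words = flatten_tagged(tagged_paras)      # like tagged_words()
words = [word for word, _tag in tagged_words]    # like words()
paras = strip_tags(tagged_paras)                 # like paras()
```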