Ciao ... †
Lately I've had more and more occasion to use MeCab from Python on Windows. However, installing MeCab's Python wrapper on Windows means downloading the source, rewriting setup.py, and installing a compiler, which is a real hassle.
So I have released a package that makes it easy to install MeCab's Python wrapper with pip on Windows, macOS, and Ubuntu! https://pypi.org/project/mecab/
It is a single MeCab wrapper package that supports multiple OSs by changing its behavior depending on the OS at installation time. For Windows, mecab-python is built in advance with the Microsoft Visual Studio C++ compiler and distributed in wheel format. On macOS and Linux, on the other hand, the C++ binding code is compiled during installation, so it cannot be installed unless the target machine has a C++ compiler.
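As a rough picture of the OS-dependent behavior described above, here is a minimal sketch. It is not the package's actual setup.py; `choose_install_strategy` and the two strategy labels are hypothetical names used only for illustration.

```python
import platform


def choose_install_strategy(system_name=None):
    """Return a label describing how the binding would be obtained.

    Hypothetical helper: it only mirrors the behavior described in the
    text and is not taken from the package's real setup.py.
    """
    if system_name is None:
        system_name = platform.system()  # "Windows", "Darwin", "Linux", ...
    if system_name == "Windows":
        # Prebuilt wheel: no C++ compiler needed on the target machine.
        return "prebuilt-wheel"
    # macOS, Linux, and others: the C++ binding is compiled at install time.
    return "compile-from-source"


print(choose_install_strategy())
```

On macOS or Linux this prints `compile-from-source`, which is why a C++ compiler has to be present there.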
It currently supports Python 2.7, 3.6, 3.7, and 3.8, each in both 32-bit and 64-bit builds. It has been tested on Windows 10, macOS 10.14, and Ubuntu 18.04.
Note, however, that the 64-bit version of Python for Windows assumes that the following unofficial 64-bit build of MeCab is installed. https://github.com/ikegami-yukino/mecab/releases
Also, since the Windows version of CaboCha is distributed only as 32-bit binaries, **if you want to use it together with CaboCha on Windows, please use the 32-bit version of Python.** (Sorry this is complicated.)
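Since the right MeCab build depends on whether your Python is 32-bit or 64-bit, it can help to check that first. The following is a small sketch using only the standard library; `python_bitness` is a name made up for this example.

```python
import platform
import struct


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


print(platform.system(), python_bitness())
```

If this reports 64 on Windows, the 64-bit MeCab build is the one to install; if you need CaboCha, a 32-bit Python is the one to run.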
--Distributed in wheel format on PyPI, so it can be used on Windows without a C++ compiler
--Works the same way on Windows, macOS, Linux, etc.
--The interface is exactly the same as the official Python binding, so there is no need to rewrite existing code
--Basically the same as the official Python binding, so processing is fast
--Fixes a bug in the official Python binding
--No need to rewrite setup.py (the official Python binding requires rewriting setup.py to support Python 3)
--Supports all MeCab dictionaries
--Extra items such as SWIG and MeCab dictionaries are not included
You can install it with
$ pip install mecab
or
$ python -m pip install mecab
If you have an old Python 2.7 without pip, download get-pip.py and run it with Python to get pip.
If you get an error like MeCab_wrap.cxx:178:11: fatal error: 'Python.h' file not found, please try
$ CPLUS_INCLUDE_PATH=`python-config --prefix`/Headers:$CPLUS_INCLUDE_PATH pip install mecab
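To see where your interpreter expects Python.h, and whether the header is actually there, a quick check like the following may help. This is only a diagnostic sketch; `python_header_status` is a hypothetical helper, and the `Headers` directory in the command above is assumed to correspond to a macOS framework-style Python layout.

```python
import os
import sysconfig


def python_header_status():
    """Return (include_dir, found) for this interpreter's Python.h."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)


include_dir, found = python_header_status()
print("include dir:", include_dir)
print("Python.h present:", found)
```

If the header is missing, the Python development headers are probably not installed (on Ubuntu they usually come from the python3-dev package).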
>>> import MeCab
>>> t = MeCab.Tagger()
>>> sentence = "太郎はこの本を女性に渡した。"  # "Taro handed this book to the woman."
>>> print(t.parse(sentence))
太郎	名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
この	連体詞,*,*,*,*,*,この,コノ,コノ
本	名詞,一般,*,*,*,*,本,ホン,ホン
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
女性	名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
渡し	動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。	記号,句点,*,*,*,*,。,。,。
EOS
>>> n = t.parseToNode(sentence)
>>> while n:
...     print(n.surface, "\t", n.feature)
...     n = n.next
...
 	 BOS/EOS,*,*,*,*,*,*,*,*
太郎 	 名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー
は 	 助詞,係助詞,*,*,*,*,は,ハ,ワ
この 	 連体詞,*,*,*,*,*,この,コノ,コノ
本 	 名詞,一般,*,*,*,*,本,ホン,ホン
を 	 助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
女性 	 名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ
に 	 助詞,格助詞,一般,*,*,*,に,ニ,ニ
渡し 	 動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ
た 	 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
。 	 記号,句点,*,*,*,*,。,。,。
 	 BOS/EOS,*,*,*,*,*,*,*,*
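Each line of the default output is a surface form, a tab, and a comma-separated feature string. As a minimal sketch, the text can be turned into dictionaries like this. The field labels in `IPADIC_FIELDS` are my own illustrative names for the IPA dictionary layout, and `parse_mecab_output` is a hypothetical helper, not part of the MeCab API. The sample lines are typical IPA-dictionary output for the tokens 渡し and た.

```python
# Field labels assumed for IPA-dictionary features (names are illustrative).
IPADIC_FIELDS = (
    "pos", "pos_detail1", "pos_detail2", "pos_detail3",
    "conj_type", "conj_form", "lemma", "reading", "pronunciation",
)


def parse_mecab_output(text):
    """Convert MeCab's default output into a list of dicts, one per token."""
    tokens = []
    for line in text.splitlines():
        if line == "EOS" or not line:
            continue
        surface, feature = line.split("\t", 1)
        token = {"surface": surface}
        # zip() stops at the shorter sequence, so features with fewer
        # fields (e.g. unknown words) simply lack the trailing keys.
        token.update(zip(IPADIC_FIELDS, feature.split(",")))
        tokens.append(token)
    return tokens


sample = (
    "渡し\t動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)
tokens = parse_mecab_output(sample)
print(tokens[0]["surface"], tokens[0]["lemma"], tokens[0]["reading"])
```

With a real tagger you would pass `t.parse(sentence)` instead of `sample`. Other dictionaries use different field layouts, so the labels would need adjusting.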
The code examples here are for the IPA dictionary and the mecab-ipadic-neologd dictionary.
# When using a dictionary such as NEologd, specify the dictionary directory with -d
t = MeCab.Tagger("-d /path/to/dic/mecab-ipadic-neologd")
# Word segmentation (wakati-gaki) output
t = MeCab.Tagger("-O wakati")
print(t.parse(sentence).rstrip())
#=> 太郎 は この 本 を 女性 に 渡し た 。
NEologd is recommended as a dictionary because it covers a large number of proper nouns.
t = MeCab.Tagger("-d /usr/local/lib/mecab/dic/mecab-ipadic-neologd -F%m\\t -E\\n")
print(t.parse("DIR EN GREYのライブ行きたい").rstrip().split("\t"))  # "I want to go to a DIR EN GREY concert"
#=> ['DIR EN GREY', 'の', 'ライブ', '行き', 'たい']
# To get the reading of the whole sentence, use -O yomi
t = MeCab.Tagger("-O yomi")
print(t.parse(sentence).rstrip())
#=> タロウハコノホンヲジョセイニワタシタ。
# To get the reading of each token, output feature field 7 with -F
t = MeCab.Tagger("-F%f[7]\\t -E\\n -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd")
print(t.parse(sentence).rstrip().split("\t"))
#=> ['タロウ', 'ハ', 'コノ', 'ホン', 'ヲ', 'ジョセイ', 'ニ', 'ワタシ', 'タ', '。']
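The doubled backslashes and the `rstrip().split("\t")` idiom in these examples are easy to trip over, so here is a small sketch that runs without MeCab. The `simulated` string is a hand-written stand-in for what `t.parse(...)` would return with such options, not real tagger output.

```python
# The Python literal "\\t" is two characters: a backslash and the letter t.
# MeCab receives them unchanged and expands the escape itself.
node_format = "-F%f[7]\\t"
eos_format = "-E\\n"
assert "\t" not in node_format  # no real tab is stored in the option string

# Simulated result of t.parse(...) with the options above: every token is
# followed by a tab, and the sentence ends with a newline.
simulated = "タロウ\tハ\tコノ\tホン\t\n"

# rstrip() with no arguments removes trailing whitespace, which includes
# both the final tab and the newline, so split() yields no empty item.
readings = simulated.rstrip().split("\t")
print(readings)
```

Without the `rstrip()` call, the list would end with an extra `"\n"` element left over after the last tab.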
# Split tokens into content words and function words
CONTENT_WORD_POS = ("名詞", "動詞", "形容詞", "副詞")  # noun, verb, adjective, adverb
IGNORE = ("接尾", "非自立", "代名詞")  # suffix, non-independent, pronoun

def is_content_word(feature):
    # str.startswith accepts a tuple, so this matches any of the POS tags
    return feature.startswith(CONTENT_WORD_POS) and all(f not in IGNORE for f in feature.split(",")[:6])

t = MeCab.Tagger()
n = t.parseToNode(sentence)
content_words = []
function_words = []
while n:
    if is_content_word(n.feature):
        content_words.append((n.surface, n.feature))
    elif not n.feature.startswith("BOS/EOS,"):
        function_words.append((n.surface, n.feature))
    n = n.next
print(content_words)  # Content words
#=> [('太郎', '名詞,固有名詞,人名,名,*,*,太郎,タロウ,タロー'), ('本', '名詞,一般,*,*,*,*,本,ホン,ホン'), ('女性', '名詞,一般,*,*,*,*,女性,ジョセイ,ジョセイ'), ('渡し', '動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ')]
print(function_words)  # Function words
#=> [('は', '助詞,係助詞,*,*,*,*,は,ハ,ワ'), ('この', '連体詞,*,*,*,*,*,この,コノ,コノ'), ('を', '助詞,格助詞,一般,*,*,*,を,ヲ,ヲ'), ('に', '助詞,格助詞,一般,*,*,*,に,ニ,ニ'), ('た', '助動詞,*,*,*,特殊・タ,基本形,た,タ,タ'), ('。', '記号,句点,*,*,*,*,。,。,。')]
# Get the base form (lemma) of each token
t = MeCab.Tagger()
n = t.parseToNode("すごく良い本を渡した")  # "I handed over a very good book"
lemma = []
while n:
    if not n.feature.startswith("BOS/EOS,"):
        lemma.append(n.feature.split(",")[6])
    n = n.next
print(lemma)
#=> ['すごい', '良い', '本', 'を', '渡す', 'た']
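One thing to watch with `feature.split(",")[6]` is that some tokens have no usable base form. As a hedged sketch, the helper below falls back to the surface form in that case. `lemma_or_surface` is a hypothetical name, and the second feature string is my assumption of what an unknown word typically looks like with the IPA dictionary (base form given as `*`).

```python
def lemma_or_surface(surface, feature, lemma_index=6):
    """Return the base form, falling back to the surface form.

    Assumes the IPA-dictionary layout, where field 6 is the base form.
    """
    fields = feature.split(",")
    if len(fields) > lemma_index and fields[lemma_index] != "*":
        return fields[lemma_index]
    return surface


# Known word: the base form is taken from the feature string.
print(lemma_or_surface("渡し", "動詞,自立,*,*,五段・サ行,連用形,渡す,ワタシ,ワタシ"))
# Assumed unknown-word feature: the surface form is used instead of "*".
print(lemma_or_surface("MeCab", "名詞,一般,*,*,*,*,*"))
```

In the loop above you would call `lemma_or_surface(n.surface, n.feature)` in place of `n.feature.split(",")[6]`.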
For constrained parsing, see my earlier article. https://qiita.com/yukinoi/items/4e7afb5e72b3a46da0f2
If you like it, I'd appreciate it if you could star mecab's GitHub repository. A single click gives my development motivation a boost.