I would like to summarize some preprocessing steps for Japanese text in natural language processing. (This post will be updated from time to time.)
>>> import unicodedata
>>>
>>> # NFKC folds full-width characters into their half-width forms
>>> text = u'１９９４'
>>> print(unicodedata.normalize('NFKC', text))
1994
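The same normalization can be wrapped in a small helper and tried on a few typical inputs. This is only a minimal sketch: the name `normalize_text` and the sample strings are my own, chosen for illustration, and a real pipeline may need further steps beyond NFKC.

```python
import unicodedata


def normalize_text(text):
    """Apply NFKC compatibility normalization (hypothetical helper name)."""
    return unicodedata.normalize("NFKC", text)


# Illustrative inputs: full-width digits, full-width Latin letters,
# and half-width katakana with separate voiced sound marks.
samples = ["１９９４", "ＡＢＣ", "ｶﾞｷﾞｸﾞ"]
for s in samples:
    print(s, "->", normalize_text(s))
```

Running the normalization before tokenizing keeps variants such as full-width and half-width digits from being treated as different tokens.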
I think most people parse Japanese with MeCab.
Many of them probably use neologd as the dictionary, but here is one thing I noticed while using it.
$ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd
雲
雲	名詞,固有名詞,一般,*,*,*,雲のむこう、約束の場所,クモノムコウヤクソクノバショ,クモノムコウヤクソクノバショ
EOS
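Kumo no Mukou, Yakusoku no Basho ...? When I looked it up, it turned out to be "The Place Promised in Our Early Days", an anime film directed by Makoto Shinkai. In other words, with neologd the single word for "cloud" is parsed as a proper noun whose base form is the title of that film.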
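One way to catch entries like this is to scan MeCab's output for proper nouns whose base form differs from the surface form. The following is a rough sketch under some assumptions: it expects the IPAdic-style output format (surface, a tab, then comma-separated features with the base form in the seventh field), the function name `find_rewritten_proper_nouns` is hypothetical, and the sample line is my reconstruction of the output above rather than a fresh run.

```python
def find_rewritten_proper_nouns(mecab_output):
    """Return (surface, base_form) pairs for proper nouns whose base form
    differs from the surface. Assumes IPAdic-style feature columns."""
    hits = []
    for line in mecab_output.splitlines():
        if line == "EOS" or "\t" not in line:
            continue  # skip the end-of-sentence marker and blank lines
        surface, _, feature = line.partition("\t")
        fields = feature.split(",")
        if len(fields) < 7:
            continue  # unknown words may carry fewer feature columns
        is_proper_noun = fields[1] == "固有名詞"
        base_form = fields[6]
        if is_proper_noun and base_form not in ("*", surface):
            hits.append((surface, base_form))
    return hits


# Assumed sample line, reconstructed for illustration only.
sample = (
    "雲\t名詞,固有名詞,一般,*,*,*,雲のむこう、約束の場所,"
    "クモノムコウヤクソクノバショ,クモノムコウヤクソクノバショ\n"
    "EOS\n"
)
print(find_rewritten_proper_nouns(sample))
```

The check is limited to proper nouns because conjugated verbs and adjectives legitimately have a base form that differs from the surface.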