I recently had a reason to use MeCab, so I'm posting this as a memo.
It covers everything from installing MeCab to getting output from it.
The topics are described below.
[MeCab][1] is an open-source morphological analysis engine developed through a joint research project between the Graduate School of Informatics, Kyoto University and NTT Communication Science Laboratories. It can be used from Perl, Ruby, Python, Java, and C#.
Morphological analysis is a technique that breaks sentences into morphemes, based on the grammar of the target language and part-of-speech information about words. It is used as a preprocessing step in natural language processing.
**Morpheme**: the smallest unit of expression that carries meaning.
For example, "私はpythonを使用して、プログラミングを勉強しています。" ("I'm studying programming using python.") is analyzed as follows:
Word | Part of speech | POS subcategory | Word | Part of speech | POS subcategory
---|---|---|---|---|---
私 | noun | pronoun | プログラミング | noun | verbal (suru) noun
は | particle | binding particle | を | particle | case particle
python | noun | general | 勉強 | noun | verbal (suru) noun
を | particle | case particle | し | verb | independent
使用 | noun | verbal (suru) noun | て | particle | conjunctive particle
し | verb | independent | い | verb | non-independent
て | particle | conjunctive particle | ます | auxiliary verb | -
、 | symbol | comma | 。 | symbol | period
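To make the idea of splitting text into morphemes a little more concrete, here is a toy sketch. It does greedy longest-match lookup against a tiny hand-written dictionary. The names `TOY_DICT` and `segment` are made up for this illustration, and this is not how MeCab works internally (MeCab chooses the best path through a lattice of candidates using learned costs), so treat it only as an aid to intuition.

```python
# Toy dictionary: surface form -> part of speech (hypothetical, for illustration only).
TOY_DICT = {
    "私": "noun",
    "は": "particle",
    "python": "noun",
    "を": "particle",
    "使用": "noun",
    "し": "verb",
    "て": "particle",
}


def segment(text, dictionary=TOY_DICT):
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            # No dictionary entry starts here: emit a single unknown character.
            tokens.append((text[i], "unknown"))
            i += 1
    return tokens


if __name__ == "__main__":
    for surface, pos in segment("私はpythonを使用して"):
        print(surface, pos)
```

A real analyzer also has to resolve ambiguity between competing segmentations, which is exactly the part this sketch leaves out.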
1 Download mecab-64-0.996.2.exe from [here][2]
2 Run mecab-64-0.996.2.exe and install MeCab, selecting **UTF-8** as the character encoding
When the installation finishes, a prompt about creating the dictionary appears, so go ahead and run it as is
3 Check in CMD that MeCab runs properly
If MeCab doesn't respond in CMD, the path is probably not set. Add the bin folder of the MeCab installation (\MeCab\bin) to the Path environment variable.
4 Install the MeCab package for Python so that MeCab can be used from Python
pip install mecab-python-windows
5 Copy libmecab.dll from the MeCab folder into the Python folder, overwriting any existing copy.
The folder locations are as follows
(File to copy): libmecab.dll
(Source): C:\Program Files\MeCab\bin
(Destination): C:\Users\(USER name)\AppData\Local\Programs\Python\Python37\Lib\site-packages
If you are not sure where the folders are on your machine, you can look them up from cmd.
(Source): where mecab
(Destination): where python
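The checks in steps 3 and 5 can also be sketched in Python. This is only a rough helper under my own assumptions: `find_mecab` and `dll_copy_plan` are hypothetical names, and the site-packages path in the usage example is a placeholder, not necessarily where your Python lives. `shutil.which` does roughly what `where` does in cmd.

```python
import shutil
from pathlib import PureWindowsPath


def find_mecab(executable="mecab"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def dll_copy_plan(mecab_bin_dir, site_packages_dir, dll_name="libmecab.dll"):
    """Return (source, destination) paths for the libmecab.dll copy in step 5."""
    source = PureWindowsPath(mecab_bin_dir) / dll_name
    destination = PureWindowsPath(site_packages_dir) / dll_name
    return str(source), str(destination)


if __name__ == "__main__":
    location = find_mecab()
    if location is None:
        print("mecab was not found on PATH; add the MeCab bin folder to Path.")
    else:
        print("mecab found at", location)
    # The second argument is a placeholder; use your own site-packages folder.
    print(dll_copy_plan(r"C:\Program Files\MeCab\bin", r"C:\Python37\Lib\site-packages"))
```

The helper only prints the paths; the copy itself is left to you so that nothing is overwritten by accident.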
Morphological analysis with MeCab works as follows
It is very simple
I wrote simple programs that run morphological analysis on a character string and on a text file
mecab_string.py
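import MeCab
# Japanese input sentence ("I'm studying programming using python.")
CONTENT = "私はpythonを使用して、プログラミングを勉強しています。"
# Create a tagger with the default settings
tagger = MeCab.Tagger()
# parse() returns the whole analysis result as one string
parse = tagger.parse(CONTENT)
print(parse)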
If you don't specify the encoding when opening the file, you will get the following error, so don't forget it!
UnicodeDecodeError: 'cp932' codec can't decode byte 0x81 in position 4: illegal multibyte sequence
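This error can be reproduced without MeCab at all. The sketch below writes the sentence to a temporary UTF-8 file and reads it back twice, once as cp932 (which `open()` tends to fall back to on Japanese Windows when no encoding is given) and once as UTF-8. The helper names `read_text` and `demo` are mine, chosen just for this example.

```python
import os
import tempfile

TEXT = "私はpythonを使用して、プログラミングを勉強しています。"


def read_text(path, encoding):
    """Read a whole text file using the given encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo():
    """Write TEXT as UTF-8, then try reading it as cp932 and as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(TEXT)
        try:
            read_text(path, "cp932")  # wrong encoding for a UTF-8 file
            cp932_failed = False
        except UnicodeDecodeError:
            cp932_failed = True
        return cp932_failed, read_text(path, "utf-8")


if __name__ == "__main__":
    failed, text = demo()
    print("reading as cp932 failed:", failed)
    print(text)
```

Passing `encoding="utf-8"` explicitly makes the script behave the same regardless of the system locale.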
sample.txt
私はpythonを使用して、プログラミングを勉強しています。
mecab_read.py
import MeCab
FILE_NAME = "sample.txt"
# Specify the encoding explicitly to avoid the cp932 UnicodeDecodeError
with open(FILE_NAME, "r", encoding="utf-8") as f:
    CONTENT = f.read()
tagger = MeCab.Tagger()
parse = tagger.parse(CONTENT)
print(parse)
私	名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
python	名詞,一般,*,*,*,*,*
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
使用	名詞,サ変接続,*,*,*,*,使用,シヨウ,シヨー
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
、	記号,読点,*,*,*,*,、,、,、
プログラミング	名詞,サ変接続,*,*,*,*,プログラミング,プログラミング,プログラミング
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
勉強	名詞,サ変接続,*,*,*,*,勉強,ベンキョウ,ベンキョー
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
い	動詞,非自立,*,*,一段,連用形,いる,イ,イ
ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
。	記号,句点,*,*,*,*,。,。,。
EOS
Nice, the output came out properly!
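Since `parse()` returns one big string, it is often handy to turn it into structured records. Here is a minimal sketch that works on the text alone, so it runs without MeCab installed. It assumes the default output layout of the bundled IPA dictionary (surface, a tab, then comma-separated features with the base form in the seventh field); `parse_default_output` and `SAMPLE_OUTPUT` are names I made up for this example.

```python
# A few lines in the shape MeCab prints by default (assumed IPA dictionary layout).
SAMPLE_OUTPUT = (
    "私\t名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "python\t名詞,一般,*,*,*,*,*\n"
    "EOS\n"
)


def parse_default_output(text):
    """Turn MeCab's default text output into a list of dicts."""
    records = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        has_base = len(features) > 6 and features[6] != "*"
        records.append({
            "surface": surface,
            "pos": features[0],
            "pos_detail": features[1] if len(features) > 1 else "*",
            # Unknown words such as "python" carry no base form, so fall back to the surface.
            "base": features[6] if has_base else surface,
        })
    return records


if __name__ == "__main__":
    for record in parse_default_output(SAMPLE_OUTPUT):
        print(record)
```

With MeCab installed, the same function could be fed the string returned by `tagger.parse(CONTENT)`.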
Earlier, I described MeCab.Tagger() as "the place to specify the dictionary for morphological analysis".
The same argument also accepts output format options (-O...), and there are several of them, so I'll introduce a few.
Nothing new needs to be installed, so it may be interesting to try them by changing the programs above.
In every case the input was **形態素解析** ("morphological analysis")
MeCab.Tagger(): MeCab's standard output format, used by default
形態素	名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ
解析	名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ
EOS
MeCab.Tagger("-Ochasen"): ChaSen-compatible output
形態素	ケイタイソ	形態素	名詞-一般
解析	カイセキ	解析	名詞-サ変接続
EOS
MeCab.Tagger("-Owakati"): word-segmented output, with a space between words as in English
形態素 解析
MeCab.Tagger("-Oyomi"): outputs the reading of the analyzed text in katakana (English words are output as is)
ケイタイソカイセキ
MeCab.Tagger("-Odump"): outputs all information
0 BOS BOS/EOS,*,*,*,*,*,*,*,* 0 0 0 0 0 0 2 1 0.000000 0.000000 0.000000 0
7 形態素 名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ 0 9 1285 1285 38 2 0 1 0.000000 0.000000 0.000000 5338
13 解析 名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ 9 15 1283 1283 36 2 0 1 0.000000 0.000000 0.000000 9241
20 EOS BOS/EOS,*,*,*,*,*,*,*,* 15 15 0 0 0 0 3 1 0.000000 0.000000 0.000000 8505
MeCab.Tagger("-Osimple"): simplified output
形態素	名詞-一般
解析	名詞-サ変接続
EOS
There are many more, so if you are interested, please check the [official documentation][1]!
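If you want to switch between these formats from code, a small helper can build the option string for you. This is just a sketch: `tagger_option`, `split_wakati`, and `OUTPUT_FORMATS` are hypothetical names of my own, and the list only covers the formats introduced in this post. It does not import MeCab, so it runs on its own.

```python
# Output formats introduced in this post (not the complete list MeCab supports).
OUTPUT_FORMATS = ("chasen", "wakati", "yomi", "dump", "simple")


def tagger_option(output_format=None):
    """Build the option string to pass to MeCab.Tagger for an output format."""
    if output_format is None:
        return ""  # empty options select the default output
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    return f"-O{output_format}"


def split_wakati(wakati_output):
    """Split -Owakati output (space-separated words) into a list of tokens."""
    return wakati_output.split()


if __name__ == "__main__":
    print(tagger_option("wakati"))
    print(split_wakati("形態素 解析 \n"))
    # With MeCab installed, this could be used as:
    #   tagger = MeCab.Tagger(tagger_option("wakati"))
    #   tokens = split_wakati(tagger.parse("形態素解析"))
```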
Now I can do morphological analysis with MeCab. It is apparently also used in AI-based natural language processing, so I want to master it ... (˘ω˘)