I played with MeCab (morphological analysis)!

Introduction

I needed to use MeCab, so I am posting this as a memo

It summarizes everything from installing MeCab to getting output

The article covers what MeCab is, how to install it, how to run it, and its output formats

What is MeCab (morphological analysis)?

[MeCab] 1 is an open-source morphological analysis engine developed through a joint research unit project between the Graduate School of Informatics, Kyoto University and NTT Communication Science Laboratories. Bindings are available for Perl, Ruby, Python, Java, and C#

Morphological analysis

Morphological analysis breaks a sentence into morphemes based on the grammar of the target language and the part-of-speech information of its words. It is a technique used as preprocessing in the field of natural language processing

Morpheme: the smallest unit of expression that carries meaning

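For example, 「私はpythonを使用して、プログラミングを勉強しています。」 ("I'm studying programming using python.") is analyzed as follows

| Word | Part of speech | Part-of-speech subcategory |
|---|---|---|
| 私 (I) | noun | pronoun |
| は | particle | binding particle |
| python | noun | general |
| を | particle | case particle |
| 使用 (use) | noun | verbal noun (sa-hen connection) |
| し | verb | independent |
| て | particle | conjunctive particle |
| 、 | symbol | comma |
| プログラミング (programming) | noun | verbal noun (sa-hen connection) |
| を | particle | case particle |
| 勉強 (study) | noun | verbal noun (sa-hen connection) |
| し | verb | independent |
| て | particle | conjunctive particle |
| い | verb | non-independent |
| ます | auxiliary verb | - |
| 。 | symbol | period |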
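Once a sentence is split like this, the part-of-speech labels can drive later processing. The snippet below is a minimal sketch, assuming the rows of the table are typed in by hand as (surface form, part of speech) pairs for the Japanese example sentence. It counts each part of speech and keeps only the nouns. The names `MORPHEMES`, `count_parts_of_speech`, and `extract_nouns` are made up for this illustration; a real program would get the pairs from MeCab.

```python
from collections import Counter

# Hand-written illustration data: (surface form, part of speech) pairs
# that mirror the rows of the example table, in sentence order.
MORPHEMES = [
    ("私", "noun"), ("は", "particle"), ("python", "noun"), ("を", "particle"),
    ("使用", "noun"), ("し", "verb"), ("て", "particle"), ("、", "symbol"),
    ("プログラミング", "noun"), ("を", "particle"), ("勉強", "noun"),
    ("し", "verb"), ("て", "particle"), ("い", "verb"),
    ("ます", "auxiliary verb"), ("。", "symbol"),
]


def count_parts_of_speech(morphemes):
    """Count how many morphemes fall under each part of speech."""
    return Counter(pos for _, pos in morphemes)


def extract_nouns(morphemes):
    """Keep only the surface forms tagged as nouns, in sentence order."""
    return [surface for surface, pos in morphemes if pos == "noun"]


pos_counts = count_parts_of_speech(MORPHEMES)
nouns = extract_nouns(MORPHEMES)
print(pos_counts)
print(nouns)
```

Filtering by part of speech like this is a common reason to run morphological analysis before other text processing.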

How to install MeCab

1. Download mecab-64-0.996.2.exe from [here] 2

2. Run mecab-64-0.996.2.exe and install MeCab, choosing UTF-8 as the character encoding


When the installation finishes, a prompt to create the dictionary appears; run it as is

3. Check in CMD that MeCab works properly


If MeCab does not respond in CMD, its folder is probably missing from the PATH. Add the installed \MeCab\bin folder to the Path environment variable.
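The same check can be made from Python. This is a small sketch using the standard library's `shutil.which`, which searches the PATH the way the shell does; the helper name `find_mecab` is an assumption for this example, not part of MeCab.

```python
import shutil


def find_mecab(executable="mecab"):
    """Return the full path of the executable, or None if PATH lacks it."""
    return shutil.which(executable)


mecab_path = find_mecab()
if mecab_path is None:
    print("mecab was not found; add the MeCab bin folder to Path")
else:
    print(f"mecab found at {mecab_path}")
```

If this prints the "not found" message even after editing the environment variable, opening a new CMD window may be needed so that the updated Path is picked up.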


4. Install the MeCab binding for Python so that MeCab can be used from Python

pip install mecab-python-windows


5. Copy libmecab.dll from the MeCab folder into the Python folder, overwriting any existing file

The folder locations are as follows

(File name to copy):libmecab.dll
(Original):C:\Program Files\MeCab\bin
(Copy to):C:\Users\(USER name)\AppData\Local\Programs\Python\Python37\Lib\site-packages

Searching from cmd should turn up similar locations on your machine.
(Original):where mecab
(Copy to):where python
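The copy in step 5 can also be scripted. Below is a hedged sketch with a hypothetical helper, `copy_libmecab`, that takes both folders as parameters because the actual locations differ per machine; it uses only `pathlib` and `shutil.copy2` from the standard library.

```python
import shutil
from pathlib import Path


def copy_libmecab(mecab_bin, site_packages, dll_name="libmecab.dll"):
    """Copy the DLL from the MeCab bin folder into site-packages, overwriting."""
    source = Path(mecab_bin) / dll_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")
    destination = Path(site_packages) / dll_name
    # copy2 replaces an existing file at the destination
    shutil.copy2(source, destination)
    return destination
```

With the folders listed above, the call would look like `copy_libmecab(r"C:\Program Files\MeCab\bin", site_packages_folder)`, where `site_packages_folder` is a placeholder for your own path. One way to look it up is `sysconfig.get_paths()["purelib"]`.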

Trying it out

Morphological analysis with MeCab takes two steps

  1. Specify the dictionary for morphological analysis with MeCab.Tagger()
  2. Use tagger.parse to analyze a string, or the contents of a text file, with the specified dictionary

Very simple

Below are simple programs that run morphological analysis on a character string and on a text file

Program

mecab_string.py


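import MeCab

# Japanese for "I'm studying programming using python."
CONTENT = "私はpythonを使用して、プログラミングを勉強しています。"

# With no arguments, the tagger uses the default dictionary and output format
tagger = MeCab.Tagger()
# parse() returns the analysis result as a single string
parse = tagger.parse(CONTENT)

print(parse)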

Don't forget to specify the encoding when opening the file, or you will get the following error!

UnicodeDecodeError: 'cp932' codec can't decode byte 0x81 in position 4: illegal multibyte sequence
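To see why the encoding matters, here is a small sketch that needs no MeCab. It writes a Japanese sentence to a temporary file as UTF-8, then reads it back once as cp932 and once as UTF-8. Passing `encoding="cp932"` explicitly is an assumption that stands in for the default `open()` typically uses on Japanese Windows; the sentence and variable names are only for illustration.

```python
import os
import tempfile

TEXT = "私はpythonを使用して、プログラミングを勉強しています。"

# Write the sample text to a temporary file encoded as UTF-8.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as f:
    f.write(TEXT)
    path = f.name

try:
    # Reading UTF-8 bytes as cp932 fails partway through the text.
    try:
        with open(path, "r", encoding="cp932") as f:
            f.read()
        error = None
    except UnicodeDecodeError as exc:
        error = exc

    # Naming the real encoding reads the text back unchanged.
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
finally:
    os.remove(path)

print(error)
print(content)
```

The first read should fail with a UnicodeDecodeError of the same kind as the one shown above, while the second returns the original sentence.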

sample.txt


私はpythonを使用して、プログラミングを勉強しています。

mecab_read.py


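import MeCab

FILE_NAME = "sample.txt"

# Specify the encoding explicitly; the Windows default (cp932) cannot read UTF-8
with open(FILE_NAME, "r", encoding="utf-8") as f:
    CONTENT = f.read()

# With no arguments, the tagger uses the default dictionary and output format
tagger = MeCab.Tagger()
# parse() returns the analysis result as a single string
parse = tagger.parse(CONTENT)

print(parse)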

Output result

私	名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
python	名詞,一般,*,*,*,*,*
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
使用	名詞,サ変接続,*,*,*,*,使用,シヨウ,シヨー
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
、	記号,読点,*,*,*,*,、,、,、
プログラミング	名詞,サ変接続,*,*,*,*,プログラミング,プログラミング,プログラミング
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
勉強	名詞,サ変接続,*,*,*,*,勉強,ベンキョウ,ベンキョー
し	動詞,自立,*,*,サ変・スル,連用形,する,シ,シ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
い	動詞,非自立,*,*,一段,連用形,いる,イ,イ
ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
。	記号,句点,*,*,*,*,。,。,。
EOS

Oh, I was able to output properly!
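The result of `tagger.parse` is plain text: one line per morpheme, with the surface form, a tab, and comma-separated features, ending with an `EOS` line. The following sketch turns that text into a list of dictionaries. It assumes the IPA dictionary's field order (part of speech first, base form seventh); `parse_mecab_output` and `SAMPLE` are hypothetical names, and the sample is a short hand-written string so the code runs without MeCab.

```python
def parse_mecab_output(raw):
    """Turn default-format MeCab output into a list of dicts, one per morpheme."""
    morphemes = []
    for line in raw.splitlines():
        if line == "EOS" or not line.strip():
            continue  # skip the end-of-sentence marker and blank lines
        surface, _, feature = line.partition("\t")
        fields = feature.split(",")
        # Words missing from the dictionary have "*" or no base-form field,
        # so fall back to the surface form in that case.
        has_base = len(fields) > 6 and fields[6] != "*"
        morphemes.append({
            "surface": surface,
            "pos": fields[0],
            "pos_detail": fields[1] if len(fields) > 1 else "*",
            "base": fields[6] if has_base else surface,
        })
    return morphemes


# Hand-written sample in the same layout as the default output.
SAMPLE = (
    "私\t名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ\n"
    "python\t名詞,一般,*,*,*,*,*\n"
    "し\t動詞,自立,*,*,サ変・スル,連用形,する,シ,シ\n"
    "EOS\n"
)

parsed = parse_mecab_output(SAMPLE)
for morpheme in parsed:
    print(morpheme)
```

Passing the string returned by `tagger.parse(CONTENT)` instead of `SAMPLE` would process real output the same way, as long as the dictionary uses this field order.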

About output format

Earlier I described MeCab.Tagger() as the place to specify a dictionary for morphological analysis.

Its argument also selects the output format, and there are several formats, so I'll introduce some of them.

Nothing new needs to be installed, so it may be interesting to try them by modifying the programs above.

Every example below uses 形態素解析 ("morphological analysis") as the input

MeCab.Tagger(): MeCab's standard format, used by default

形態素	名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ
解析	名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ
EOS

MeCab.Tagger("-Ochasen"): ChaSen-compatible format

形態素	ケイタイソ	形態素	名詞-一般
解析	カイセキ	解析	名詞-サ変接続
EOS
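The ChaSen-style lines can be read column by column. This is a rough sketch that assumes the columns are tab-separated in the order surface form, reading, base form, part of speech; `parse_chasen_output` and `SAMPLE_CHASEN` are illustrative names, and the sample string is written by hand rather than captured from MeCab.

```python
def parse_chasen_output(raw):
    """Split ChaSen-style lines into dicts (assumed column order, see above)."""
    rows = []
    for line in raw.splitlines():
        if line == "EOS" or not line.strip():
            continue  # skip the end-of-sentence marker and blank lines
        columns = line.split("\t")
        columns += [""] * (4 - len(columns))  # pad lines with missing columns
        surface, reading, base, pos = columns[:4]
        # The part of speech joins its levels with "-", e.g. 名詞-一般
        major, _, minor = pos.partition("-")
        rows.append({
            "surface": surface,
            "reading": reading,
            "base": base,
            "pos": major,
            "pos_detail": minor,
        })
    return rows


# Hand-written sample in the assumed layout.
SAMPLE_CHASEN = (
    "形態素\tケイタイソ\t形態素\t名詞-一般\t\t\n"
    "解析\tカイセキ\t解析\t名詞-サ変接続\t\t\n"
    "EOS\n"
)

rows = parse_chasen_output(SAMPLE_CHASEN)
for row in rows:
    print(row)
```

Compared with the default format, the reading comes right after the surface form here, which is convenient when only the word and its reading are needed.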

MeCab.Tagger("-Owakati"): splits the text into words separated by spaces, as in English

形態素 解析
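Because the words come back separated by spaces, a plain `split()` is enough to get a word list. As a minimal sketch, the snippet below counts word frequencies in a hand-written string that imitates this format; `count_words` and `SAMPLE_WAKATI` are assumptions made for the example and the string is not actual MeCab output.

```python
from collections import Counter


def count_words(wakati_text):
    """Count the space-separated words in wakati-style text."""
    return Counter(wakati_text.split())


# Hand-written text with one space between words, imitating -Owakati output.
SAMPLE_WAKATI = "形態素 解析 で 文 を 形態素 に 分ける \n"

word_counts = count_words(SAMPLE_WAKATI)
print(word_counts.most_common(3))
```

This space-separated form is handy as input for tools that expect whitespace-delimited words.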

MeCab.Tagger("-Oyomi"): outputs the reading of the analyzed text, in katakana plus any English words

ケイタイソカイセキ

MeCab.Tagger("-Odump"): outputs all information

0 BOS BOS/EOS,*,*,*,*,*,*,*,* 0 0 0 0 0 0 2 1 0.000000 0.000000 0.000000 0
7 形態素 名詞,一般,*,*,*,*,形態素,ケイタイソ,ケイタイソ 0 9 1285 1285 38 2 0 1 0.000000 0.000000 0.000000 5338
13 解析 名詞,サ変接続,*,*,*,*,解析,カイセキ,カイセキ 9 15 1283 1283 36 2 0 1 0.000000 0.000000 0.000000 9241
20 EOS BOS/EOS,*,*,*,*,*,*,*,* 15 15 0 0 0 0 3 1 0.000000 0.000000 0.000000 8505

MeCab.Tagger("-Osimple"): simplified output

形態素	名詞-一般
解析	名詞-サ変接続
EOS

There are many more, so if you are interested, please check out [Official] 1!
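To compare several formats side by side, the option strings can be looped over. The sketch below is hedged: `run_formats` is a hypothetical helper that accepts any tagger class, and `EchoTagger` is a stand-in with the same `parse()` method so the example runs without MeCab installed. It does not perform real analysis.

```python
def run_formats(text, options, tagger_factory):
    """Parse the same text once per option and collect the outputs."""
    results = {}
    for option in options:
        # An empty option string means "use the defaults".
        tagger = tagger_factory(option) if option else tagger_factory()
        results[option or "default"] = tagger.parse(text)
    return results


class EchoTagger:
    """Stand-in for a real tagger: echoes the text with the option used."""

    def __init__(self, option=""):
        self.option = option

    def parse(self, text):
        return f"[{self.option or 'default'}] {text}"


outputs = run_formats("形態素解析", ["", "-Owakati", "-Oyomi"], EchoTagger)
for name, output in outputs.items():
    print(name, output)
```

With MeCab installed, passing `MeCab.Tagger` in place of `EchoTagger` should return the real output of each format for the same input.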

In conclusion

I can now run morphological analysis with MeCab. It is said to be used in natural language processing with AI as well, so I want to master how to use it ... (˘ω˘)
