Introduction

Sometimes I want to reuse a text analysis function that already exists in another language, so I tried it out. Here I call neologdn, a Python text normalization tool, from MATLAB. I'm new to Python, so please forgive any mistakes.

Environment

MATLAB R2020a
Python 3.6

Procedure

There is an official MathWorks page, "Calling Python Library Functions", so I prepared my setup by following it. Both MATLAB and Python need to be installed, but not just any Python will do: MATLAB can only call the Python versions it supports. Installing one of those seemed easiest, so I installed it as described on the official page.
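Because only certain Python versions are supported, it can help to check which interpreter MATLAB is actually configured to use. The sketch below uses pyenv, which exists in R2019b and later (so it is available in the R2020a used here). The interpreter path in the commented-out line is a placeholder, not a real location; replace it with your own installation.

```matlab
% Show the Python environment MATLAB is configured to use.
pe = pyenv;
disp(pe.Version)      % Python version, e.g. "3.6"
disp(pe.Executable)   % full path of the Python interpreter

% To switch to another supported interpreter, call pyenv before Python has
% been loaded in the current session. The path below is a placeholder.
% pyenv('Version', 'C:\path\to\Python36\python.exe');
```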

As a quick test, enter the following in MATLAB.

`MATLAB`


py.os.listdir('.')

This displayed the list of files in the current folder, obtained with os.listdir on the Python side.
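os.listdir returns a Python list (py.list), so MATLAB shows it as a Python object rather than as MATLAB data. A minimal sketch of converting it into a MATLAB string array, assuming the R2020a conversions cell (py.list to cell array) and char (py.str to character vector):

```matlab
% py.os.listdir returns a py.list whose elements are py.str objects.
pyFiles = py.os.listdir('.');

% cell() converts the py.list to a MATLAB cell array of py.str objects;
% char() then converts each element to a MATLAB character vector.
names = cellfun(@char, cell(pyFiles), 'UniformOutput', false);

% Collect the file names in a MATLAB string array.
files = string(names);
disp(files)
```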

Next, prepare to use neologdn, a tool that normalizes Japanese text.

neologdn is a Japanese text normalizer for mecab-neologd. The normalization is based on the neologd's rules: https://github.com/neologd/mecab-ipadic-neologd/wiki/Regexp.ja

Install neologdn.

`command prompt`


py -m pip install neologdn
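If more than one Python is installed, `py -m pip` installs into the Python Launcher's default interpreter, which is not necessarily the one MATLAB uses. One way to avoid a mismatch, sketched here under the assumption that pip is available for that interpreter, is to run pip through the executable reported by pyenv and then confirm that MATLAB can import the module:

```matlab
% Run pip with the same interpreter that MATLAB is configured to use.
pe = pyenv;
cmd = sprintf('"%s" -m pip install neologdn', pe.Executable);
[status, out] = system(cmd);
disp(out)

% Clear Python's import caches so a freshly installed package is found,
% then import the module to confirm it is visible from MATLAB.
py.importlib.invalidate_caches();
neo = py.importlib.import_module('neologdn');
disp(neo)
```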

You are now ready.

Let's run the examples from the neologdn README in MATLAB.

`MATLAB`


>> py.neologdn.normalize("ﾊﾝｶｸｶﾅ")  % half-width katakana ("hankaku kana")

ans = 

Python str with no properties.

ハンカクカナ

>> py.neologdn.normalize("全角記号！？＠＃")  % full-width symbols

ans = 

Python str with no properties.

全角記号!?@#

>> py.neologdn.normalize("全角記号例外「・」")  % full-width symbol exceptions (kept as-is)

ans = 

Python str with no properties.

全角記号例外「・」

>> py.neologdn.normalize("長音短縮ウェーーーーイ")  % long-vowel marks shortened

ans = 

Python str with no properties.

長音短縮ウェーイ

>> py.neologdn.normalize("チルダ削除ウェ~∼∾〜〰～イ")  % tilde-like characters removed

ans = 

Python str with no properties.

チルダ削除ウェイ

>> py.neologdn.normalize("いろんなハイフン˗֊‐‑‒–⁃⁻₋−")  % various hyphens unified to "-"

ans = 

Python str with no properties.

いろんなハイフン-

>> py.neologdn.normalize("　　　ＰＲＭＬ　　副　読　本　　　")  % "PRML supplementary reader"

ans = 

Python str with no properties.

PRML副読本

>> py.neologdn.normalize(" Natural Language Processing ")  % surrounding spaces removed

ans = 

Python str with no properties.

Natural Language Processing

>> py.neologdn.normalize("かわいいいいいいいいい", pyargs('repeat',6))  % "cute" with a long run of い

ans = 

Python str with no properties.

かわいいいいいい

>> py.neologdn.normalize("無駄無駄無駄無駄ァ", pyargs('repeat',1))  % "waste" repeated

ans = 

Python str with no properties.

無駄ァ

The results match the README. Note that the result is returned as a Python str object (py.str), not as a MATLAB string.
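Because the return value is a py.str, it usually has to be converted before it can be used with MATLAB text functions. A minimal sketch using string and char, both of which accept Python str objects in R2020a:

```matlab
% normalize returns a Python str object (class py.str).
pyOut = py.neologdn.normalize("かわいいいいいいいいい", pyargs('repeat', 6));
disp(class(pyOut))     % py.str

% Convert explicitly before using MATLAB text functions.
txt = string(pyOut);   % MATLAB string scalar
chr = char(pyOut);     % MATLAB character vector
disp(strlength(txt))
```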

Being able to normalize text like this before splitting it into tokens with Text Analytics Toolbox would be convenient.
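One way to combine the two steps, sketched under the assumption that Text Analytics Toolbox is installed: normalize each element of a string array with neologdn, convert each result back to a MATLAB string, and pass the array to tokenizedDocument. The input strings are examples from the neologdn README.

```matlab
% Example inputs taken from the neologdn README.
textData = ["ﾊﾝｶｸｶﾅ"; "全角記号！？＠＃"; "長音短縮ウェーーーーイ"];

% Normalize each element; each call returns a py.str, so convert it back.
normalized = strings(size(textData));
for k = 1:numel(textData)
    normalized(k) = string(py.neologdn.normalize(textData(k)));
end

% Tokenize the normalized text with Text Analytics Toolbox.
documents = tokenizedDocument(normalized);
disp(documents)
```

A plain loop keeps each Python call and conversion explicit; the same thing could be written with arrayfun if preferred.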

Call Python library for text normalization from MATLAB
