SAMPLE

I noun,pronoun,general,*,*,*,I,Watashi,Watashi
of particle,attributive,*,*,*,*,of,No,No
sister noun,general,*,*,*,*,sister,Ane,Ane
wa particle,binding particle,*,*,*,*,wa,Ha,Wa
Ryunosuke Akutagawa noun,proper noun,Writer,*,*,*,Ryunosuke Akutagawa,Ryunosuke Akutagawa,Akutagawa Ryunosuke
of particle,attributive,*,*,*,*,of,No,No
book noun,general,*,*,*,*,book,Hon,Hon
wo particle,case particle,general,*,*,*,wo,Wo,Wo
often adverb,general,*,*,*,*,often,Yoku,Yoku
read verb,independent,*,*,godan (ma row),continuative ta-connection,read,Yon,Yon
de particle,conjunctive particle,*,*,*,*,de,De,De
iru verb,non-independent,*,*,ichidan,base form,iru,Iru,Iru
。 symbol,period,*,*,*,*,。,。,。
 BOS/EOS,*,*,*,*,*,*,*,*

REFERENCE How to add vocabulary to MeCab dictionary [Windows 10, Ubuntu 18.04]

Add a new word to a user-defined dictionary

Prepare a dictionary

Prepare the dictionary as a UTF-8 encoded csv file.
Directory: C:\Users\username\Desktop\MeCabUserDic
File name: test_dic.csv

Ryunosuke Akutagawa,,,5543,noun,proper noun,Writer,*,*,*,Ryunosuke Akutagawa,Ryunosuke Akutagawa,Akutagawa Ryunosuke
Osamu Dazai,,,5543,noun,proper noun,Writer,*,*,*,Osamu Dazai,Osamu Dazai,Dazai Osamu

Each row has the following columns, in this order: surface form, left context ID, right context ID, cost, part of speech, part-of-speech subcategory 1, part-of-speech subcategory 2, part-of-speech subcategory 3, conjugation type, conjugation form, base form, reading, pronunciation.
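The column order is easy to get wrong by hand, so it can help to generate the rows from code. The following is only a minimal sketch: `make_entry` and `to_csv_text` are hypothetical helper names, not part of MeCab, and the context IDs are left empty on the assumption that MeCab assigns them.

```python
import csv
import io

def make_entry(surface, cost, pos, pos_sub1="*", pos_sub2="*", pos_sub3="*",
               conj_type="*", conj_form="*", base_form=None,
               reading="*", pronunciation="*", left_id="", right_id=""):
    """Return one user-dictionary row as a list of 13 strings (hypothetical helper)."""
    return [
        surface, str(left_id), str(right_id), str(cost),
        pos, pos_sub1, pos_sub2, pos_sub3,
        conj_type, conj_form,
        base_form or surface,  # default the base form to the surface form
        reading, pronunciation,
    ]

def to_csv_text(rows):
    """Serialize rows to CSV text; save it to disk with encoding='utf-8'."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

row = make_entry("Osamu Dazai", 5543, "noun", "proper noun", "Writer",
                 reading="Osamu Dazai", pronunciation="Dazai Osamu")
print(to_csv_text([row]), end="")
```

Using the `csv` module rather than joining strings with commas means a surface form that itself contains a comma or quote is escaped correctly.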

The left context ID and right context ID are the word's internal IDs as seen from the left and from the right, respectively. They can reportedly be left empty because they are assigned automatically, but I got an error (and garbled characters), so I assigned an appropriate value.

Set the cost to the same value as existing words that appear with similar frequency. The lower the cost, the more readily the word is selected during analysis.

Compile user dictionary

Run MeCab\dic\ipadic\mecab-dict-index. When I ran it from a normal command prompt, it failed with "permission denied", so start a command prompt with administrator privileges using the following command.

powershell start-process cmd -verb runas

Use the following command to create a new dic file from the prepared csv file.

mecab-dict-index -f utf-8 -t utf-8 -d "<MeCab dictionary directory path>" -u <path of the dic file to create> <path of the user dictionary csv file>

A concrete example of the command:

mecab-dict-index -f utf-8 -t utf-8 -d "C:\Program Files\MeCab\dic\ipadic" -u C:\Users\yuri.kinoshita\Desktop\MeCabUserDic\test.dic C:\Users\yuri.kinoshita\Desktop\MeCabUserDic\test_dic.csv

This is the execution result. The dictionary has been built once "done!" is printed.

reading C:\Users\yuri.kinoshita\Desktop\MeCabUserDic\test_dic.csv ... 2
emitting double-array: 100% |###########################################|

done!
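If you rebuild the dictionary often, the same command can be assembled from a script. This is a rough sketch rather than an official recipe: `build_index_command` is a hypothetical helper, the paths are placeholders, and actually running the command assumes `mecab-dict-index` is on PATH and that the process has sufficient privileges.

```python
import subprocess

def build_index_command(sys_dic_dir, user_dic_path, csv_path, charset="utf-8"):
    """Build the argument list for mecab-dict-index (hypothetical helper)."""
    return [
        "mecab-dict-index",
        "-f", charset,         # encoding of the input csv
        "-t", charset,         # encoding of the compiled dictionary
        "-d", sys_dic_dir,     # system dictionary directory
        "-u", user_dic_path,   # dic file to create
        csv_path,              # user dictionary csv
    ]

cmd = build_index_command(
    r"C:\Program Files\MeCab\dic\ipadic",
    r"C:\Users\username\Desktop\MeCabUserDic\test.dic",
    r"C:\Users\username\Desktop\MeCabUserDic\test_dic.csv",
)

# Show the command line that would be executed.
print(subprocess.list2cmdline(cmd))

# Uncomment to actually compile the dictionary:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces, such as "C:\Program Files".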

HOW TO USE

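import MeCab

# Load the compiled user dictionary in addition to the system dictionary.
mecab = MeCab.Tagger(r"-Ochasen -u C:\Users\yuri.kinoshita\Desktop\MeCabUserDic\test.dic")

text = "My sister often reads Ryunosuke Akutagawa's book."
node = mecab.parseToNode(text).next  # skip the leading BOS node
while node:
    # Print each token's surface form and feature string (the last node is EOS).
    print(node.surface, node.feature)
    node = node.next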

Execution example.

I noun,pronoun,general,*,*,*,I,Watashi,Watashi
of particle,attributive,*,*,*,*,of,No,No
sister noun,general,*,*,*,*,sister,Ane,Ane
wa particle,binding particle,*,*,*,*,wa,Ha,Wa
Ryunosuke Akutagawa noun,proper noun,Writer,*,*,*,Ryunosuke Akutagawa,Ryunosuke Akutagawa,Akutagawa Ryunosuke
of particle,attributive,*,*,*,*,of,No,No
book noun,general,*,*,*,*,book,Hon,Hon
wo particle,case particle,general,*,*,*,wo,Wo,Wo
often adverb,general,*,*,*,*,often,Yoku,Yoku
read verb,independent,*,*,godan (ma row),continuative ta-connection,read,Yon,Yon
de particle,conjunctive particle,*,*,*,*,de,De,De
iru verb,non-independent,*,*,ichidan,base form,iru,Iru,Iru
。 symbol,period,*,*,*,*,。,。,。
 BOS/EOS,*,*,*,*,*,*,*,*
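Each feature string in the output is a comma-separated list in the same column order as the dictionary csv (minus the surface form, context IDs and cost), so it can be turned into a dictionary for easier access. This is a small sketch under that assumption; `FEATURE_KEYS` and `parse_feature` are hypothetical names, not part of the MeCab API.

```python
# Hypothetical labels for the feature columns, in output order.
FEATURE_KEYS = [
    "pos", "pos_sub1", "pos_sub2", "pos_sub3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation",
]

def parse_feature(feature):
    """Split a MeCab feature string into a dict keyed by FEATURE_KEYS."""
    # zip stops at the shorter sequence, so a feature string with fewer
    # columns simply leaves out the trailing keys.
    return dict(zip(FEATURE_KEYS, feature.split(",")))

info = parse_feature(
    "noun,proper noun,Writer,*,*,*,"
    "Ryunosuke Akutagawa,Ryunosuke Akutagawa,Akutagawa Ryunosuke"
)
print(info["pos"], info["pos_sub1"], info["base_form"])
```

Inside the node loop you would call `parse_feature(node.feature)` and, for example, check `info["pos_sub2"] == "Writer"` to confirm that the word was taken from the user dictionary.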

MeCab: Add new words to user-defined dictionary (Windows)
