I ran into trouble with dictionary specification when using MeCab from Python, so I am leaving this as a memo. The background is that I needed to switch user dictionaries within a single process.
The official MeCab page is below.
https://taku910.github.io/mecab/
Most of what you need is written there, but not in much detail, so some things have to be investigated separately. My impression is that MeCab was not designed with Windows in mind, and searching turned up few documents that cover Windows.
Separate paths with / instead of \
**Do not include spaces in the path to the dictionary**
This memo covers specifying a dictionary from Python code.
Incidentally, when running from the command line, a path that contains spaces is not a problem.
However, because a space is treated as an argument delimiter, enclose the entire path in " ".
#System dictionary specification
mecab -d "C:\Program Files (x86)\MeCab\dic\ipadic"
#User dictionary specification
mecab -u "C:\Program Files (x86)\MeCab\dic\ipadic\user.dic"
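When mecab is launched from a Python script instead of being typed by hand, one way to sidestep the quoting question is to pass the arguments as a list. The following is a minimal sketch, not a definitive recipe: build_mecab_command is a hypothetical helper name, the path is the one shown above, and subprocess.list2cmdline is a standard-library helper outside the module's documented public API, used here only to display the Windows-style command line.

```python
import subprocess


def build_mecab_command(sysdic=None, userdic=None):
    """Return an argument list for the mecab CLI (hypothetical helper)."""
    cmd = ["mecab"]
    if sysdic is not None:
        cmd += ["-d", sysdic]   # system dictionary directory
    if userdic is not None:
        cmd += ["-u", userdic]  # user dictionary file
    return cmd


cmd = build_mecab_command(sysdic=r"C:\Program Files (x86)\MeCab\dic\ipadic")

# Each list element is one argument, so the space needs no extra quoting here.
# list2cmdline shows the equivalent command line under Windows quoting rules.
print(subprocess.list2cmdline(cmd))

# subprocess.run(cmd) would then launch it, assuming mecab is on PATH.
```

The printed line has the same shape as the commands above, with the path wrapped in " ".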
To specify a dictionary, pass it as an argument when creating the Tagger.
import MeCab
tagger = MeCab.Tagger("-d [Path to system dictionary]")
tagger = MeCab.Tagger("-u [Path to user dictionary]")
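Since -d and -u are ordinary MeCab options, both can go into a single argument string, which is handy when several user dictionaries are needed in one process. A minimal sketch follows. tagger_args and make_tagger are hypothetical helper names, the C:/mecab/... paths are placeholders, and the paths are assumed to contain no spaces.

```python
def tagger_args(sysdic=None, userdic=None):
    """Build the option string passed to MeCab.Tagger (hypothetical helper)."""
    parts = []
    if sysdic:
        parts.append(f"-d {sysdic}")   # system dictionary directory
    if userdic:
        parts.append(f"-u {userdic}")  # user dictionary file
    return " ".join(parts)


def make_tagger(sysdic=None, userdic=None):
    """Create a Tagger; MeCab is imported here so tagger_args works without it."""
    import MeCab
    return MeCab.Tagger(tagger_args(sysdic, userdic))


# Placeholder paths: one option string per user dictionary.
args_a = tagger_args("C:/mecab/ipadic", "C:/mecab/user_a.dic")
args_b = tagger_args("C:/mecab/ipadic", "C:/mecab/user_b.dic")
print(args_a)
print(args_b)
```

Keeping one Tagger per user dictionary, for example in a dict keyed by dictionary name, is one possible way to switch dictionaries within a single process; make_tagger would be called once per entry.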
On Windows, the dictionary is probably in C:\Program Files (x86)\MeCab\dic\ipadic (on some machines it may not be under the (x86) folder).
Write the path in place of [Path to system dictionary] or [Path to user dictionary] above, but there are two points to note.
Both come from the Python specification, not the MeCab specification.
Separate paths with / instead of \
If you use \ inside the double quotation marks " " of a string, it is regarded as an escape character.
import MeCab
tagger = MeCab.Tagger("-d C:\Program Files (x86)\MeCab\dic\ipadic")
tagger = MeCab.Tagger("-u C:\Program Files (x86)\MeCab\dic\ipadic\user.dic")
Instead of the above, write:
tagger = MeCab.Tagger("-d C:/Program Files (x86)/MeCab/dic/ipadic")
tagger = MeCab.Tagger("-u C:/Program Files (x86)/MeCab/dic/ipadic/user.dic")
If you use an editor such as VS Code, the version with \ is flagged as an error, so you will notice it immediately.
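The effect of escapes can be checked without MeCab at all. The sketch below uses a made-up path, C:\temp\new, chosen only because \t and \n are valid escapes. It also compiles a string literal containing \user.dic to show the kind of error an editor flags.

```python
# A made-up path for illustration: \t becomes a tab and \n a newline.
broken = "C:\temp\new"
raw = r"C:\temp\new"      # raw string: backslashes stay as written
slashes = "C:/temp/new"   # forward slashes need no escaping

print(repr(broken))
print(len(broken), len(raw))

# "\u" starts a Unicode escape, so a literal containing \user.dic
# does not even compile. compile() is used to observe that safely.
source = r'"C:\user.dic"'
try:
    compile(source, "<example>", "eval")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)
```

The non-raw string is two characters shorter than the raw one because each escape collapses into a single character, and the \user.dic literal is rejected before any MeCab code runs.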
r"-d C:\Program Files (x86)\MeCab\dic\ipadic"
If you add r before the string as above, it works without any problem even with \ left in the path.
Thanks to @palm23 for pointing this out.
By default, the dictionary is probably in C:\Program Files (x86)\MeCab\dic\ipadic, but because Program Files (x86) contains spaces, specifying that path causes an error.
To specify a dictionary, copy it to another location whose path contains no spaces and specify that copy. For example, create a folder called mecab directly under C:, place the dictionary there, and specify it as follows.
import MeCab
tagger = MeCab.Tagger("-d C:/mecab/ipadic")
tagger = MeCab.Tagger("-u C:/mecab/ipadic/user.dic")
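The copy itself can also be scripted. The following is only a sketch under stated assumptions: copy_dictionary is a hypothetical helper, the demonstration runs inside a temporary directory with stand-in folders and a dummy file instead of a real dictionary, and the temporary directory's own path is assumed to contain no spaces.

```python
import shutil
import tempfile
from pathlib import Path


def copy_dictionary(src, dst):
    """Copy a dictionary folder to dst, refusing destinations containing spaces."""
    dst = Path(dst)
    if " " in str(dst):
        raise ValueError(f"destination still contains a space: {dst}")
    shutil.copytree(src, dst)  # dst itself must not exist yet
    return dst


# Demonstration with stand-in folders (assumes the temp path has no spaces).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Program Files (x86)" / "MeCab" / "dic" / "ipadic"
    src.mkdir(parents=True)
    (src / "sys.dic").write_text("dummy")  # stand-in, not a real dictionary

    dst = copy_dictionary(src, Path(tmp) / "mecab" / "ipadic")
    copied = sorted(p.name for p in dst.iterdir())
    print(copied)
```

On a real machine the call would look like copy_dictionary(r"C:\Program Files (x86)\MeCab\dic\ipadic", "C:/mecab/ipadic"), after which the space-free path can be passed to the Tagger.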
You might try enclosing the path in quotation marks:
tagger = MeCab.Tagger("-d 'C:/Program Files (x86)/MeCab/dic/ipadic'")
Unfortunately, this still results in an error.
**Addition**
With mecab-python3, a path that contains spaces works normally as long as it is enclosed in quotation marks.
pip install mecab-python3
Thanks to @palm23 for pointing this out.
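As an illustration of why quotation marks can keep a path with spaces together, Python's standard shlex module applies shell-style splitting rules. Whether mecab-python3 parses its argument string with exactly these rules is an assumption here, so treat this as a sketch of the idea, not a description of the library's internals.

```python
import shlex

dicdir = "C:/Program Files (x86)/MeCab/dic/ipadic"

# shlex.quote wraps a string in single quotes when it contains spaces etc.
option = "-d " + shlex.quote(dicdir)
print(option)

# Shell-style splitting: quoted, the path stays one token; unquoted, it breaks up.
print(shlex.split(option))
print(shlex.split("-d " + dicdir))
```

With mecab-python3 installed, MeCab.Tagger(option) would be the corresponding call.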
This is not limited to MeCab: when specifying a path in Python, separate it with / (or use a raw string r" ") instead of \.
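If a path arrives with backslashes, for example copied from Explorer, the standard pathlib module can convert it to the / form. This is a small sketch using the default install path mentioned earlier; PureWindowsPath only manipulates the text of the path and never touches the disk, so it runs on any OS.

```python
from pathlib import PureWindowsPath

# PureWindowsPath parses Windows-style paths without accessing the file system.
dicdir = PureWindowsPath(r"C:\Program Files (x86)\MeCab\dic\ipadic")
userdic = dicdir / "user.dic"

# as_posix() returns the same path written with forward slashes.
print(dicdir.as_posix())
print(userdic.as_posix())
```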