2016/2/19 Code corrections / updated the list of required files
2016/2/19 Added troubleshooting
Why Windows in the first place? You may well ask, but there is one reason: Windows has a powerful IDE, Visual Studio.
Code completion, snippets, debugging... The IDE has many merits, but setting up a development environment on Windows is hard, and Python libraries in particular often fail to build. There are various reasons, such as Unix commands not being available and file paths being written differently.
This time, we will clear those hurdles and install the Python library for MeCab, the go-to tool for people who do natural language processing.
__What is MeCab?__ [MeCab] [* 0] is a morphological analysis tool. A morpheme is roughly the smallest unit of text that carries meaning. Japanese is written without spaces between words, so a sentence such as "word segmentation with python" has to be split into its individual words before it can be processed, and that is what MeCab does. It is also multifunctional: it can identify the part of speech of each word and restore inflected words to their base form. It's amazing. Other morphological analysis tools include [JUMAN] [* 1] from Kyoto University and [KAKASI] [* 2].
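To make "part of speech" and "base form" concrete, here is a minimal sketch that parses text in MeCab's default output format (one morpheme per line, surface form, a tab, then comma-separated features). The sample text is hand-written for illustration and assumes the IPA dictionary layout, where the seventh feature is the base form; `parse_mecab_output` is a hypothetical helper, not part of MeCab.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# Hand-written sample in MeCab's default output format (IPA dictionary layout assumed):
# "surface<TAB>feature,feature,..." per morpheme, terminated by an EOS line.
SAMPLE_OUTPUT = (
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)


def parse_mecab_output(output):
    """Return a list of (surface, part_of_speech, base_form) tuples."""
    morphemes = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        pos = fields[0]
        # Index 6 is the base form in the assumed layout; unknown words may
        # carry fewer fields, so fall back to the surface form.
        has_base = len(fields) > 6 and fields[6] != "*"
        base = fields[6] if has_base else surface
        morphemes.append((surface, pos, base))
    return morphemes


if __name__ == "__main__":
    for surface, pos, base in parse_mecab_output(SAMPLE_OUTPUT):
        print(surface, pos, base)
```

The same function could be fed the string returned by a tagger created without the `-Owakati` option, as long as the dictionary uses this feature layout.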
This walkthrough basically follows two articles: [On Windows 64bit, mecab-python -Python] [* 3] and [Build MeCab for 64-bit Windows (using Visual Studio 2010) -iPentec] [* 4]. Many thanks to their authors......
Put like that, it sounds easy!
Install MeCab itself. Just run the installer and set the environment variables.
Add `C:\Mecab\bin` to the environment variable PATH, and set `C:\MeCab\etc\mecabrc` as the environment variable MECABRC.
These values depend on where you installed MeCab, so please check your own paths.
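A quick way to confirm the environment variables is a small check like the sketch below. It assumes the two settings are a `bin` entry in PATH and a `MECABRC` variable pointing at the `mecabrc` file; `check_mecab_env` is a hypothetical helper, and the paths are only the example paths from this article.

```python
import ntpath


def _norm(path):
    # Windows paths ignore case and trailing separators; ntpath behaves the
    # same on every OS, so this check can also be tried outside Windows.
    return ntpath.normcase(ntpath.normpath(path.strip()))


def check_mecab_env(env, bin_dir, rc_path):
    """Return a list of problems found; an empty list means the setup looks fine."""
    problems = []
    # Windows separates PATH entries with ';'.
    entries = [_norm(p) for p in env.get("PATH", "").split(";") if p.strip()]
    if _norm(bin_dir) not in entries:
        problems.append("PATH does not contain " + bin_dir)
    rc = env.get("MECABRC", "")
    if not rc:
        problems.append("MECABRC is not set")
    elif _norm(rc) != _norm(rc_path):
        problems.append("MECABRC points to " + rc)
    return problems
```

On the actual machine it would be called as `check_mecab_env(os.environ, r"C:\MeCab\bin", r"C:\MeCab\etc\mecabrc")` after `import os`, with the paths replaced by your own installation directory.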
Modify and build the source files. The sources target 32-bit, so fix them for 64-bit. I also fixed a few things that look like small mistakes.
(In each listing, delete the line after `<!` and add the line after `!>`.)
Makefile.msvc.in, line 6 (* Updated: amd86 -> X64)
```
<! LDFLAGS = /nologo /OPT:REF /OPT:ICF /LTCG /NXCOMPAT /DYNAMICBASE /MACHINE:X86 ADVAPI32.LIB
!> LDFLAGS = /nologo /OPT:REF /OPT:ICF /LTCG /NXCOMPAT /DYNAMICBASE /MACHINE:X64 ADVAPI32.LIB
```
Makefile.msvc.in, line 8
```
<! -DDLL_EXPORT -DHAVE_GETENV -DHAVE_WINDOWS_H -DDIC_VERSION=@DIC_VERSION@ \
!> -DDLL_EXPORT -DHAVE_GETENV -DHAVE_WINDOWS_H -DDIC_VERSION=102 \
```
Makefile.msvc.in, line 9
```
<! -DVERSION="\"@VERSION@\"" -DPACKAGE="\"mecab\"" \
!> -DVERSION="\"0.996\"" -DPACKAGE="\"mecab\"" \
```
Makefile.msvc.in, line 11 (use your own installation path)
```
<! -DMECAB_DEFAULT_RC="\"c:\\Program Files\\mecab\\etc\\mecabrc\""
!> -DMECAB_DEFAULT_RC="\"d:\\Programs\\mecab\\etc\\mecabrc\""
```
feature_index.cpp, line 356
```
<! case 't': os_ << (size_t)path->rnode->char_type; break;
!> case 't': os_ << (unsigned int)path->rnode->char_type; break;
```
writer.cpp, line 260
```
<! case 'L': *os << lattice->size(); break;
!> case 'L': *os << (unsigned int)lattice->size(); break;
```
mecab.h, line 1125
```
<! #ifndef SIWG
!> #ifndef SWIG
```
common.h, added to the include section
!>#include
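Because line numbers can shift between source versions, the substitutions above could also be applied by matching the old text rather than the line number. The following is a minimal sketch under that assumption; `apply_fix` and the `FIXES` list are hypothetical names, and only two of the substitutions are listed as examples.

```python
def apply_fix(text, old, new):
    """Replace exactly one occurrence of `old` with `new`, or raise ValueError."""
    count = text.count(old)
    if count != 1:
        # Refuse to guess when the text is missing or ambiguous.
        raise ValueError("expected 1 occurrence of %r, found %d" % (old, count))
    return text.replace(old, new)


# Two of the substitutions from this article as (file name, old text, new text).
FIXES = [
    ("Makefile.msvc.in", "/MACHINE:X86", "/MACHINE:X64"),
    ("mecab.h", "#ifndef SIWG", "#ifndef SWIG"),
]


def patch_file(path, old, new):
    """Apply one substitution to the file at `path` in place."""
    with open(path, "r") as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(apply_fix(text, old, new))
```

Raising an error when the old text is not found exactly once makes it obvious when a file has already been patched or differs from the version used here.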
Build. Run the following in the `mecab-0.996\src` folder. It seems the command prompt should be started with administrator privileges.
```
call "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat" amd64
nmake -f Makefile.msvc.in
```
Copy `libmecab.dll`, `mecab-cost-train.exe`, `mecab-dict-gen.exe`, `mecab-dict-index.exe`, `mecab-system-eval.exe` and `mecab-test-gen.exe` into `MeCab\bin` of the MeCab installation, overwriting the existing files.
Copy `mecab.h` and `libmecab.lib` into `MeCab\sdk` of the MeCab installation. In addition, copy the same files into the mecab-python-0.996 folder. (* Updated the necessary steps)
This is the end of the MeCab build! It is quite troublesome because different people describe different procedures.
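The copy step can be scripted so that no file is forgotten. This is a sketch, assuming that "the same files" for the mecab-python folder means `mecab.h` and `libmecab.lib`, and that the `bin` and `sdk` folders already exist; `copy_build_outputs` is a hypothetical helper.

```python
import os
import shutil

# Files produced by the build, grouped by destination.
BIN_FILES = ["libmecab.dll", "mecab-cost-train.exe", "mecab-dict-gen.exe",
             "mecab-dict-index.exe", "mecab-system-eval.exe", "mecab-test-gen.exe"]
SDK_FILES = ["mecab.h", "libmecab.lib"]


def copy_build_outputs(src_dir, mecab_dir, python_binding_dir):
    """Copy the built files to their destinations and return the target paths."""
    jobs = [(name, os.path.join(mecab_dir, "bin")) for name in BIN_FILES]
    jobs += [(name, os.path.join(mecab_dir, "sdk")) for name in SDK_FILES]
    # Assumption: the Python binding folder needs the header and import library.
    jobs += [(name, python_binding_dir) for name in SDK_FILES]
    targets = []
    for name, dest_dir in jobs:
        target = os.path.join(dest_dir, name)
        # copyfile overwrites an existing target file.
        shutil.copyfile(os.path.join(src_dir, name), target)
        targets.append(target)
    return targets
```

A call would look like `copy_build_outputs(r"mecab-0.996\src", r"C:\MeCab", r"mecab-python-0.996")`, with all three paths adjusted to your own layout.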
__Supplement__ ~~Step 5 is sufficient but may not be the minimum required. I will re-verify it at a later date.~~ (Verified! See above.)
Finally, Python! The bundled setup script only works on Unix, so let's rewrite it for Windows.
Extract mecab-python-0.996.tar.gz.
Rewrite setup.py as follows. Change the installation path to match your own.
```
#!/usr/bin/env python
from distutils.core import setup, Extension

setup(name = "mecab-python",
      version = "0.996",
      py_modules = ["MeCab"],
      ext_modules = [
          Extension("_MeCab",
                    ["MeCab_wrap.cxx",],
                    # Folder that holds mecab.h and libmecab.lib
                    include_dirs = [r"C:\MeCab\sdk"],
                    library_dirs = [r"C:\MeCab\sdk"],
                    libraries = ["libmecab"])
      ])
```
Build. This creates a build folder.
```
python setup.py build
running build
running build_py
running build_ext
building '_MeCab' extension
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -ID:\Programs\MeCab\sdk -IC:\Develop\python27\include -IC:\Develop\python27\PC /TpMeCab_wrap.cxx /Fobuild\temp.win-amd64-2.7\Release\MeCab_wrap.obj
MeCab_wrap.cxx
MeCab_wrap.cxx(3747) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\BIN\amd64\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:D:\Programs\MeCab\sdk /LIBPATH:C:\Develop\python27\libs /LIBPATH:C:\Develop\python27\PCbuild\amd64 libmecab.lib /EXPORT:init_MeCab build\temp.win-amd64-2.7\Release\MeCab_wrap.obj /OUT:build\lib.win-amd64-2.7\_MeCab.pyd /IMPLIB:build\temp.win-amd64-2.7\Release\_MeCab.lib /MANIFESTFILE:build\temp.win-amd64-2.7\Release\_MeCab.pyd.manifest
MeCab_wrap.obj : warning LNK4197: export 'init_MeCab' specified multiple times; using first specification
   Creating library build\temp.win-amd64-2.7\Release\_MeCab.lib and object build\temp.win-amd64-2.7\Release\_MeCab.exp
```
Installation. Various files are copied to `Lib\site-packages` under the Python installation.
```
python setup.py install
running install
running build
running build_py
running build_ext
running install_lib
copying build\lib.win-amd64-2.7\MeCab.py -> C:\Develop\python27\Lib\site-packages
copying build\lib.win-amd64-2.7\_MeCab.pyd -> C:\Develop\python27\Lib\site-packages
byte-compiling C:\Develop\python27\Lib\site-packages\MeCab.py to MeCab.pyc
running install_egg_info
Writing C:\Develop\python27\Lib\site-packages\mecab_python-0.996-py2.7.egg-info
```
This completes the Python setup! As in the earlier steps, because a Unix terminal and the command prompt behave differently, the version and the installation path have to be written into the script directly.
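One way to keep a single setup script for both worlds is to ask the `mecab-config` helper when it is available and fall back to hard-coded values when it is not. The sketch below assumes `mecab-config --version` and `mecab-config --inc-dir` are available on Unix installations; `query_or_default` is a hypothetical helper, and the fallback values are the ones used in this article.

```python
import subprocess


def query_or_default(command, default):
    """Run `command` and return its first output line, or `default` if it cannot run."""
    try:
        output = subprocess.check_output(command)
    except (OSError, subprocess.CalledProcessError):
        # The program does not exist (typical on Windows) or exited with an error.
        return default
    lines = output.decode("utf-8", "replace").splitlines()
    return lines[0].strip() if lines else default


# Values that could then be passed to setup() and Extension().
version = query_or_default(["mecab-config", "--version"], "0.996")
include_dirs = [query_or_default(["mecab-config", "--inc-dir"], r"C:\MeCab\sdk")]
```

On Windows both calls fall through to the defaults, which is equivalent to writing the values in by hand as done in this article.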
__1. `error: Unable to find vcvarsall.bat` on `python setup.py build`__
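It seems Python cannot find the Visual Studio batch file. Python 2.7's distutils looks for the environment variable `VS90COMNTOOLS` (Visual Studio 2008), which newer versions of Visual Studio do not set. Refer to [stack overflow] [* 5].
`SET VS90COMNTOOLS=%VS140COMNTOOLS%`
Doing something like this should make it work. `VS140` corresponds to Visual Studio 2015; Visual Studio 2013 is `VS120`. The year and the version number do not match, so be careful.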
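If you are not sure which Visual Studio version number is installed, the environment can be searched for it. This is a minimal sketch, assuming each installed Visual Studio defines a variable named `VS<number>COMNTOOLS`; `newest_vs_tools` is a hypothetical helper.

```python
import re


def newest_vs_tools(env):
    """Return (variable_name, path) for the highest-numbered VS*COMNTOOLS, or None."""
    found = []
    for name, value in env.items():
        match = re.match(r"^VS(\d+)COMNTOOLS$", name.upper())
        if match and value:
            # Keep the number first so that max() compares by version.
            found.append((int(match.group(1)), name, value))
    if not found:
        return None
    _, name, value = max(found)
    return name, value
```

Calling `newest_vs_tools(os.environ)` after `import os` returns the variable to use on the right-hand side of the `SET` command. Setting `os.environ["VS90COMNTOOLS"]` from Python only affects that Python process and its children, not the command prompt it was started from.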
__2. I don't have `vcvarsall.bat` in the first place...__
Let's modify the Visual Studio installation.
Launch the Visual Studio installer. If Visual Studio is already installed, there should be an item called `Change`.
Then go to `Programming Language -> Visual C++`, check it, and press Update.
That should do it.
At last, you can use MeCab! Let's try it right away.
test.txt
Hello. I'm Big Hero 6. Protect your health.
First, prepare the test file above. Then, in Python,
> python
Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import MeCab
>>> import sys
>>> m = MeCab.Tagger("-Owakati")
>>> f = open('test.txt','r')
>>> text = f.read().decode('utf-8')
>>> f.close()
>>> f = open('test.txt','w')
>>> f.write(m.parse(text.encode('utf-8')))
>>> f.close()
Like this!
After that, the test file from earlier becomes
Hello . I'm Big Hero 6. Protect your health.
It's properly segmented! However, because of MeCab's dictionary, Big Hero 6 gets split up......
__(• ー •) < Do you really need a care robot that can do morphological analysis correctly?__
You should be able to do it! So next time, let's add a dictionary.
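The interactive session earlier in this section can also be kept as a small script. The sketch below takes the parsing function as an argument so the file handling can be tried without MeCab installed; `segment_file` is a hypothetical helper, and it assumes `parse` accepts and returns text rather than bytes.

```python
# -*- coding: utf-8 -*-
import io


def segment_file(path, parse):
    """Read `path` as UTF-8, run `parse` on the text, and write the result back."""
    with io.open(path, "r", encoding="utf-8") as f:
        text = f.read()
    result = parse(text)
    # The with-blocks close the file even if parsing or writing fails.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(result)
    return result
```

With the Python 2.7 setup from this article, where the tagger works on UTF-8 byte strings, the call could look like `segment_file("test.txt", lambda s: m.parse(s.encode("utf-8")).decode("utf-8"))` with `m = MeCab.Tagger("-Owakati")`.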
[MeCab: Yet Another Part-of-Speech and Morphological Analyzer] [* 0]
[KAKASI --Kanji → Kana (Romaji) conversion program] [* 1]
[Japanese morphological analysis system JUMAN] [* 2]
[On Windows 64bit, mecab-python -Python] [* 3]
[Build MeCab for 64-bit Windows (using Visual Studio 2010) -iPentec] [* 4]
What got stuck with the introduction of MeCab Python -Beginning of data mining and machine learning
[stack overflow -error: Unable to find vcvarsall.bat] [* 5]