Installing CaboCha 0.68 and performing dependency analysis in Python
Install MeCab first: https://code.google.com/p/mecab/downloads/list
This article assumes that MeCab was installed with mecab-0.996.exe and that UTF-8 was selected as the character encoding.
Download cabocha-0.68.exe from http://code.google.com/p/cabocha/downloads/list
Run the downloaded EXE. When it asks for the character encoding, choose the same one as MeCab (UTF-8 here).
Add "C:\Program Files (x86)\CaboCha\bin" to the PATH environment variable so that cabocha can be run from the command prompt. This is also required for Python to find the CaboCha DLL.
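If you only want a single script to find cabocha and its DLL, without touching the system-wide setting, one option is to prepend the bin folder to PATH for the current process before importing CaboCha. A minimal Python 3 sketch; `prepend_to_path` is a hypothetical helper, and the folder is the default install location assumed in this article.

```python
import os

# Default install location assumed in this article; adjust if needed.
CABOCHA_BIN = r"C:\Program Files (x86)\CaboCha\bin"


def prepend_to_path(bin_dir, path_value, sep=";"):
    """Return path_value with bin_dir in front, unless it is already listed.

    Windows paths are case-insensitive, so the check ignores case and a
    trailing backslash. If the folder is already present, PATH is returned as is.
    """
    entries = path_value.split(sep) if path_value else []
    target = bin_dir.rstrip("\\").lower()
    if any(entry.rstrip("\\").lower() == target for entry in entries):
        return path_value
    return sep.join([bin_dir] + entries)


# Only changes PATH for this Python process (and processes it starts),
# not the system-wide environment variable.
if os.name == "nt":
    os.environ["PATH"] = prepend_to_path(CABOCHA_BIN, os.environ.get("PATH", ""), os.pathsep)
```

Note that Python 3.8 and later no longer search PATH for the DLLs an extension module depends on; there, `os.add_dll_directory(CABOCHA_BIN)` is the documented way to add the folder.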
To confirm that it works, create a UTF-8 text file named input.txt containing the sentence you want to analyze, and run the following from the command prompt:
cabocha < input.txt > out.txt
If the analysis succeeds, out.txt contains output like the following:
here---D
Marisa's-D
It's a slow place!
EOS
Input and output are redirected through files because the command prompt cannot handle UTF-8 text.
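The same redirection can also be done from Python, which keeps the console code page out of the picture entirely. A minimal Python 3 sketch, assuming cabocha is on PATH; `write_utf8` and `run_cabocha` are hypothetical helper names, and the sentence in the usage comment is an arbitrary example.

```python
import subprocess


def write_utf8(path, text):
    """Save text as UTF-8 (no BOM) with a trailing newline."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text + "\n")


def run_cabocha(input_path, output_path, command=("cabocha",)):
    """Equivalent of `cabocha < input.txt > out.txt`; returns the output text.

    Both files are opened in binary mode, so the UTF-8 bytes are passed
    through unchanged and never go through the console.
    """
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:
        subprocess.run(list(command), stdin=fin, stdout=fout, check=True)
    with open(output_path, encoding="utf-8") as f:
        return f.read()

# Usage (requires cabocha on PATH):
#   write_utf8("input.txt", "今日は晴れです")
#   print(run_cabocha("input.txt", "out.txt"))
```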
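If the output instead looks like the following, with the whole sentence on one line and no dependency marks:
This is Marisa's slow place!
EOS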
If this happens, input.txt is probably not saved in UTF-8. (Notepad saves files as ANSI by default.)
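To check an existing input.txt, a small Python 3 sketch can test whether its bytes decode as UTF-8 and, if not, re-save it. It assumes that a non-UTF-8 file is cp932, which is what "ANSI" means on Japanese Windows; `ensure_utf8` is a hypothetical helper name.

```python
def ensure_utf8(path, fallback_encoding="cp932"):
    """Re-save `path` as UTF-8 if it does not already decode as UTF-8.

    Assumes a non-UTF-8 file was written in `fallback_encoding`.
    Returns True if the file was converted, False if it was left alone.
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return False  # already valid UTF-8
    except UnicodeDecodeError:
        text = data.decode(fallback_encoding)
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return True
```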
The following error may also occur:
svm.cpp(140) [version == MODEL_VERSION] incompatible version: 101
svm.cpp(751) [size >= 2] dep.cpp(79) [!failed] no such file or directory: C:\Program Files (x86)\CaboCha\etc\..\model\dep.ipa.model
In this case, the CaboCha upgrade has not taken effect properly, so delete the following folder:
C:\Users\<user name>\AppData\Local\VirtualStore\Program Files (x86)\CaboCha
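The same cleanup as a Python 3 sketch, locating the folder through the LOCALAPPDATA environment variable (which points to C:\Users\<user name>\AppData\Local). The function names are hypothetical.

```python
import os
import shutil


def virtualstore_cabocha_dir(local_appdata):
    """Path of the VirtualStore copy of the CaboCha install folder."""
    return os.path.join(local_appdata, "VirtualStore", "Program Files (x86)", "CaboCha")


def remove_stale_cabocha_copy(local_appdata):
    """Delete the VirtualStore copy; return True if something was removed."""
    target = virtualstore_cabocha_dir(local_appdata)
    if not os.path.isdir(target):
        return False
    shutil.rmtree(target)
    return True

# Usage on Windows:
#   remove_stale_cabocha_copy(os.environ["LOCALAPPDATA"])
```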
Download cabocha-0.68.tar.bz2 from http://code.google.com/p/cabocha/downloads/list. This archive can be extracted with a tool such as Lhaplus.
Change the current directory to the python folder inside the extracted archive and run the following command:
python setup.py install
This fails with the following error:
Traceback (most recent call last):
  File "setup.py", line 13, in <module>
    version = cmd1("cabocha-config --version"),
  File "setup.py", line 7, in cmd1
    return os.popen(str).readlines()[0][:-1]
IndexError: list index out of range
This happens because cabocha-config is not installed on Windows: os.popen returns no output, so readlines()[0] raises IndexError.
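As an alternative to editing the values by hand, a guarded lookup could fall back to defaults when cabocha-config is missing. This is a Python 3 sketch with a hypothetical helper name, not the upstream setup.py.

```python
import shutil
import subprocess


def config_value(args, default):
    """Return the first line printed by a cabocha-config style command.

    Falls back to `default` when the program is not on PATH (as on Windows)
    or prints nothing, instead of failing with IndexError the way
    readlines()[0] does.
    """
    if shutil.which(args[0]) is None:
        return default
    result = subprocess.run(args, stdout=subprocess.PIPE, universal_newlines=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else default
```

For example, `version = config_value(["cabocha-config", "--version"], "0.68")` would use the tool where it exists and the hard-coded version elsewhere.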
Before the change:
#!/usr/bin/env python

from distutils.core import setup,Extension,os
import string

def cmd1(str):
    return os.popen(str).readlines()[0][:-1]

def cmd2(str):
    return string.split (cmd1(str))

setup(name = "cabocha-python",
    version = cmd1("cabocha-config --version"),
    py_modules=["CaboCha"],
    ext_modules = [
        Extension("_CaboCha",
            ["CaboCha_wrap.cxx",],
            include_dirs=cmd2("cabocha-config --inc-dir"),
            library_dirs=cmd2("cabocha-config --libs-only-L"),
            libraries=cmd2("cabocha-config --libs-only-l"))
        ])
Replace version and the contents of ext_modules with values that match your installation.
After the change:
#!/usr/bin/env python

from distutils.core import setup,Extension,os
import string

# cmd1/cmd2 are no longer used below
def cmd1(str):
    return os.popen(str).readlines()[0][:-1]

def cmd2(str):
    return string.split (cmd1(str))

# cabocha-config does not exist on Windows, so the values it would
# print are hard-coded here for the default install location.
setup(name = "cabocha-python",
    version = "0.68",
    py_modules=["CaboCha"],
    ext_modules = [
        Extension("_CaboCha",
            ["CaboCha_wrap.cxx",],
            include_dirs=[r"C:\Program Files (x86)\CaboCha\sdk"],
            library_dirs=[r"C:\Program Files (x86)\CaboCha\sdk"],
            libraries=['libcabocha'])
        ])
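Because the paths are now hard-coded, it may be worth confirming the files exist before building. A Python 3 sketch; the file names are assumptions: cabocha.h as the header the SWIG wrapper presumably includes, and libcabocha.lib, the file name the MSVC linker looks for when libraries=['libcabocha'] is given.

```python
import os

# Assumed contents of the installer's sdk folder (see the lead-in).
SDK_DIR = r"C:\Program Files (x86)\CaboCha\sdk"
REQUIRED = ("cabocha.h", "libcabocha.lib")


def missing_sdk_files(sdk_dir=SDK_DIR, required=REQUIRED):
    """Return the names in `required` that are not files inside sdk_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(sdk_dir, name))]

# Usage: an empty list means the hard-coded paths look usable.
#   print(missing_sdk_files())
```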
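Then run the installation again:
python setup.py install
Once the module is installed, it can be used from Python as follows: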
#!/usr/bin/python
# -*- coding: utf-8 -*-
import CaboCha

c = CaboCha.Parser("")
sentence = "帽子を返す"  # "return the hat"; CaboCha parses Japanese text

tree = c.parse(sentence)
print tree.toString(CaboCha.FORMAT_TREE)
print tree.toString(CaboCha.FORMAT_LATTICE)
#print tree.toString(CaboCha.FORMAT_XML)

# Chunks and the chunk each one depends on
for i in range(tree.chunk_size()):
    chunk = tree.chunk(i)
    print 'Chunk:', i
    print ' Score:', chunk.score
    print ' Link:', chunk.link            # index of the head chunk, -1 for the root
    print ' Size:', chunk.token_size      # number of tokens in the chunk
    print ' Pos:', chunk.token_pos        # index of the chunk's first token
    print ' Head:', chunk.head_pos        # head word position within the chunk
    print ' Func:', chunk.func_pos        # function word position within the chunk
    print ' Features:',
    for j in range(chunk.feature_list_size):
        print ' ' + chunk.feature_list(j)
    print
    print 'Text'
    for ix in range(chunk.token_pos, chunk.token_pos + chunk.token_size):
        print ' ', tree.token(ix).surface
    print

# Tokens (morphemes)
for i in range(tree.token_size()):
    token = tree.token(i)
    print 'Surface:', token.surface
    print ' Normalized:', token.normalized_surface
    print ' Feature:', token.feature
    print ' NE:', token.ne                # named entity
    print ' Info:', token.additional_info
    print ' Chunk:', token.chunk
    print
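Running the script prints output like the following (Japanese morphemes shown in English translation):
Hat-D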
return
EOS
* 0 1D 0/1 0.000000
Hat noun,General,*,*,*,*,hat,Bow,Boshi
To Particle,Case particles,General,*,*,*,To,Wo,Wo
* 1 -1D 0/0 0.000000
return verb,Independence,*,*,Godan / Sa line,Uninflected word,return,Kaes,Kaes
EOS
Chunk: 0
Score: 0.0
Link: 1
Size: 2
Pos: 0
Head: 0
Func: 1
Features: FCASE:To
FHS:hat
FHP0:noun
FHP1:General
FFS:To
FFP0:Particle
FFP1:Case particles
FFP2:General
FLS:hat
FLP0:noun
FLP1:General
FRS:To
FRP0:Particle
FRP1:Case particles
FRP2:General
LF:To
RL:hat
RH:hat
RF:To
FBOS:1
GCASE:To
A:To
Text
hat
To
Chunk: 1
Score: 0.0
Link: -1
Size: 1
Pos: 2
Head: 0
Func: 0
Features: FHS:return
FHP0:verb
FHP1:Independence
FHF:Uninflected word
FFS:return
FFP0:verb
FFP1:Independence
FFF:Uninflected word
FLS:return
FLP0:verb
FLP1:Independence
FLF:Uninflected word
FRS:return
FRP0:verb
FRP1:Independence
FRF:Uninflected word
LF:return
RL:return
RH:return
RF:return
FEOS:1
A:Uninflected word
Text
return
Surface:hat
Normalized:hat
Feature:noun,General,*,*,*,*,hat,Bow,Boshi
NE: None
Info: None
Chunk: <CaboCha.Chunk; proxy of <Swig Object of type 'CaboCha::Chunk *' at 0x0274A170> >
Surface:To
Normalized:To
Feature:Particle,Case particles,General,*,*,*,To,Wo,Wo
NE: None
Info: None
Chunk: None
Surface:return
Normalized:return
Feature:verb,Independence,*,*,Godan / Sa line,Uninflected word,return,Kaes,Kaes
NE: None
Info: None
Chunk: <CaboCha.Chunk; proxy of <Swig Object of type 'CaboCha::Chunk *' at 0x0274A170> >
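The script above prints each chunk's Link but does not pair the chunks up. A minimal sketch of collecting (modifier, head) text pairs with the same attributes the script uses (chunk_size, chunk, link, token_pos, token_size, token().surface); the function names are hypothetical. It avoids Python 3-only syntax, so it should also run under the Python 2 used in this article. For the sample sentence, chunk 0 (Link: 1) pairs with chunk 1, the root.

```python
def chunk_text(tree, chunk):
    """Concatenate the surface forms of the tokens that make up a chunk."""
    return "".join(tree.token(ix).surface
                   for ix in range(chunk.token_pos, chunk.token_pos + chunk.token_size))


def dependency_pairs(tree):
    """Return (modifier chunk text, head chunk text) pairs; link -1 marks the root."""
    pairs = []
    for i in range(tree.chunk_size()):
        chunk = tree.chunk(i)
        if chunk.link >= 0:
            head = tree.chunk(chunk.link)
            pairs.append((chunk_text(tree, chunk), chunk_text(tree, head)))
    return pairs

# Usage with the parser from the script above:
#   tree = CaboCha.Parser("").parse(sentence)
#   for src, dst in dependency_pairs(tree):
#       print(src + " -> " + dst)
```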