A memo on installing ChaSen, a morphological analysis system for Japanese natural language processing. Environment: CentOS 6.3
I followed the instructions [here](http://getassoc.cs.nii.ac.jp/?%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB%2FChasen%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB) to install it.
First, check the ChaSen project pages: http://chasen-legacy.sourceforge.jp/ and http://sourceforge.jp/projects/chasen-legacy/
It seems that iconv and Darts-0.31 are required.
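Before downloading anything, it can help to confirm that the command-line tools used in this memo are present. The following is a minimal sketch; the function name `check_commands` and the exact tool list are my own assumptions, not part of the ChaSen documentation.

```shell
#!/bin/sh
# Report whether each named command is available on PATH.
# Returns non-zero if at least one is missing.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Tools used in this memo (the list itself is an assumption).
check_commands iconv wget tar make || echo "install the missing tools first"
```

Darts is a library rather than a command, so it is not covered by this check.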
Start with Darts. Version 0.32 was available, so I installed that one.
$ wget http://chasen.org/~taku/software/darts/src/darts-0.32.tar.gz
$ tar xvzf darts-0.32.tar.gz
$ cd darts-0.32
$ ./configure
$ make
$ make check
$ sudo make install
That completes the Darts installation.
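To confirm that the Darts header ended up where ChaSen's build can find it, a check along these lines may be useful. It is only a sketch: it assumes the default `/usr/local` prefix and that the header is named `darts.h`, and `find_header` is a hypothetical helper.

```shell
#!/bin/sh
# Print the path of HEADER under PREFIX/include, or fail if it is absent.
find_header() {
    prefix=$1
    header=$2
    if [ -f "$prefix/include/$header" ]; then
        echo "$prefix/include/$header"
    else
        echo "not found: $header under $prefix/include" >&2
        return 1
    fi
}

# Assumed defaults: prefix /usr/local, header name darts.h.
find_header /usr/local darts.h || echo "Darts header missing; recheck the install"
```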
iconv is already installed, so I skip it and move on to ChaSen itself.
$ wget http://iij.dl.sourceforge.jp/chasen-legacy/56305/chasen-2.4.5.tar.gz
$ tar xzf chasen-2.4.5.tar.gz
$ cd chasen-2.4.5
$ ./configure
$ make
$ sudo make install
Next, fetch and configure the IPA dictionary (ipadic):
$ wget http://jaist.dl.sourceforge.jp/ipadic/24435/ipadic-2.7.0.tar.gz
$ tar zxf ipadic-2.7.0.tar.gz
$ cd ipadic-2.7.0
$ ./configure
The dictionary files are in EUC-JP, so convert them to UTF-8 with the following script, saved as convert.sh:
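#!/bin/sh
# Convert each dictionary source file from EUC-JP to UTF-8 in place.
for file in *.dic *.cha
do
    if [ -f "$file" ]; then
        echo "$file"
        # Replace the original only if iconv succeeded.
        iconv -f euc-jp -t utf-8 "$file" > tmpfile && mv tmpfile "$file"
    fi
done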
Run the script to convert the dictionary files to UTF-8, then rebuild the dictionary and install it.
$ sh ./convert.sh
$ `chasen-config --mkchadic`/makemat -i w
$ `chasen-config --mkchadic`/makeda -i w chadic *.dic
$ sudo make install
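One caveat: convert.sh re-encodes every file unconditionally, so running it a second time would feed UTF-8 text to iconv as if it were EUC-JP. A guarded variant might look like the sketch below. The helper names `is_utf8` and `to_utf8` are hypothetical, and the UTF-8 check is a heuristic: it treats any file that decodes cleanly as UTF-8 (including pure ASCII) as already converted.

```shell
#!/bin/sh
# Succeeds only if FILE decodes cleanly as UTF-8.
is_utf8() {
    iconv -f utf-8 -t utf-8 "$1" >/dev/null 2>&1
}

# Convert FILE from EUC-JP to UTF-8 in place, skipping files already in UTF-8.
to_utf8() {
    file=$1
    if is_utf8 "$file"; then
        echo "skip: $file"
        return 0
    fi
    tmp_out=$(mktemp) || return 1
    if iconv -f euc-jp -t utf-8 "$file" >"$tmp_out"; then
        # Overwrite through redirection so the file keeps its permissions.
        cat "$tmp_out" >"$file"
        rm -f "$tmp_out"
        echo "converted: $file"
    else
        rm -f "$tmp_out"
        echo "failed: $file" >&2
        return 1
    fi
}

# Same file set as convert.sh; does nothing if no such files exist here.
for f in *.dic *.cha; do
    if [ -f "$f" ]; then
        to_utf8 "$f"
    fi
done
```

Because the original is only overwritten after a successful conversion, an interrupted or failed run leaves the EUC-JP file untouched.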
Convert chasenrc to UTF-8 as well (writing under /usr/local/etc requires root privileges):
$ cd /usr/local/etc
$ iconv -f euc-jp -t utf-8 chasenrc > chasenrc.tmp
$ mv chasenrc.tmp chasenrc
ChaSen can now work with UTF-8.
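A quick way to check the result is to pipe a short UTF-8 sentence through the analyzer. The sketch below wraps that in a hypothetical helper, `analyze_sample`, which first verifies that the command exists. The `-i w` option is assumed to select UTF-8 for `chasen`, as it does in the makemat and makeda calls above; the sample sentence is arbitrary.

```shell
#!/bin/sh
# Pipe a sample UTF-8 sentence through an analyzer command.
# Fails early with a message if the command is not on PATH.
analyze_sample() {
    cmd=$1
    shift
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "not installed: $cmd" >&2
        return 1
    fi
    printf '%s\n' 'すもももももももものうち' | "$cmd" "$@"
}

# -i w: assumed to select UTF-8, matching the dictionary build above.
analyze_sample chasen -i w || echo "chasen did not run; check the install"
```

If the output shows garbled characters rather than one morpheme per line, the dictionary or chasenrc is probably still in EUC-JP.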