ChaSen installation

Notes on installing ChaSen, a morphological analysis system for Japanese natural language processing. Environment: CentOS 6.3

I installed it by following [this page](http://getassoc.cs.nii.ac.jp/?%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB%2FChasen%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB).

First, check the ChaSen project pages: http://chasen-legacy.sourceforge.jp/ and http://sourceforge.jp/projects/chasen-legacy/

It seems that iconv and Darts-0.31 are required.

Start with Darts. Version 0.32 was available, so I installed that.

$ wget http://chasen.org/~taku/software/darts/src/darts-0.32.tar.gz
$ tar xvzf darts-0.32.tar.gz
$ cd darts-0.32
$ ./configure
$ make
$ make check
$ sudo make install

That completes the Darts installation.

iconv is already installed on this system, so I skip that step.
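If you are not sure whether iconv is present, a quick check like the one below can confirm it before going further. This is only a sketch: `check_iconv` is a helper name I made up, and the test bytes are the EUC-JP encoding of the hiragana あ, which the later dictionary conversion would also have to handle.

```shell
#!/bin/sh
# Sketch: confirm that iconv exists and can convert EUC-JP to UTF-8.
# check_iconv is a hypothetical helper name, not part of ChaSen or iconv.
check_iconv() {
    if ! command -v iconv >/dev/null 2>&1; then
        echo "iconv: not found"
        return 1
    fi
    # 0xA4 0xA2 (octal 244 242) is the EUC-JP encoding of the hiragana "あ".
    if printf '\244\242' | iconv -f EUC-JP -t UTF-8 >/dev/null 2>&1; then
        echo "iconv: EUC-JP to UTF-8 conversion works"
    else
        echo "iconv: found, but EUC-JP conversion failed"
        return 1
    fi
}

check_iconv || true
```

If the first message appears, iconv needs to be installed before continuing; if the last one appears, the installed iconv probably lacks the EUC-JP converter.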

ChaSen installation

$ wget http://iij.dl.sourceforge.jp/chasen-legacy/56305/chasen-2.4.5.tar.gz
$ tar xzf chasen-2.4.5.tar.gz 
$ cd chasen-2.4.5
$ ./configure
$ make
$ sudo make install

Install ipadic

$ wget http://jaist.dl.sourceforge.jp/ipadic/24435/ipadic-2.7.0.tar.gz
$ tar zxf ipadic-2.7.0.tar.gz
$ cd ipadic-2.7.0
$ ./configure

Convert the dictionary files to UTF-8

`convert.sh`


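#!/bin/sh
# Convert every ipadic dictionary source file from EUC-JP to UTF-8 in place.
for file in *.dic *.cha
do
    if [ -f "$file" ]; then
        echo "$file"
        # Replace the original only if iconv succeeded.
        iconv -f euc-jp -t utf-8 "$file" > tmpfile && mv tmpfile "$file"
    fi
done
exit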

Run the script above to convert the dictionary files to UTF-8, then build and install the dictionary.

$ sh ./convert.sh
$ `chasen-config --mkchadic`/makemat -i w
$ `chasen-config --mkchadic`/makeda -i w chadic *.dic
$ sudo make install
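One thing to watch: the conversion rewrites the files in place, so running it a second time would feed UTF-8 data to iconv as if it were EUC-JP. A guarded variant could look like the sketch below. The helper names `is_utf8` and `to_utf8` are my own, and the "already UTF-8" test is a heuristic (it only checks that the bytes form valid UTF-8), so treat it as an illustration rather than a drop-in replacement.

```shell
#!/bin/sh
# Sketch of a re-runnable EUC-JP to UTF-8 conversion.
# is_utf8 and to_utf8 are hypothetical helper names.

# Succeeds if the file already decodes as valid UTF-8 (pure ASCII counts too).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Converts one file in place; leaves it untouched if iconv fails.
to_utf8() {
    src=$1
    if is_utf8 "$src"; then
        echo "skip: $src"
        return 0
    fi
    tmp="$src.utf8.$$"
    if iconv -f EUC-JP -t UTF-8 "$src" > "$tmp"; then
        mv "$tmp" "$src"
        echo "converted: $src"
    else
        rm -f "$tmp"
        echo "failed: $src" >&2
        return 1
    fi
}

# Same file patterns as convert.sh; nothing happens if no file matches.
for file in *.dic *.cha; do
    if [ -f "$file" ]; then
        to_utf8 "$file"
    fi
done
```

Run it from the ipadic-2.7.0 directory in place of `convert.sh`. Files that were already converted are reported as skipped instead of being converted twice.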

Convert chasenrc to UTF-8 as well

$ cd /usr/local/etc
$ iconv -f euc-jp -t utf-8 chasenrc > /tmp/chasenrc.tmp
$ sudo cp /tmp/chasenrc.tmp chasenrc

ChaSen can now be used with UTF-8.
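To confirm the result, you can pipe a UTF-8 sentence into ChaSen. The snippet below is a minimal sketch: `smoke_test` is a hypothetical helper name, the sample sentence is arbitrary, and I am assuming that the `-i w` option selects UTF-8 in the `chasen` command the same way it does for `makemat` and `makeda` above.

```shell
#!/bin/sh
# Sketch: feed one UTF-8 sentence to ChaSen if it is installed.
# smoke_test is a hypothetical helper name.
smoke_test() {
    if ! command -v chasen >/dev/null 2>&1; then
        echo "chasen not found on PATH"
        return 1
    fi
    # Assumption: -i w tells chasen to use UTF-8 for input and output.
    echo "すもももももももものうち" | chasen -i w
}

smoke_test || true
```

If the dictionary and chasenrc were converted correctly, the output should be readable Japanese, one morpheme per line, rather than garbled characters.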