Install mecab-ipadic-neologd on a low-memory Sakura VPS (Ubuntu 18.04) and use it from Python

Overview

I installed mecab-ipadic-neologd on a Sakura VPS running Ubuntu 18.04 LTS so that it can be called from Python. Installing mecab-ipadic-neologd on the server itself failed because the server did not have enough memory, so I copied a dictionary that I had built locally with scp.

Environment

Sakura VPS, 2G plan. The OS is as follows:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
(omitted below)

$ python -V
Python 3.6.8

Install MeCab

Run the following commands on the remote server.

sudo apt install mecab
sudo apt install libmecab-dev
sudo apt install mecab-ipadic-utf8

Check that it works.

$ echo 神ってる | mecab
神	名詞,一般,*,*,*,*,神,カミ,カミ
って	助詞,格助詞,連語,*,*,*,って,ッテ,ッテ
る	助動詞,*,*,*,文語・ル,基本形,る,ル,ル
EOS

With the default IPA dictionary, the slang word 神ってる (kamitteru, roughly "godlike") is split into three tokens: 神 (noun), って (particle) and る (auxiliary verb).
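To consume this output from a script instead of reading it by eye, the default format is easy to split: one token per line, the surface form, a tab, then comma-separated features, with EOS marking the end of the sentence. A minimal sketch in Python; parse_mecab_output is a hypothetical helper, not part of MeCab, and it assumes the default output format shown above.

```python
from typing import List, Tuple


def parse_mecab_output(output: str) -> List[Tuple[str, List[str]]]:
    """Split MeCab's default output into (surface, features) pairs.

    Assumes the default format: "surface<TAB>feature1,feature2,..." per line,
    with "EOS" marking the end of a sentence.
    """
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and sentence terminators
        surface, _, features = line.partition("\t")
        tokens.append((surface, features.split(",")))
    return tokens


if __name__ == "__main__":
    # What MeCab prints for 神ってる with the default IPA dictionary
    sample = (
        "神\t名詞,一般,*,*,*,*,神,カミ,カミ\n"
        "って\t助詞,格助詞,連語,*,*,*,って,ッテ,ッテ\n"
        "る\t助動詞,*,*,*,文語・ル,基本形,る,ル,ル\n"
        "EOS\n"
    )
    for surface, features in parse_mecab_output(sample):
        print(surface, features[0])  # surface form and top-level part of speech
```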

Reference: Install mecab on ubuntu 18.10

Copy mecab-ipadic-neologd

I had already installed mecab-ipadic-neologd on my local Mac, so I copied the dictionary from there with scp.

First, find where the mecab-ipadic-neologd dictionary is located on the local Mac. The dicdir line in mecabrc shows it.

$ sudo find / -name mecabrc
/usr/local/etc/mecabrc
/usr/local/Cellar/mecab/0.996/.bottle/etc/mecabrc

$ cat /usr/local/etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir =  /usr/local/lib/mecab/dic/ipadic
dicdir =  /usr/local/lib/mecab/dic/mecab-ipadic-neologd
(The following is omitted)
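The mecabrc format is simple: lines starting with ; are comments, and settings are written as key = value. As a rough sketch, the active dicdir can be read from Python as follows. read_dicdir is a hypothetical helper, and it assumes a single uncommented dicdir line, as in the mecabrc files shown in this article.

```python
import os
from typing import Optional


def read_dicdir(mecabrc_text: str) -> Optional[str]:
    """Return the first uncommented 'dicdir' value in mecabrc text, or None.

    Assumption: a single active dicdir line, as in the files shown here.
    Lines starting with ';' or '#' are treated as comments.
    """
    for raw in mecabrc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "#")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dicdir":
            return value.strip()
    return None


if __name__ == "__main__":
    # Mac (Homebrew) and Ubuntu locations that appear in this article
    for path in ("/usr/local/etc/mecabrc", "/etc/mecabrc"):
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                print(path, "->", read_dicdir(f.read()))
```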

Copy the dictionary from the local machine to the remote server with scp.

$ scp -r /usr/local/lib/mecab/dic/mecab-ipadic-neologd <user>@<server>:~/
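Before relying on the copy, it can be worth checking that the whole compiled dictionary arrived. A compiled MeCab dictionary directory typically contains files such as dicrc, sys.dic, unk.dic, matrix.bin and char.bin; that list is an assumption based on the usual layout, and missing_dictionary_files is a hypothetical helper.

```python
import os
import sys
from typing import List

# Assumption: files typically found in a compiled MeCab dictionary directory.
EXPECTED_FILES = ("dicrc", "sys.dic", "unk.dic", "matrix.bin", "char.bin")


def missing_dictionary_files(dicdir: str) -> List[str]:
    """Return the expected dictionary files that are absent from dicdir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(dicdir, name))]


if __name__ == "__main__":
    # Default: where the scp above puts the dictionary; override with argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~/mecab-ipadic-neologd")
    missing = missing_dictionary_files(target)
    print("OK" if not missing else "missing: " + ", ".join(missing))
```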

On the remote server, check that MeCab works with the copied dictionary.

$ echo 神ってる | mecab -d ~/mecab-ipadic-neologd
神ってる	名詞,固有名詞,一般,*,*,*,神ってる,カミッテル,カミッテル
EOS

The slang word 神ってる (kamitteru, roughly "godlike"), which the default IPA dictionary split into three tokens, is now recognized as a single proper noun, so mecab-ipadic-neologd is working.
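The same check can be run from Python without any binding by piping text into the mecab command. A minimal sketch, assuming the mecab binary is on PATH; mecab_command and run_mecab are hypothetical helper names, and -d is the same dictionary option used in the command above.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def mecab_command(dicdir: Optional[str] = None) -> List[str]:
    """Build the mecab command line, optionally selecting a dictionary with -d."""
    cmd = ["mecab"]
    if dicdir:
        cmd += ["-d", dicdir]
    return cmd


def run_mecab(text: str, dicdir: Optional[str] = None) -> str:
    """Feed text to the mecab CLI on stdin and return its raw output."""
    result = subprocess.run(
        mecab_command(dicdir),
        input=text,
        stdout=subprocess.PIPE,
        encoding="utf-8",  # text mode; the encoding argument exists since Python 3.6
        check=True,        # raise CalledProcessError if mecab exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("mecab") is None:
        print("mecab is not on PATH")
    else:
        try:
            print(run_mecab("神ってる\n", os.path.expanduser("~/mecab-ipadic-neologd")), end="")
        except subprocess.CalledProcessError as exc:
            print("mecab failed with exit code", exc.returncode)
```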

Reference: Use mecab-ipadic-NEologd with a cheap plan of Sakura VPS

Set the default dictionary to mecab-ipadic-neologd

Next, make mecab-ipadic-neologd the default MeCab dictionary on the remote server. As on the Mac, this only requires editing mecabrc.

Find mecabrc on the remote server and check its contents.

$ sudo find / -name mecabrc
/etc/mecabrc

$ cat /etc/mecabrc
;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir = /var/lib/mecab/dic/debian
(The following is omitted)

By default, MeCab dictionaries appear to be stored under /var/lib/mecab/dic, so move mecab-ipadic-neologd there as well and update mecabrc to point to it.

$ sudo mv ~/mecab-ipadic-neologd /var/lib/mecab/dic/
$ sudo nano /etc/mecabrc

(Before)

;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir = /var/lib/mecab/dic/debian
(The following is omitted)

(After)

;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
;dicdir = /var/lib/mecab/dic/debian
dicdir = /var/lib/mecab/dic/mecab-ipadic-neologd
(The following is omitted)
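The edit above (comment out the old dicdir, add the new one) is mechanical, so it can also be scripted, for example when setting up several servers. A sketch in Python; set_dicdir is a hypothetical helper that only transforms the text, so writing /etc/mecabrc back still needs root, and keeping a backup is advisable.

```python
def set_dicdir(mecabrc_text: str, new_dicdir: str) -> str:
    """Comment out active dicdir lines and add one pointing at new_dicdir.

    The new line goes right after the first line it replaces, mirroring the
    manual edit; if there was no active dicdir, it is appended at the end.
    """
    out = []
    inserted = False
    for line in mecabrc_text.splitlines():
        stripped = line.strip()
        is_active_dicdir = (
            "=" in stripped
            and not stripped.startswith((";", "#"))
            and stripped.partition("=")[0].strip() == "dicdir"
        )
        if is_active_dicdir:
            out.append(";" + line)  # keep the old setting as a comment
            if not inserted:
                out.append("dicdir = " + new_dicdir)
                inserted = True
        else:
            out.append(line)
    if not inserted:
        out.append("dicdir = " + new_dicdir)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    before = "; Configuration file of MeCab\ndicdir = /var/lib/mecab/dic/debian\n"
    print(set_dicdir(before, "/var/lib/mecab/dic/mecab-ipadic-neologd"), end="")
```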

If the change is correct, 神ってる is now recognized as one word without passing the dictionary with -d.

$ echo 神ってる | mecab
神ってる	名詞,固有名詞,一般,*,*,*,神ってる,カミッテル,カミッテル
EOS

Install mecab-python3

Make MeCab callable from Python 3.

sudo apt install swig
sudo apt install python3-pip
sudo pip3 install mecab-python3

Check whether it can be called from Python.

$ python
Python 3.6.8 (default, May 23 2019, 19:27:09) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import MeCab
>>> MeCab.Tagger().parse('神ってる')

Failed initializing MeCab. Please see the README for possible solutions:

    https://github.com/SamuraiT/mecab-python3#common-issues

If you are still having trouble, please file an issue here, and include the
ERROR DETAILS below:

    https://github.com/SamuraiT/mecab-python3/issues

You don't have to write the issue in English.

------------------- ERROR DETAILS ------------------------
arguments: 
error message: [ifs] no such file or directory: /usr/local/etc/mecabrc
----------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shimaya/.pyenv/versions/3.6.8/lib/python3.6/site-packages/MeCab/__init__.py", line 124, in __init__
    super(Tagger, self).__init__(args)
RuntimeError

This fails with an error saying that /usr/local/etc/mecabrc does not exist. On this server mecabrc is at /etc/mecabrc, so MeCab is looking for its configuration file in a location that does not exist here (presumably a default path built into the MeCab library that the pip package uses). The mecabrc that MeCab reads when called from Python can be specified explicitly with the MECABRC environment variable.

$ nano ~/.bash_profile
(Add the following)
export MECABRC=/etc/mecabrc

$ source ~/.bash_profile
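Note that ~/.bash_profile is only read by login shells, so a script started from cron, systemd or a web server may not see MECABRC. A sketch of an alternative that points MeCab at the file explicitly with its -r (rcfile) option; build_tagger_args and make_tagger are hypothetical names, and the paths are assumed to contain no spaces.

```python
from typing import Optional


def build_tagger_args(rcfile: str = "/etc/mecabrc",
                      dicdir: Optional[str] = None) -> str:
    """Build the option string passed to MeCab.Tagger.

    -r selects the mecabrc explicitly (instead of relying on MECABRC);
    -d optionally overrides the dictionary named in that file.
    Assumes paths without spaces.
    """
    args = ["-r", rcfile]
    if dicdir:
        args += ["-d", dicdir]
    return " ".join(args)


def make_tagger(rcfile: str = "/etc/mecabrc", dicdir: Optional[str] = None):
    """Create a MeCab.Tagger; requires mecab-python3 to be installed."""
    import MeCab  # imported lazily so build_tagger_args works without the binding
    return MeCab.Tagger(build_tagger_args(rcfile, dicdir))


if __name__ == "__main__":
    try:
        print(make_tagger().parse("神ってる"))
    except ImportError:
        print("mecab-python3 is not installed")
```

Setting os.environ["MECABRC"] = "/etc/mecabrc" before creating the Tagger should have the same effect as the export line, but only for that one process.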

With MECABRC exported, running it again analyzes 神ってる as a single word.

$ python
Python 3.6.8 (default, May 23 2019, 19:27:09) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import MeCab
>>> MeCab.Tagger().parse('神ってる')
'神ってる\t名詞,固有名詞,一般,*,*,*,神ってる,カミッテル,カミッテル\nEOS\n'
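parse returns a single string; for per-token processing, mecab-python3 also provides parseToNode, which returns a linked list of nodes with surface, feature and next attributes. A sketch that walks that chain and skips the BOS/EOS nodes, which have an empty surface; iter_tokens is a hypothetical helper.

```python
from typing import Iterator, List, Tuple


def iter_tokens(node) -> Iterator[Tuple[str, List[str]]]:
    """Yield (surface, features) for each real token in a MeCab node chain.

    Expects the node returned by MeCab.Tagger().parseToNode(text);
    BOS/EOS nodes have an empty surface and are skipped.
    """
    while node is not None:
        if node.surface:
            yield node.surface, node.feature.split(",")
        node = node.next


if __name__ == "__main__":
    try:
        import MeCab
    except ImportError:
        print("mecab-python3 is not installed")
    else:
        tagger = MeCab.Tagger()  # reads the mecabrc given by MECABRC
        for surface, features in iter_tokens(tagger.parseToNode("神ってる")):
            print(surface, features[1])  # e.g. the part-of-speech subcategory
```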

Reference: Install mecab on ubuntu 18.10
Reference: Change the default dictionary of MeCab called from Python
