[Morphological analysis] How to add a new dictionary to MeCab

Environment

macOS with MeCab installed

Procedure

1 Download the keyword file and create a CSV file

1-1 Keyword file download

#Hatena Keyword
curl -L http://d.hatena.ne.jp/images/keyword/keywordlist_furigana.csv | iconv -f euc-jp -t utf-8 > keywordlist_furigana.csv
# Wikipedia
curl -L http://dumps.wikimedia.org/jawiki/latest/jawiki-latest-all-titles-in-ns0.gz | gunzip > jawiki-latest-all-titles-in-ns0

1-2 Extract nouns into CSV files

`sample.rb`


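require 'csv'

# Source files downloaded in step 1-1, keyed by where they came from
original_data = {
  wikipedia: 'jawiki-latest-all-titles-in-ns0',
  hatena: 'keywordlist_furigana.csv'
}

CSV.open('custom.csv', 'w') do |csv|
  original_data.each do |type, filename|
    next unless File.file?(filename)

    File.foreach(filename) do |line|
      title = line.strip

      # Skip titles that start with symbols or consist only of digits
      next if title =~ %r(^[+-.$()?*/&%!"'_,]+)
      next if title =~ /^[-.0-9]+$/
      # Skip Wikipedia housekeeping pages. The titles are Japanese, so the
      # patterns must be Japanese too: disambiguation pages, character pages, lists
      next if title =~ /曖昧さ回避/
      next if title =~ /_\(/
      next if title =~ /^PJ:/
      next if title =~ /の登場人物/
      next if title =~ /一覧/

      title_length = title.length

      # Register only titles longer than 3 characters
      if title_length > 3
        # Longer titles get a lower (more preferred) cost, bounded at -36000
        score = [-36000.0, -400 * (title_length ** 1.5)].max.to_i
        # 名詞 = noun, 一般 = general (part-of-speech labels used by the IPA dictionary)
        csv << [title, nil, nil, score, '名詞', '一般', '*', '*', '*', '*', title, '*', '*', type]
      end
    end
  end
end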

Then run sample.rb:

ruby sample.rb
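To make the cost formula in sample.rb concrete, here is a minimal Python sketch of the same calculation and of the row it writes. The function names `entry_cost` and `dictionary_row` are made up for this illustration, and the part-of-speech fields are assumed to be the IPA dictionary's Japanese labels 名詞 (noun) and 一般 (general).

```python
import csv
import io

MIN_COST = -36000.0  # lower bound on the cost, as in sample.rb


def entry_cost(title):
    """Same formula as sample.rb: -400 * length ** 1.5, bounded at -36000."""
    return int(max(MIN_COST, -400 * (len(title) ** 1.5)))


def dictionary_row(title, source):
    """Build one row in the column order sample.rb writes (hypothetical helper)."""
    # The two empty fields are the left/right context IDs that sample.rb leaves blank.
    return [title, None, None, entry_cost(title),
            "名詞", "一般", "*", "*", "*", "*", title, "*", "*", source]


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(dictionary_row("クラウド", "wikipedia"))
print(buffer.getvalue(), end="")
# クラウド,,,-3200,名詞,一般,*,*,*,*,クラウド,*,*,wikipedia
```

A 4-character title gets a cost of -400 * 8 = -3200, while anything of 21 characters or more hits the -36000 floor. Lower costs make MeCab more likely to pick the entry, so longer titles are preferred over splitting them into shorter words.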

2 Create and add a user dictionary

Use the mecab-dict-index command to compile the CSV file created in step 1 into a user dictionary, custom.dic.

/usr/local/libexec/mecab/mecab-dict-index -d /usr/local/lib/mecab/dic/ipadic -u custom.dic -f utf-8 -t utf-8 custom.csv

Check that custom.dic now exists in the current directory.
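If you would rather drive this step from a script, something like the following could work. It is only a sketch: the two paths are the ones used in this article and may differ on your machine, and `build_command` and `compile_user_dictionary` are names invented here, not part of MeCab.

```python
import os
import subprocess

# Paths taken from the command above; adjust them to your installation.
DICT_INDEX = "/usr/local/libexec/mecab/mecab-dict-index"
SYSTEM_DIC = "/usr/local/lib/mecab/dic/ipadic"


def build_command(csv_path="custom.csv", dic_path="custom.dic"):
    """Return the mecab-dict-index invocation as an argument list."""
    return [DICT_INDEX, "-d", SYSTEM_DIC, "-u", dic_path,
            "-f", "utf-8", "-t", "utf-8", csv_path]


def compile_user_dictionary(csv_path="custom.csv", dic_path="custom.dic"):
    """Run mecab-dict-index and confirm that a non-empty dictionary was written."""
    subprocess.run(build_command(csv_path, dic_path), check=True)
    if not os.path.isfile(dic_path) or os.path.getsize(dic_path) == 0:
        raise RuntimeError(f"{dic_path} was not created")
    return os.path.abspath(dic_path)


# Show the command without running it.
print(" ".join(build_command()))
```

Calling `compile_user_dictionary()` runs the command and returns the absolute path of custom.dic, which is the value needed for the dicrc setting in the next step.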

Next, in the terminal, go to /usr/local/lib/mecab/dic/ipadic and open dicrc:

$ sudo vi dicrc

At the end of the file, add a userdic line that points to the custom.dic you just created (replace the path with its actual location):

userdic = /path/to/custom.dic
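The dicrc edit can also be scripted. The sketch below works on the file contents as a string so it is easy to try out safely. `with_userdic` is a made-up helper name, the sample text is only an abbreviated stand-in for a real dicrc, and /path/to/custom.dic is a placeholder.

```python
def with_userdic(dicrc_text, dic_path):
    """Return dicrc contents with exactly one userdic line pointing at dic_path."""
    kept = [
        line for line in dicrc_text.splitlines()
        # Drop any earlier userdic setting so the line is not duplicated.
        if not line.replace(" ", "").startswith("userdic=")
    ]
    kept.append(f"userdic = {dic_path}")
    return "\n".join(kept) + "\n"


# Abbreviated stand-in for the contents of dicrc (illustrative only).
sample_dicrc = "cost-factor = 800\nbos-feature = BOS/EOS,*,*,*,*,*,*,*,*\n"

updated = with_userdic(sample_dicrc, "/path/to/custom.dic")
print(updated, end="")
```

To apply it for real, you would read dicrc, pass its text through `with_userdic`, and write the result back with sufficient permissions. Keeping a backup copy of dicrc first is a good idea.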

Result

Run the following code to check the result.

`sample01.py`


# coding: utf-8
import MeCab

# -Ochasen prints one token per line in ChaSen format
tagger = MeCab.Tagger("-Ochasen")
# クラウド is the Japanese word for "cloud"
result = tagger.parse("クラウド")
print(result)

Before the dictionary is added, "クラウド" (cloud) is split into two tokens:

クラ クラ クラ 名詞-固有名詞-一般
ウド ウド 名詞-一般

After the dictionary is added, it is recognized as a single general noun:

クラウド クラウド 名詞-一般
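To check the result in code rather than by eye, you can parse the ChaSen-style output and count the tokens. This is a minimal sketch: `parse_chasen` and `is_single_token` are hypothetical helper names, and the sample string is hand-written to mirror the "before" result above rather than captured from MeCab.

```python
def parse_chasen(output):
    """Split -Ochasen output into (surface, part_of_speech) pairs."""
    tokens = []
    for line in output.splitlines():
        if line == "EOS":
            break  # end-of-sentence marker
        if not line:
            continue
        fields = line.split("\t")
        # Column 1 is the surface form; column 4 is the part of speech.
        pos = fields[3] if len(fields) > 3 else ""
        tokens.append((fields[0], pos))
    return tokens


def is_single_token(output, word):
    """True if the whole word came back as exactly one token."""
    tokens = parse_chasen(output)
    return len(tokens) == 1 and tokens[0][0] == word


# Hand-written sample in ChaSen format (tab-separated), not real MeCab output.
before = "クラ\tクラ\tクラ\t名詞-固有名詞-一般\t\t\nウド\tウド\tウド\t名詞-一般\t\t\nEOS\n"

print([surface for surface, _ in parse_chasen(before)])
print(is_single_token(before, "クラウド"))
```

With MeCab installed, pass `tagger.parse("クラウド")` to `is_single_token` instead of the sample string. It should return True once the user dictionary is loaded.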

If you get this result, you're done. Good work!
