The installation steps are listed below. They target Google Colab; if you want to run the code locally, adjust them accordingly.
# Install MeCab
!apt install mecab libmecab-dev mecab-ipadic-utf8
!pip install mecab-python3
# Install mecab-ipadic-NEologd
!apt install git make curl xz-utils file
!git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
!echo yes | mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -a
# Ref: https://qiita.com/Fulltea/items/90f6ebe6dcceaf64eaef
# Ref: https://qiita.com/SUZUKI_Masaya/items/685000d569452585210c
!ln -s /etc/mecabrc /usr/local/etc/mecabrc
# Ref: https://qiita.com/Naritoshi/items/8f55d7d5cce9ce414395
# Libraries for sentiment analysis
!pip install asari oseti pymlask
The sample sentences used as input for sentiment analysis are taken from Aozora Bunko, from the Hans Christian Andersen story "The Puppet-show Man" (translated by Genkuro Yazaki).
list_text = [
    'This person must be the happiest person in the world.',
    'The playhouse was wonderful and the audience was wonderful.',
    'If it was in the Middle Ages, it would probably have been burned at the stake.',
    "When it came to everyone's annoyance, it was as if flies were buzzing in the bottle.",
    'If we humans can come up with these things, we should be able to live longer before they are buried in the earth.'
]
asari
# Simple operation check
from asari.api import Sonar
sonar = Sonar()
res = sonar.ping(text="Too many ads ♡")
res
{'classes': [{'class_name': 'negative', 'confidence': 0.9086981552962491},
  {'class_name': 'positive', 'confidence': 0.0913018447037509}],
 'text': 'Too many ads ♡', 'top_class': 'negative'}
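Since the two class confidences always sum to 1, the result can be collapsed into a single signed polarity score. A minimal sketch using the result dict shown above (this helper is my own, not part of asari's API):

```python
# Collapse asari's two-class output into one signed polarity score.
res = {'classes': [{'class_name': 'negative', 'confidence': 0.9086981552962491},
                   {'class_name': 'positive', 'confidence': 0.0913018447037509}],
       'text': 'Too many ads ♡', 'top_class': 'negative'}

# Map class names to confidences, then take positive minus negative.
conf = {c['class_name']: c['confidence'] for c in res['classes']}
score = conf['positive'] - conf['negative']  # in [-1, 1]; < 0 leans negative
print(round(score, 3))  # → -0.817
```

A single scalar like this is convenient when plotting polarity over many sentences.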
list(map(sonar.ping, list_text))
[{'classes': [{'class_name': 'negative', 'confidence': 0.10382535749585702},
   {'class_name': 'positive', 'confidence': 0.896174642504143}],
  'text': 'This person must be the happiest person in the world.', 'top_class': 'positive'},
 {'classes': [{'class_name': 'negative', 'confidence': 0.035517582235360945},
   {'class_name': 'positive', 'confidence': 0.964482417764639}],
  'text': 'The playhouse was wonderful and the audience was wonderful.', 'top_class': 'positive'},
 {'classes': [{'class_name': 'negative', 'confidence': 0.5815274190768989},
   {'class_name': 'positive', 'confidence': 0.41847258092310113}],
  'text': 'If it was in the Middle Ages, it would probably have been burned at the stake.', 'top_class': 'negative'},
 {'classes': [{'class_name': 'negative', 'confidence': 0.2692695045573754},
   {'class_name': 'positive', 'confidence': 0.7307304954426246}],
  'text': "When it came to everyone's annoyance, it was as if flies were buzzing in the bottle.", 'top_class': 'positive'},
 {'classes': [{'class_name': 'negative', 'confidence': 0.050528495655525495},
   {'class_name': 'positive', 'confidence': 0.9494715043444746}],
  'text': 'If we humans can come up with these things, we should be able to live longer before they are buried in the earth.', 'top_class': 'positive'}]
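When processing a batch like this, usually only the `top_class` labels are needed. A small sketch over two of the result dicts shown above (abbreviated confidences):

```python
# Pull out just the top_class labels from asari's batch results
# (two of the entries shown above, with confidences rounded for brevity).
results = [
    {'classes': [{'class_name': 'negative', 'confidence': 0.1038},
                 {'class_name': 'positive', 'confidence': 0.8962}],
     'text': 'This person must be the happiest person in the world.',
     'top_class': 'positive'},
    {'classes': [{'class_name': 'negative', 'confidence': 0.5815},
                 {'class_name': 'positive', 'confidence': 0.4185}],
     'text': 'If it was in the Middle Ages, it would probably have been burned at the stake.',
     'top_class': 'negative'},
]

labels = [r['top_class'] for r in results]
print(labels)  # → ['positive', 'negative']
```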
The sentence "When it came to everyone's annoyance, it was as if flies were buzzing in the bottle" intuitively reads as negative, but it was judged positive. The judgments for the other examples seem reasonable.
oseti
# Simple operation check
import oseti
analyzer = oseti.Analyzer()
analyzer.analyze("I'm waiting in heaven.")
[1.0]
list(map(analyzer.analyze, list_text))
[[0.0], [1.0], [0], [0], [1.0]]
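oseti returns one polarity score per sentence of each input string, so each text yields a list. To summarize a whole collection, the nested lists can be flattened and averaged; a minimal sketch using the output shown above:

```python
# oseti's per-text score lists from the run above: one inner list per text,
# one score per sentence inside that text.
scores = [[0.0], [1.0], [0], [0], [1.0]]

# Flatten and average to get a single overall polarity.
flat = [s for per_text in scores for s in per_text]
mean = sum(flat) / len(flat)
print(mean)  # → 0.4
```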
Only the second sentence, "The playhouse was wonderful and the audience was wonderful," and the fifth, "If we humans can come up with these things, we should be able to live longer before they are buried in the earth," are judged positive (+1); the other sentences are judged neutral. As expected, the dictionary-based approach seems weak against words that are not in its dictionary.
pymlask
This package has the same author as oseti.
# Simple operation check
import mlask
emotion_analyzer = mlask.MLAsk()
emotion_analyzer.analyze("I don't hate him!(;´Д`)")
# => {'text': "I don't hate him!(;´Д`)",
#     'emotion': defaultdict(<class 'list'>, {'yorokobi': ['Hate*CVS'], 'suki': ['Hate*CVS']}),
#     'orientation': 'POSITIVE',
#     'activation': 'NEUTRAL',
#     'emoticon': ['(;´Д`)'],
#     'intension': 2,
#     'intensifier': {'exclamation': ['!'], 'emotikony': ['´Д`', 'Д`', '´Д', '(;´Д`)']},
#     'representative': ('yorokobi', ['Hate*CVS'])
#    }
{'activation': 'NEUTRAL',
 'emoticon': ['(;´Д`)'],
 'emotion': defaultdict(list, {'suki': ['Hate*CVS'], 'yorokobi': ['Hate*CVS']}),
 'intensifier': {'emotikony': ['´Д`', 'Д`', '´Д', '(;´Д`)'], 'exclamation': ['!']},
 'intension': 2,
 'orientation': 'POSITIVE',
 'representative': ('yorokobi', ['Hate*CVS']),
 'text': "I don't hate him!(;´Д`)"}
# While we're at it, try the NEologd dictionary as well
# Find out where mecab-ipadic-neologd was installed
import subprocess
cmd = 'echo `mecab-config --dicdir`"/mecab-ipadic-neologd"'
path = (subprocess.Popen(cmd, stdout=subprocess.PIPE,
                         shell=True).communicate()[0]).decode('utf-8').strip()  # strip the trailing newline from echo
emotion_analyzer = mlask.MLAsk('-d {0}'.format(path))  # Use the NEologd dictionary
list(map(emotion_analyzer.analyze, list_text))
[{'activation': 'NEUTRAL',
  'emoticon': None,
  'emotion': defaultdict(list, {'yorokobi': ['happiness']}),
  'intensifier': {}, 'intension': 0, 'orientation': 'POSITIVE',
  'representative': ('yorokobi', ['happiness']),
  'text': 'This person must be the happiest person in the world.'},
 {'emotion': None, 'text': 'The playhouse was wonderful and the audience was wonderful.'},
 {'emotion': None, 'text': 'If it was in the Middle Ages, it would probably have been burned at the stake.'},
 {'emotion': None, 'text': "When it came to everyone's annoyance, it was as if flies were buzzing in the bottle."},
 {'emotion': None, 'text': 'If we humans can come up with these things, we should be able to live longer before they are buried in the earth.'}]
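As the output shows, ML-Ask returns `{'emotion': None, ...}` when no emotion word matches, and only then is there no `'orientation'` key. A small helper (my own, hypothetical, not part of pymlask) that maps either shape to a label:

```python
# Map an ML-Ask result dict to a label, falling back to 'NONE'
# for sentences where no emotion word matched ('emotion' is None).
def orientation_label(result):
    if result.get('emotion') is None:
        return 'NONE'
    return result['orientation']

# Two result shapes modeled on the output above.
sample = [
    {'orientation': 'POSITIVE', 'activation': 'NEUTRAL',
     'emotion': {'yorokobi': ['happiness']},
     'text': 'This person must be the happiest person in the world.'},
    {'emotion': None,
     'text': 'The playhouse was wonderful and the audience was wonderful.'},
]
print([orientation_label(r) for r in sample])  # → ['POSITIVE', 'NONE']
```

Guarding on `'emotion'` first avoids a `KeyError` on the unmatched sentences.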
This method, too, judges a sentence positive when a dictionary word ("happiness") appears, but cannot judge sentences whose words are not in the dictionary. Overall, the results are not impressive.
I tried out tools that make it easy to analyze the sentiment of Japanese sentences. Thanks to the authors for publishing these tools.
For serious sentiment analysis with more reasonable results, you would probably need to add processing tailored to the text category you are targeting, or use neural-network techniques (in which case building the dataset is the hard part).
- [[27 picks] Dataset summary that can be used for sentiment analysis of sentences, facial expressions, and voice | Lionbridge AI](https://lionbridge.ai/ja/datasets/15-free-sentiment-analysis-datasets-for-machine-learning/) - links to resources, polarity dictionaries, etc.
- [Natural language processing] How to proceed with sentiment analysis & points that are easy to get stuck on - Qiita
- Story of making and packaging a Japanese Sentiment Analyzer - Ahogrammer
- Sentiment analysis of corporate word-of-mouth data from job-change meetings using deep learning - Qiita
- I tried to analyze the emotions of the whole novel "Weathering with You" ☔️ - Qiita
- oseti, a Python sentiment analysis library using a Japanese evaluation polarity dictionary, has been released - Qiita
- Sentiment analysis of text with ML-Ask - Qiita
- SNOW D18: Japanese Emotional Expression Dictionary - Nagaoka University of Technology Natural Language Processing Laboratory - contains about 2,000 expressions, each tagged with one of 48 emotion categories defined independently by the lab.