This is the day-14 article of the "Fujitsu Cloud Technologies Advent Calendar 2019". Yesterday's entry was @213's story about an outage that occurred this year.
Did @213 make it through Friday the 13th safely? I'm curious.
Sometimes you suddenly want to read a book, but depending on your mood at the time you may want something bright or something dark, and your taste in books shifts accordingly.
However, I don't have time to pick books up at a bookstore and check their synopses and contents, and the works published for free on the Internet at Aozora Bunko have no synopses, so you can't tell what kind of work one is without actually reading it.
In that case, let's make it possible to **grasp the contents of a book at a glance** without reading it.
For a human to grasp the contents of a book at a glance, the contents need to be **visualized**. So I analyzed the text of each work and turned it into a **radar chart**. For example, "No Longer Human" has a large **dislike** component, while "The Wind Rises" has a large **joy** component.
Given the URL of a work in Aozora Bunko, the code outputs a radar chart of the emotions it contains.
Google Colaboratory
By default, Japanese labels drawn with matplotlib on Google Colaboratory are garbled. To plot Japanese text, I installed an IPA font and deleted matplotlib's font cache. The change takes effect once the runtime is restarted.
!apt-get -y install fonts-ipafont-gothic
!rm /root/.cache/matplotlib/fontlist-v310.json
!rm /root/.cache/matplotlib/fontList.json
#Restart runtime
Next, I downloaded the emotion dictionary used to extract emotional expressions. It is distributed with ML-Ask and is open source under [the 3-Clause BSD License](https://opensource.org/licenses/BSD-3-Clause).
!wget http://arakilab.media.eng.hokudai.ac.jp/~ptaszynski/ccount/click.php?id=3 -O emotions.zip
!unzip emotions.zip
After downloading and unzipping the zip file, you will see the following output. You can see that multiple text files are saved under the `emotions` directory in the working directory. They correspond to the ten emotions **sorrow, shame, anger, dislike, fear, surprise, liking, excitement, relief, and joy**, and each file contains the words and phrases associated with that emotion.
Archive: emotions.zip
creating: emotions/
inflating: emotions/aware_uncoded.txt
inflating: emotions/haji_uncoded.txt
inflating: emotions/ikari_uncoded.txt
inflating: emotions/iya_uncoded.txt
inflating: emotions/kowa_uncoded.txt
inflating: emotions/odoroki_uncoded.txt
inflating: emotions/suki_uncoded.txt
inflating: emotions/takaburi_uncoded.txt
inflating: emotions/yasu_uncoded.txt
inflating: emotions/yorokobi_uncoded.txt
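For reference, here is how I read the correspondence between the romaji file-name prefixes and the English emotion names; note that this mapping is my own interpretation of the file names, not something shipped with the dictionary:

```python
# Assumed correspondence between file-name prefixes and emotions,
# inferred from the file names above (not part of the dictionary itself)
EMOTION_NAMES = {
    "aware": "sorrow",
    "haji": "shame",
    "ikari": "anger",
    "iya": "dislike",
    "kowa": "fear",
    "odoroki": "surprise",
    "suki": "liking",
    "takaburi": "excitement",
    "yasu": "relief",
    "yorokobi": "joy",
}

# Each emotion has its own word-list file, e.g. emotions/aware_uncoded.txt
filenames = [f"emotions/{key}_uncoded.txt" for key in EMOTION_NAMES]
print(filenames[0])  # → emotions/aware_uncoded.txt
```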
Checking the contents of `aware_uncoded.txt`, each line contains a word or phrase classified as "sorrow".
!head emotions/aware_uncoded.txt
My chest tears
Suck up
Scratched in tears
Cloudy face
Come with me
Crying
Crying
Crying
Hiccups crying
Raise up
Now that the dictionary has been downloaded successfully, let's load it so that it can be handled from Python. For that, I defined the following function, which reads the word list for each emotion and returns them all as a single **dict**.
def get_emotional_words():
    emotions = ["aware", "haji", "ikari", "iya", "kowa", "odoroki", "suki", "takaburi", "yasu", "yorokobi"]
    emotional_words = {}
    for emotion in emotions:
        emotional_words[emotion] = []
        with open("emotions/" + emotion + "_uncoded.txt", "r") as f:
            for line in f:
                line = line.replace('\n', '')
                emotional_words[emotion].append(line)
    return emotional_words
Next, let's fetch the text of the work we want to analyze. Aozora Bunko's data is also published on GitHub, but this time I obtained it from the HTML pages. Drawing on my article from about two years ago, I apply some preprocessing while fetching the data.
import re
import urllib.request
from bs4 import BeautifulSoup

def get_txt_from_aozorabunko(url):
    html = urllib.request.urlopen(url=url)
    soup = BeautifulSoup(html, "html.parser")
    # Get the body text: <div class="main_text"> ~ body ~ </div>
    sentences = soup.find("div", "main_text")
    # Extract only the text
    sentences = sentences.get_text().replace("\r", "").replace("\n", "").replace("\u3000", "")
    # Remove full-width parentheses and their contents (ruby readings appear in parentheses)
    sentences = re.sub("（.*?）", "", sentences)
    return sentences
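As a quick sanity check of the ruby-stripping step, here is the full-width-parenthesis regex applied to a made-up sample string (the sample is for illustration only, not taken from the scraped HTML):

```python
import re

# Ruby readings in Aozora Bunko text end up inside full-width parentheses
sample = "吾輩（わがはい）は猫（ねこ）である"
cleaned = re.sub("（.*?）", "", sample)
print(cleaned)  # → 吾輩は猫である
```

Note that the non-greedy `.*?` is what keeps the pattern from eating everything between the first opening parenthesis and the last closing one.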
Now that we can load the dictionary and fetch a work's text, let's create the sentiment analysis function. This time I adopted a relatively simple **dictionary-based** method: it counts how many times the words in each emotion's word list appear in the work's text. Here is the code.
def count_emotional_words(sentences, emotional_words):
    count_emotions = [0] * len(emotional_words.keys())
    for idx, emotion in enumerate(emotional_words.keys()):
        for word in emotional_words[emotion]:
            count_emotions[idx] += sentences.count(word)
    return count_emotions
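A minimal check of the counting logic with a made-up two-emotion dictionary (the function body is repeated here so the snippet runs standalone; the words and text are invented for illustration):

```python
def count_emotional_words(sentences, emotional_words):
    # Same logic as above: count occurrences of each emotion's words
    counts = [0] * len(emotional_words)
    for idx, emotion in enumerate(emotional_words):
        for word in emotional_words[emotion]:
            counts[idx] += sentences.count(word)
    return counts

toy_dict = {"yorokobi": ["嬉しい", "楽しい"], "aware": ["悲しい"]}
text = "嬉しい日もあれば悲しい日もある。悲しい夜も楽しい朝もある。"
print(count_emotional_words(text, toy_dict))  # → [2, 2]
```

Since `str.count` does simple substring matching, longer phrases in the dictionary match less often than single words; that is a known limitation of this simple approach.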
Finally, let's write the code that turns the emotion-word frequencies into a radar chart. This article was very helpful when writing it; thank you. `labels` receives the list of emotion names, and `values` receives the frequencies returned by `count_emotional_words`.
import numpy as np
import matplotlib.pyplot as plt

def plot_polar(labels, values, title):
    jp_font = {'fontname': 'IPAGothic'}
    angles = np.linspace(0, 2 * np.pi, len(labels) + 1, endpoint=True)
    values = np.concatenate((values, [values[0]]))  # Close the polygon
    fig = plt.figure()
    ax = fig.add_subplot(111, polar=True)
    ax.plot(angles, values, 'o-')  # Outline
    ax.fill(angles, values, alpha=0.25)  # Fill
    ax.set_thetagrids(angles[:-1] * 180 / np.pi, labels, fontsize=15, **jp_font)  # Axis labels
    ax.set_rlim(0, max(values))
    ax.set_title("「" + title + "」", fontsize=15, **jp_font)
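The polygon is closed by appending the first value to the end of the list, so the number of angles (`len(labels) + 1`, with the last angle back at 2π) matches the number of plotted values. A quick sanity check with made-up values:

```python
import numpy as np

labels = ["aware", "haji", "ikari", "iya", "kowa", "odoroki", "suki", "takaburi", "yasu", "yorokobi"]
values = [5, 2, 3, 8, 4, 1, 6, 2, 3, 7]

# One extra angle at 2π brings the outline back to its starting point
angles = np.linspace(0, 2 * np.pi, len(labels) + 1, endpoint=True)
closed = np.concatenate((values, [values[0]]))

print(len(angles), len(closed))  # → 11 11
print(closed[0] == closed[-1])   # → True
```

Without this closing step, matplotlib would leave a gap between the last and first vertices of the chart.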
Finally, run everything from a main function. Give it the URL of an Aozora Bunko work and the work's title, and a radar chart like the one at the beginning is output.
def main(url, title):
    sentences = get_txt_from_aozorabunko(url)
    emotional_words = get_emotional_words()
    count_emotions = count_emotional_words(sentences, emotional_words)
    # Kanji labels for the ten emotions, in the same order as the dictionary:
    # sorrow, shame, anger, dislike, fear, surprise, liking, excitement, relief, joy
    emotional_kanji = ["哀", "恥", "怒", "厭", "怖", "驚", "好", "昂", "安", "喜"]
    labels = list(emotional_kanji)
    values = count_emotions
    plot_polar(labels, values, title)
As a test, I fed it Osamu Dazai's "No Longer Human" (人間失格).
# URL of the work in Aozora Bunko
ningen_shikkaku = "https://www.aozora.gr.jp/cards/000035/files/301_14912.html"
main(ningen_shikkaku, "人間失格")
**I was able to create a radar chart of the work's emotional tendencies!**
Let's look again at the example output from the beginning. Compared to "No Longer Human", **"The Wind Rises" comes across as a fairly bright work**. "Kokoro", meanwhile, has an overall mood similar to "No Longer Human", but its **liking component is larger, so the romance comes through more strongly** (both involve love stories, but I feel "Kokoro" expresses those emotions more directly). And in Edogawa Ranpo's "The Human Chair", the fear component is larger than in the other works, as you would expect from its **eerie worldview**.
- **It's fun to look at radar charts of various works** ("Run, Melos!" has a surprisingly large sorrow component)
- **A simple dictionary-based method was enough for this analysis**
- **There are few Japanese emotion dictionaries** (the one used this time, and SNOW)
- **Turn it into a web tool** (it would be cool to run it from a browser)
- **Reorganize the dictionary** (ten emotion categories is a lot; consolidate them into about five)
- **Visualize the difference between the beginning and end of a story** (you could watch the story develop!)
Tomorrow, yoshitsugumiyazaki says there is "a high possibility of summarizing AI-related topics of interest." More AI material. I'm looking forward to seeing what gets written!