In this article, I apply natural language processing to the text of the novel **Weathering with You** and perform sentiment analysis!
In general, **sentiment analysis** refers to detecting and quantifying the **emotions** contained in text and judging the opinion it expresses. The field is attracting attention because it lets companies mechanically classify user opinions about their products and services.
On the other hand, I wondered: "Couldn't sentiment analysis be applied to something other than reviews and word-of-mouth?" So in this article I take on sentiment analysis of a **novel**, something rarely attempted.
The purpose of this article is to see whether, by analyzing the novel's emotions, **we can infer the rough development of the story and the personalities of the characters**.
For example, within a story:

- If the emotion values swing sharply up and down, the development is very dramatic.
- If you find a turning point from positive to negative in the emotion values, you can objectively locate the story's origin and its transitions.
And the subject I chose this time is **Weathering with You**! It was a big hit following the director's previous film **Your Name.**, and many of you may have seen it.
[Novel: Weathering with You (Kadokawa Bunko) - Amazon](https://www.amazon.co.jp/%E5%B0%8F%E8%AA%AC-%E5%A4%A9%E6%B0%97%E3%81%AE%E5%AD%90-%E8%A7%92%E5%B7%9D%E6%96%87%E5%BA%AB-%E6%96%B0%E6%B5%B7%E8%AA%A0/dp/4041026407)
The film should still be fresh in your memory ~~(I had already forgotten it)~~, so those who went to the theater can enjoy this article while recalling scenes from the movie.
- In the novel, the story progresses while the point of view (first person) shifts between characters.
- The main viewpoint is that of the protagonist, Hodaka, but there are also chapters from Hina's and Natsumi's perspectives.
- The story is basically the same as the film; the novel just explains things in more detail.
This is a personal impression, but the novel lets you enjoy the story from a slightly different angle than the movie.
This time I performed the simplest kind of sentiment analysis: **positive/negative analysis**. **Positive/negative analysis** is a classification method that judges from a sequence of words whether a text expresses a positive opinion, a negative opinion, or is neutral. First, let me walk through the general flow of sentiment analysis with a simple example. The sentence is morphologically analyzed and decomposed into morphemes (words) as follows.
** "Hatto Natsumi raises her hand energetically, and Suga ignores it. 』**
↓
** ['Ha I',' and',' Natsumi',' San',' ga',' Genki',' to',' hand',' to','raise',',',' Suga' ,'San',' is','it',' to','ignore','do'] **
Then, each word is judged as positive </ font> or negative </ font>, and an emotion value is given to each.
** ['Hai', 0], ['and', 0], ['Natsumi', 0], ['san', 0], ['ga', 0], ['Genki', 1], ['To', 0], ['Hand', 0], ['Raise', 0], ['Raise', 0], [',', 0], ['Suga', 0], [' San', 0], ['is', 0], ['it', 0], ['is', 0], ['ignore', -1], ['do', 0] **
In this sentence, to energetic and neglect with emotional polarity, Genki: +1 Ignore: -1 Emotional value was given.
Finally, the total value is calculated to calculate the sentence emotion value. In the case of the above sentence, 1 + (-1) means that the emotion value is 0. In this way, the emotional value is given for each sentence.
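The flow above can be sketched in a few lines of Python. The mini dictionary and the pre-tokenized input here are illustrative stand-ins, not the article's actual dictionary or tokenizer:

```python
# Minimal sketch of dictionary-based sentence scoring.
# `emotion_dict` is a toy stand-in for a real emotion dictionary.
emotion_dict = {"元気": 1, "無視": -1}  # "energetic": +1, "ignore": -1

def score_sentence(tokens):
    """Pair each token with its polarity (0 if not in the dictionary),
    then sum the polarities to get the sentence's emotion value."""
    pairs = [[t, emotion_dict.get(t, 0)] for t in tokens]
    return pairs, sum(score for _, score in pairs)

# Tokens as a morphological analyzer such as MeCab might produce them
tokens = ["夏美", "さん", "が", "元気", "に", "手", "を", "挙げる", "、",
          "須賀", "さん", "は", "それ", "を", "無視", "する"]
pairs, total = score_sentence(tokens)
print(total)  # 1 + (-1) = 0
```

This is the whole idea in miniature: tokens absent from the dictionary contribute 0, and the sentence score is just the sum.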
An emotion dictionary is used to determine a word's polarity. An emotion dictionary lists in advance which words are positive and which are negative, as shown below. With this dictionary, a word labeled negative (evaluation) is given an emotion value of -1, and a word labeled positive (evaluation) is given +1.
Emotional values are calculated based on this dictionary.
In addition, as shown below, the dictionary also supports positive/negative judgments over multiple words: [mouth corner + rise, +1] [voice + spring, +1] [energetic + not, -1] [danger + drowsy, -1]
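A multi-word rule of this kind can be sketched as a lookup over adjacent token pairs. The keys below are illustrative, not the article's actual dictionary:

```python
# Sketch of multi-word (two-token) polarity lookup.
# Keys are illustrative pairs in the spirit of [mouth corner + rise, +1].
bigram_dict = {
    ("口角", "上がる"): 1,   # mouth corner + rise -> positive
    ("元気", "ない"): -1,    # energetic + not     -> negative
}

def score_bigrams(tokens):
    """Sum the polarities of every adjacent token pair found in the dictionary."""
    return sum(bigram_dict.get(pair, 0) for pair in zip(tokens, tokens[1:]))

print(score_bigrams(["元気", "ない"]))    # -1
print(score_bigrams(["口角", "上がる"]))  # 1
```

In practice such pair rules matter because a negation word can flip the polarity of the word before it, which a word-by-word lookup would miss.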
First, create the corpus. With a friend's help, I made it myself, like this. ↓ ***For copyright reasons the corpus cannot be published, so only a portion is shown.**
A word from Mr. S, the friend who created the corpus: "This time I took on the transcription with the gut feeling that 'I don't really get it, but this is definitely something interesting.' It took a whole week to copy out and segment the sentences using Kindle's 'Notes and Highlights' feature. To be honest, it was very hard. You copy and paste page after page of Weathering with You while dragging over the sentences, and doing that while exhausted from the copy-paste work feels like being put under a sleep spell. When I delivered it and was thanked with 'Thanks for the corpus!', I confess I secretly googled 'corpus' because I didn't know what it meant."
With the above, the preparations for sentiment analysis are done. This time I used the ["Japanese Sentiment Polarity Dictionary"](http://www.cl.ecei.tohoku.ac.jp/index.php?Open%20Resources%2FJapanese%20Sentiment%20Polarity%20Dictionary) published by the Inui-Okazaki Laboratory at Tohoku University, partially reorganized to suit the contents of Weathering with You. I also made my own **multi-word dictionary** for judgments over multiple words, such as the [mouth corner + rise, +1] example introduced earlier.
Import the emotion dictionary and output an emotion value for each sentence. The full code is given at the end of this article, but if you want quick sentiment analysis in Japanese, the library **"oseti"** is very useful; I used it as a reference.
I wrote a script that takes text as input and returns its emotion value, and added the emotion values to the corpus dataframe as shown below.
- total_word_score_pair_list_abs1: list of morphemes with emotional polarity and their emotion values
- sum_positive_scores: sum of the positive values
- sum_negative_scores: sum of the negative values
- new_score_sum: sum of the positive and negative values
Let's graph the emotion values with the visualization library seaborn. We sum the emotion values per page and look at their **transition in chronological order**. The x-axis is the page number and the y-axis is the emotion value. ***Spoilers may follow from here on. Be careful.**
Changes in emotional values on a page-by-page basis
import matplotlib.pyplot as plt
from statistics import mean, median
import seaborn as sns; sns.set()
import re
%matplotlib inline
page_sum_df = df_tenki2.groupby("page_num").new_score_sum.sum().reset_index()
sns.lineplot(x="page_num", y="new_score_sum", data=page_sum_df)
Here are the results!
The emotional ups and downs are quite intense! Positive and negative alternate. What can be read from this graph:

- Positive and negative appear clearly as undulations from page to page.
- Some pages have extremely high positive or negative values.
- The emotion value (probably) changes with the story's developments.

However, the undulations in the graph are very fine, and the features are a little hard to make out...
To get a coarser view of the features, let's change the x-axis from **page units to chapter units** and sum again.
Changes in emotional values by chapter
chapter_sum_df = df_tenki2.groupby("chapter_flag").new_score_sum.sum().reset_index()
sns.lineplot(x="chapter_flag", y="new_score_sum", data=chapter_sum_df)
- Chapter-level features can be grasped more globally than page-level ones.
- The emotion value undulates strongly from chapter to chapter, and in the chapter immediately after the emotion value drops, it tends to swing positive.
- No chapter has a negative total emotion value.
The ups and downs of emotions are easier to understand and interpret than before!
But... take a look at the y-axis values in the graph. There are very few negative values. Doesn't that feel odd to those who saw the movie or read the novel?
** "Weathering with You was such a peaceful work ...?" **
So let's go one step further. Next, instead of the total value, we **split the emotion value into its positive and negative components** and draw each as its own graph.
chapter_sum_df = df_tenki2.groupby("chapter_flag").sum_positive_scores.sum().reset_index()
sns.lineplot(x="chapter_flag", y="sum_positive_scores", data=chapter_sum_df,color="red")
chapter_sum_df2 = df_tenki2.groupby("chapter_flag").sum_negative_scores.sum().reset_index()
sns.lineplot(x="chapter_flag", y="sum_negative_scores", data=chapter_sum_df2,color="blue")
The red line is the transition of the positive values, and the blue line is the transition of the negative values.
Now both positive and negative features are clearly visible! When the positive and negative values were summed into a single graph as before, **whenever both were large they canceled out, hiding the features**.
Now, let's take a brief look at the content of the novel.
Looking at the overall shape, the fluctuation range of the emotion values is large at the beginning and in the second half of the story. In chapter 2 and from chapter 8 onward, both the positive and negative values are fairly high. Perhaps these correspond to the **"introduction" (ki)** and the **"twist" (ten)** of the kishōtenketsu structure?
Next, let's look at the magnitude of the value.
From this graph, the positive values of chapter 8 and chapter 10 are the largest, and chapter 2 has the largest negative value.
chapter8: Chapter 8 is a relatively peaceful stretch: playing in the park together, and **Hodaka** consulting Hina's younger brother **Nagi** and going to buy a ring in order to confess to **Hina**. No wonder the positive value is high.
chapter10: Chapter 10 is the climax of the story, the scene where **Hodaka** struggles to save **Hina**. Not only the positive value but also the negative value is high, so you can imagine how intense **Hodaka's** emotions are.
chapter2: Chapter 2 is early in the story. **Hodaka**, having run away from home, comes to Tokyo and tries to find a part-time job, but is battered by the waves of the city and finally ends up visiting **Suga's** office to work. Perhaps the negative values come from being battered by the city and scolded by **Suga**.
From here, let's compare the emotion values of the characters. We compute the average emotion value per line of each character's dialogue:

**total emotion value of a character's dialogue / number of that character's lines**
This time, we will compare the four main characters ** "Hodaka" **, ** "Hina" **, ** "Suga" **, and ** "Natsumi" **.
df_tenki3=df_tenki2.groupby(['speaker_name'])['new_score_sum'].mean().reset_index()
df_tenki4 = df_tenki3.sort_values('new_score_sum', ascending=False)
df_tenki_person = df_tenki4[(df_tenki4["speaker_name"] == "suga") | (df_tenki4["speaker_name"] == "hodaka") | (df_tenki4["speaker_name"] == "natsumi") | (df_tenki4["speaker_name"] == "hina")]
sns.catplot(x="speaker_name", y="new_score_sum", data=df_tenki_person,height=6,kind="bar",palette="muted")
The positive and negative are clearly separated between men and women.
The women are remarkably positive! The two men, on the other hand, are quite negative.
The two men also show very close values. In fact, in the novel the protagonist Hodaka and Suga are described by those around them as **"these two are very alike"**, and you can see the resemblance in their emotion values as well.
Let's also look at the positive and negative values separately.
df_tenki3=df_tenki2.groupby(['speaker_name'])[['sum_positive_scores','sum_negative_scores']].mean().reset_index()
df_tenki4 = df_tenki3.sort_values('sum_positive_scores', ascending=False).reset_index()
df_tenki_person2 = df_tenki4[(df_tenki4["speaker_name"] == "suga") | (df_tenki4["speaker_name"] == "hodaka") | (df_tenki4["speaker_name"] == "natsumi") | (df_tenki4["speaker_name"] == "hina")]
sns.catplot(x="speaker_name", y="sum_positive_scores", data=df_tenki_person2,kind="bar",palette="muted")
sns.catplot(x="speaker_name", y="sum_negative_scores", data=df_tenki_person2,kind="bar",palette="muted")
** "Hina" ** seems to speak less negative </ font> words than the three. What a strong girl ... ** "Natsumi" ** has the top positive value </ font>, but the negative value </ font> is also reasonably high. ** "Natsumi" ** is usually quite bright, but there are various negative </ font> words such as self-hatred due to job hunting and complaining to Suga. I have the impression that you are talking. And the positive value </ font> of ** "sail height" ** is quite low. Certainly there is not much bright impression ... But if the positive </ font> of ** "Hina" ** pulls you, these two may be a good combination (?)
Finally, let's examine the relationship between the **weather**, the key to this story, and the emotion values. As with the characters, we compute the average value for each weather scene. When creating the corpus, we classified the weather into the following six categories based on the descriptions: "sunny", "rain", "light_rain", "heavy_rain", "clear", "snow". Since there are three intensities of rain, we can also see the relationship between rain intensity and emotion value. Incidentally, because of the story's premise, the setting (Tokyo) is basically in a state of "rain", except when the heroine **Hina** prays for clear skies and when **a certain thing** happens. Because it rains every day, everyone in the city longs for it to clear up.
**My hypothesis: sunny scenes will be positive, and the heavier the rain, the more negative?**
That was my expectation. Let's see the result!
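The plot that follows uses `df_tenki2_edited`, which the article's shown code does not define. One plausible construction, with toy rows standing in for the corpus (the column names follow the article; the data here is invented), is the mean emotion value per weather category:

```python
import pandas as pd

# Hypothetical reconstruction of df_tenki2_edited: average emotion value
# per weather category, computed from a toy stand-in for the corpus.
df_tenki2 = pd.DataFrame({
    "weather_flag": ["rain", "sunny", "rain", "clear", "snow"],
    "new_score_sum": [-1.0, 3.0, -2.0, 1.0, -4.0],
})
df_tenki2_edited = (
    df_tenki2.groupby("weather_flag")["new_score_sum"].mean().reset_index()
)
print(df_tenki2_edited)
```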
sns.catplot(x="weather_flag", y="new_score_sum", data=df_tenki2_edited,height=6,kind="bar",palette="muted")
As expected, "sunny" is the most positive! That much is predictable. The intensity of the rain, however, doesn't seem to matter much. And while "sunny" is very positive, "clear" hardly turns positive at all. Why...? As mentioned above, the sky clears only when Hina prays for it and when **a certain thing** happens, and that **thing** is..... yes,
"Hina has disappeared" </ font>
With Hina's disappearance, the sky over the city, which had been raining all along, clears at once. The general public, not knowing that **Hina** was sacrificed for it, innocently rejoices at the sunshine. But the protagonist **Hodaka** is the only one who knows, and he is deeply sad. It should be the finest weather in the story, yet here **Hodaka's** feelings run opposite to the world's. So even though it is clear, the scene does not become very positive.
Lastly, snow: this is the most negative. If you've seen the movie, that should ring true. This is the climax, where **Hodaka** and **Hina** flee from the police and hide in a love hotel. It snows even though it is August, and with the world and Hodaka thrown into confusion by this wildly abnormal weather, the scene contains many negative expressions.
What sentiment analysis taught me about the novel:

- You can read off the rough development of the story and its turning points.
- You can roughly judge a character's personality from the positive and negative values.
- The chronological transition of emotion values becomes easy to interpret by splitting it into positive and negative components per chapter.

This time I investigated the relationship with the weather, which is the most obvious factor, but examining other factors would likely yield interesting results too. If you have ideas like "it would be interesting to analyze this!", I'd appreciate your comments.
I was prepared for this, of course, but the novel's range of expression was wider than I expected, and some passages were hard to score positive/negative word by word. In the final scene, **Hodaka** shouts "I don't care if the weather stays crazy!", one of the most powerfully positive moments in the story, yet taken literally, word by word, it scores negative. This is where assigning emotion values per word reaches its limits, and it is a genuinely hard problem.
#Function to import the emotion polarity dictionaries
def _make_dict():
import pandas as pd
df_word_dict = pd.read_csv('./dict/edited_target_pair_list_word_out.csv')#nouns
df_wago_dict = pd.read_csv('./dict/edited_target_pair_list_wago_out.csv')#wago (predicates)
df_one_gram_dict = pd.read_csv('./dict/one_gram_dict_out.csv')#multi-word entries
word_dict = {}
for pair_list in df_word_dict[['word','count']].values.tolist():
if pair_list[1] !='0':
word_dict[pair_list[0]] = pair_list[1]
wago_dict = {}
for pair_list in df_wago_dict[['word','count']].values.tolist():
if pair_list[1] !='0':
wago_dict[pair_list[0]] = pair_list[1]
one_gram_dict = {}
for pair_list in df_one_gram_dict[['word1','word2','score']].values.tolist():
one_gram_dict[(str(pair_list[0]),str(pair_list[1]))] = pair_list[2]
return word_dict,wago_dict,one_gram_dict
#A function that splits text sentence by sentence
def _split_per_sentence(text):
import re
re_delimiter = re.compile("[。,.!\?!?]")
for sentence in re_delimiter.split(text):
if sentence and not re_delimiter.match(sentence):
yield sentence
def _sorted_second_list(polarities_and_lemmanum):
from operator import itemgetter
sorted_polarities_and_lemmanum=sorted(polarities_and_lemmanum, key=itemgetter(1))
return [i[0] for i in sorted_polarities_and_lemmanum]
def _calc_sentiment_polarity(sentence):
import MeCab
word_dict,wago_dict,one_gram_dict = _make_dict()
NEGATION = ('ない', 'ず', 'ぬ', 'ん')  # negation morphemes
tagger = MeCab.Tagger('-Owakati -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd')
tagger.parse('') # for avoiding bug
word_polarities = [] #emotion values found via word_dict
wago_polarities = [] #emotion values found via wago_dict
polarities_and_lemmanum = [] #final emotion values paired with lemma order
lemmas = [] #lemmas (base forms as listed in the dictionary)
word_polarity_apeared = False
wago_polarity_apeared = False
word_nutoral_polarity_apeared = False
wago_nutoral_polarity_apeared = False
word_out_polarity_apeared = False
wago_out_polarity_apeared = False
word_polarity_word = '' #Provisional for error handling
wago_polarity_word = '' #Provisional for error handling
word_nutoral_word = ''
wago_nutoral_word = ''
word_out_polarity_word = ''
wago_out_polarity_word = ''
last_hinsi = ''
last_word = ''
node = tagger.parseToNode(sentence)
word_score_pair_list = []
lemma_num = 0#For noting the order of words
lemma_dict = {}
while node:
if 'BOS/EOS' not in node.feature:
surface = node.surface
feature = node.feature.split(',')
lemma = feature[6] if feature[6] != '*' else node.surface
lemma_num += 1
lemma_dict[lemma] = lemma_num
#Process the lemma (the segmented word converted to its dictionary base form)
#Reset the polarity flags when the current word breaks the chain
if word_polarity_apeared and (feature[0] not in ['助動詞','助詞'] and last_hinsi not in ['助動詞','助詞'] and last_word not in ['ある','おる','する']) and last_word not in word_dict:
word_polarity_apeared = False
elif wago_polarity_apeared and (feature[0] not in ['助動詞','助詞'] and last_hinsi not in ['助動詞','助詞'] and last_word not in ['ある','おる','する']) and last_word not in wago_dict:
wago_polarity_apeared = False
elif word_nutoral_polarity_apeared and (feature[0] not in ['助動詞','助詞'] and last_hinsi not in ['助動詞','助詞'] and last_word not in ['ある','おる','する']) and last_word not in word_dict:
word_nutoral_polarity_apeared = False
elif wago_nutoral_polarity_apeared and (feature[0] not in ['助動詞','助詞'] and last_hinsi not in ['助動詞','助詞'] and last_word not in ['ある','おる','する']) and last_word not in wago_dict:
wago_nutoral_polarity_apeared = False
try:
if word_dict[lemma] in ['p','n']:
polarity = 1 if word_dict[lemma] == 'p' else -1
word_polarities.append([polarity,lemma_dict[lemma]])
word_polarity_apeared = True
word_polarity_word = lemma
elif word_dict[lemma] == 'f':
word_polarities.append([0,lemma_dict[lemma]])
word_nutoral_polarity_apeared = True
word_nutoral_word = lemma
polarity = 0
#Entries with value 0 were removed from word_dict in advance, so this else is unnecessary; kept for readability
else:
polarity = 0
#When the word is not in word_dict
except:
#When the word is in wago_dict
try:
if wago_dict[lemma] in ['ポジ（経験）','ネガ（経験）','ポジ（評価）','ネガ（評価）']:
polarity = 1 if wago_dict[lemma] in ['ポジ（経験）','ポジ（評価）'] else -1
# print(polarity)
wago_polarities.append([polarity,lemma_dict[lemma]])
wago_polarity_apeared = True
wago_polarity_word = lemma
elif wago_dict[lemma] == 'neutral':
wago_polarities.append([0,lemma_dict[lemma]])
wago_nutoral_polarity_apeared = True
wago_nutoral_word = lemma
polarity = 0
else:
polarity = 0
#When the word is in neither word_dict nor wago_dict
except:
if word_polarity_apeared and surface in NEGATION and wago_nutoral_polarity_apeared is False:
if last_hinsi in ['名詞','助詞','助動詞','動詞','形容詞']:
word_polarities[-1][0] *= -1
try:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([word_polarity_word,1])+1)
except:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([word_polarity_word,-1])+1)
finally:
word_score_pair_list[reverse_num]=[word_polarity_word+'+'+lemma,word_polarities[-1][0]]
word_polarity_apeared = False
word_polarity_word = ''
polarity = 0
else:
polarity = 0
#"Please fix" "Please improve"-Processing to 1
elif lemma in ['Give me','want','Wish']:
try:
if word_polarity_word or word_nutoral_word !='':
last_polarities_word = [i for i in [word_polarity_word,word_nutoral_word] if i !=''][0]
try:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,word_polarities[-1][0]])+1)
except:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,word_polarities[-1][0]])+1)
finally:
word_score_pair_list[reverse_num]=[last_polarities_word+'+'+surface,-1]#Emotion value-Grant 1
word_polarity_apeared = False
word_polarity_word = ''
word_polarities[-1][0] = -1
polarity = 0
elif wago_polarity_word or wago_nutoral_word !='':
last_polarities_word = [i for i in [wago_polarity_word,wago_nutoral_word] if i !=''][0]
try:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,wago_polarities[-1][0]])+1)
except:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,wago_polarities[-1][0]])+1)
finally:
word_score_pair_list[reverse_num]=[last_polarities_word+'+'+surface,-1]#Emotion value-Grant 1
wago_polarity_apeared = False
wago_polarity_word = ''
wago_polarities[-1][0] = -1
polarity = 0
except:
polarity = 0
#When the sentence ends with the question particle "か", process to -1
elif last_hinsi in ['助動詞','助詞'] and lemma == 'か':
try:
if word_polarity_word or word_nutoral_word !='':
last_polarities_word = [i for i in [word_polarity_word,word_nutoral_word] if i !=''][0]
try:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,word_polarities[-1][0]])+1)
except:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,word_polarities[-1][0]])+1)
finally:
word_score_pair_list[reverse_num]=[last_polarities_word+'+'+surface,-1]#Emotion value-Grant 1
word_polarity_apeared = False
word_polarity_word = ''
word_polarities[-1][0] = -1
polarity = 0
elif wago_polarity_word or wago_nutoral_word !='':
last_polarities_word = [i for i in [wago_polarity_word,wago_nutoral_word] if i !=''][0]
try:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,wago_polarities[-1][0]])+1)
except:
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([last_polarities_word,wago_polarities[-1][0]])+1)
finally:
word_score_pair_list[reverse_num]=[last_polarities_word+'+'+surface,-1]#Emotion value-Grant 1
wago_polarity_apeared = False
wago_polarity_word = ''
wago_polarities[-1][0] = -1
polarity = 0
except:
polarity = 0
elif word_nutoral_polarity_apeared:
if last_hinsi in ['名詞','助詞','助動詞','動詞'] and lemma in NEGATION:
lemma_type = 'denial'
try:
word_polarities[-1][0] += one_gram_dict[(word_nutoral_word,lemma_type)]
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([word_nutoral_word,0])+1)
word_score_pair_list[reverse_num]=[word_nutoral_word+'+'+lemma,one_gram_dict[(word_nutoral_word,lemma_type)]]
word_nutoral_polarity_apeared = False
word_nutoral_word = ''
except:
polarity = 0
elif last_hinsi in ['名詞','助詞','助動詞','動詞'] and lemma not in NEGATION:
try:
word_polarities[-1][0] += one_gram_dict[(word_nutoral_word,lemma)]
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([word_nutoral_word,0])+1)
word_score_pair_list[reverse_num]=[word_nutoral_word+'+'+lemma,one_gram_dict[(word_nutoral_word,lemma)]]
word_nutoral_polarity_apeared = False
word_nutoral_word = ''
except:
polarity = 0
#When the word was not in word_dict
else:
#If the word is a negation ('ない', 'ず', 'ぬ'), reverse the polarity of the preceding word
if wago_polarity_apeared and surface in NEGATION and wago_nutoral_polarity_apeared is False\
and word_polarity_apeared is False and word_nutoral_polarity_apeared is False:
if last_hinsi in ['名詞','形容詞','助動詞']:
wago_polarities[-1][0] *= -1
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([wago_polarity_word,wago_polarities[-1][0]*(-1)])+1)
word_score_pair_list[reverse_num]=[wago_polarity_word+'+'+lemma,wago_polarities[-1][0]]
wago_polarity_apeared = False
word_polarity_word = ''
polarity = 0
else:
polarity = 0
elif wago_nutoral_polarity_apeared:
#Neutral + negation processing
if last_hinsi in ['動詞','助詞','助動詞'] and lemma in NEGATION:
lemma_type = 'denial'
try:
lemma_type = 'denial'
wago_polarities[-1][0] += one_gram_dict[(wago_nutoral_word,lemma_type)]
#Processing to trace the list in reverse
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([wago_nutoral_word,0])+1)
word_score_pair_list[reverse_num]=[wago_nutoral_word+'+'+lemma,one_gram_dict[(wago_nutoral_word,lemma_type)]]
wago_nutoral_polarity_apeared = False
wago_nutoral_word = ''
except:
polarity = 0
#Processing other than neutral + negation
elif last_hinsi in ['名詞','動詞','形容詞','助詞','助動詞']:
try:
wago_polarities[-1][0] += one_gram_dict[(wago_nutoral_word,lemma)]
#Processing to trace the list in reverse
reverse_num = -1*([i for i in reversed(word_score_pair_list)].index([wago_nutoral_word,0])+1)
word_score_pair_list[reverse_num]=[wago_nutoral_word+'+'+lemma,one_gram_dict[(wago_nutoral_word,lemma)]]
wago_nutoral_polarity_apeared = False
wago_nutoral_word = ''
except:
polarity = 0
else:
polarity = 0
else:
polarity = 0
word_score_pair = [lemma,polarity]
word_score_pair_list.append(word_score_pair)
last_hinsi = node.feature.split(',')[0]
last_word = lemma
node = node.next
if word_polarities:
polarities_and_lemmanum.extend(word_polarities)
if wago_polarities:
polarities_and_lemmanum.extend(wago_polarities)
#Sort ascending by lemma_num
try:
polarities = _sorted_second_list(polarities_and_lemmanum)
#Only use polarity values other than 0. If 0 remains, an error will occur in the subsequent processing.
polarities = [i for i in polarities if i !=0]
except:
polarities = []
try:
if sum(polarities) / len(polarities) ==0:
score = float(polarities[-1])
# print('=================================================')
print(sentence+'→→ Priority is given to the emotional value at the end of the sentence')
# print('=================================================')
else:
score = sum(polarities) / len(polarities)
except:
score = 0
if not polarities:
return 0,0,0,word_score_pair_list
return score,sum(i for i in polarities if i > 0),sum(i for i in polarities if i < 0),word_score_pair_list
def _analyze(text):
scores,total_word_score_pair_list,positive_word_cnt_list,negative_word_cnt_list = [],[],[],[]
for sentence in _split_per_sentence(text):
#Normalize the sentence's expressions before sentiment analysis
replaced_sentence = _emotion_replace_text(sentence)
score,positive_word_cnt,negative_word_cnt,word_score_pair_list = _calc_sentiment_polarity(replaced_sentence)
scores.append(score)
positive_word_cnt_list.append(positive_word_cnt)
negative_word_cnt_list.append(negative_word_cnt)
total_word_score_pair_list.append(word_score_pair_list)
return scores,positive_word_cnt_list,negative_word_cnt_list,total_word_score_pair_list
def _flatten_abs1(x):
#A function that flattens a double list and makes it a list of only pairs with emotional values
return [e for inner_list in x for e in inner_list if e[1] !=0]
def score_sum_get(x):
#Function that sums the emotion values extracted as in _flatten_abs1
emo_list=[]
for inner_list in x:
for e in inner_list:
if e[1] !=0:
emo_list.append(e[1])
return sum(emo_list)
from datetime import datetime as dt
from datetime import date
#Read the corpus data
import pandas as pd
path='./data/tenkinoko.csv'
df_tenki = pd.read_csv(path,encoding="SHIFT-JIS")
#chapter_flag is a helper (not shown) that maps chapters to chapter numbers
df_tenki["chapter_flag"] = df_tenki.chapter.apply(chapter_flag)
df_tenki2 = df_tenki.copy()
add_col_name=['scores','positive_word_cnt_list','negative_word_cnt_list','total_word_score_pair_list']
for i in range(len(add_col_name)):
    col_name=add_col_name[i]
    df_tenki2[col_name]=df_tenki2['text'].apply(lambda x:_analyze(x)[i])
df_tenki2['score_sum']=df_tenki2['scores'].apply(lambda x:sum(x))
#Sum the positive values per row
df_tenki2['sum_positive_scores']=df_tenki2['positive_word_cnt_list'].apply(lambda x:sum(x))
#Sum the negative values per row
df_tenki2['sum_negative_scores']=df_tenki2['negative_word_cnt_list'].apply(lambda x:sum(x))
#Keep only [morpheme, emotion value] pairs that carry emotional polarity
df_tenki2['total_word_score_pair_list_abs1']=df_tenki2['total_word_score_pair_list'].apply(lambda x:_flatten_abs1(x))
#Calculate the total emotional value
df_tenki2['new_score_sum']=df_tenki2['total_word_score_pair_list'].apply(lambda x:score_sum_get(x))