Humans have long analyzed their own emotions, in psychology and in other fields.
Among these approaches, negative/positive analysis mainly judges whether people's remarks and ideas are positive or negative.
Negative/positive analysis is one type of the technique called "sentiment analysis".
Sentiment analysis extracts expressions related to evaluation and emotion from sentences and other text, and analyzes the emotions the text contains.
Methods for negative/positive analysis include word-by-word classification using a polarity dictionary, and deep learning.
Whether a word is negative or positive is called its "polarity".
A polarity dictionary is a collection of words with polarity.
The polarity dictionary called PN Table was not built by manually assigning a polarity to a large number of words. Instead, starting from a small set of words with known polarity, it assigns scores from -1 to +1 to words that are highly related to them.
In addition, there is the "Japanese Evaluation Polarity Dictionary" published on the page of the Inui-Okazaki Laboratory at Tohoku University. By tagging words as neutral in addition to negative and positive, it keeps the polarity of the words in the dictionary balanced.
There is also a "Polar Phrase Dictionary" created by Yahoo! JAPAN Laboratories.
# Read the PN Table and print it
import pandas as pd

pn_df = pd.read_csv('./6020_negative_positive_data/data/pn_ja.dic',
                    sep=':',
                    encoding='utf-8',
                    names=('Word', 'Reading', 'POS', 'PN')
                    )
print(pn_df)
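To make the file layout behind that `read_csv` call more concrete, here is a minimal sketch that parses the same colon-separated format using only the standard library. The three sample rows and the helper name `load_pn_rows` are invented for illustration; they are not taken from the real PN Table.

```python
import csv
import io

# Illustrative rows in the word:reading:POS:score layout.
# These values are made up for the example, not copied from the real PN Table.
sample = "良い:よい:形容詞:0.9\n悪い:わるい:形容詞:-0.8\n机:つくえ:名詞:0.0\n"

def load_pn_rows(stream):
    """Parse colon-separated rows into dicts keyed like the DataFrame columns."""
    names = ('Word', 'Reading', 'POS', 'PN')
    rows = []
    for fields in csv.reader(stream, delimiter=':'):
        row = dict(zip(names, fields))
        row['PN'] = float(row['PN'])  # the score is stored as text in the file
        rows.append(row)
    return rows

rows = load_pn_rows(io.StringIO(sample))
for row in rows:
    print(row['Word'], row['PN'])
```

Each row carries a word, its reading, its part of speech, and a score, which is why the pandas call names four columns.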
Morphological analysis is the task of splitting a sentence into morphemes, the smallest meaningful units of a language.
By performing morphological analysis, you can find the words that match entries in the polarity dictionary. Here we perform morphological analysis with MeCab and print the result in a form that is easy to read.
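import MeCab

mecab = MeCab.Tagger('')

# Read the text to analyze; the with block closes the file automatically
with open('./6020_negative_positive_data/data/aidokushono_insho.txt') as f:
    file = f.read()

# Print the morphological analysis result
print(mecab.parse(file))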
First, convert the analysis results into a list to make later processing easier.
When MeCab performs morphological analysis, the last line of its output is blank and the second-to-last line is "EOS". Since those two lines are not used, we delete them.
In each line of the analysis result, the word is followed by a tab, and the remaining fields are separated by commas.
import MeCab
import re

mecab = MeCab.Tagger('')

# Read the text to analyze
with open('./6020_negative_positive_data/data/aidokushono_insho.txt') as f:
    file = f.read()

def get_diclist(file):
    parsed = mecab.parse(file)
    # Split the analysis result into lines
    lines = parsed.split('\n')
    # Drop the last two lines ("EOS" and the trailing blank line)
    lines = lines[0:-2]
    # Build a list of analysis results
    diclist = []
    for word in lines:
        # Split each line on tabs and commas
        data = re.split('\t|,', word)
        # Index 7 holds the base form of the word
        datalist = {'BaseForm': data[7]}
        diclist.append(datalist)
    return diclist

wordlist = get_diclist(file)
print(wordlist)
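The splitting steps above can be tried without MeCab installed. The following is a small sketch that runs them on a hand-written string standing in for MeCab output. It assumes IPADIC-style lines (the word, a tab, then comma-separated fields), and the helper name `base_forms` is made up for this example.

```python
import re

# Hand-written stand-in for MeCab output, assuming IPADIC-style features.
# It was typed by hand for this example, not produced by running MeCab.
parsed = (
    "本\t名詞,一般,*,*,*,*,本,ホン,ホン\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "読ん\t動詞,自立,*,*,五段・マ行,連用タ接続,読む,ヨン,ヨン\n"
    "EOS\n"
)

def base_forms(parsed):
    # Splitting on newlines leaves "EOS" and an empty string at the end,
    # so the last two items are dropped.
    lines = parsed.split('\n')[0:-2]
    # After splitting on tabs and commas, index 0 is the word as written
    # and index 7 is its base form.
    return [re.split('\t|,', line)[7] for line in lines]

print(base_forms(parsed))
```

Note that the inflected form 読ん maps to the base form 読む, which is why the base form rather than the word as written is the useful key for a dictionary lookup.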
Load the polarity dictionary (PN Table). By comparing it with the list of analysis results, you can assign a polarity to each word that appears.
First, build a dictionary that contains only the words and polarity values from the PN Table. Then create a new list in which each word found in the PN Table carries its polarity value.
import pandas as pd

# Read the dictionary
pn_df = pd.read_csv('./6020_negative_positive_data/data/pn_ja.dic',
                    sep=':',
                    encoding='utf-8',
                    names=('Word', 'Reading', 'POS', 'PN')
                    )

# Convert the PN Table to a dict of words and polarity values only
word_list = list(pn_df['Word'])
pn_list = list(pn_df['PN'])
pn_dict = dict(zip(word_list, pn_list))

# Look up each word from the analysis results in the PN Table
def add_pnvalue(diclist_old):
    diclist_new = []
    for word in diclist_old:
        baseword = word['BaseForm']
        if baseword in pn_dict:
            # The word exists in the PN Table, so use its polarity value
            pn = float(pn_dict[baseword])
        else:
            # The word does not exist, so mark it as not found
            pn = 'notfound'
        word['PN'] = pn
        diclist_new.append(word)
    return diclist_new

wordlist = get_diclist(file)  # function defined in the previous section
pn_list = add_pnvalue(wordlist)
print(pn_list)
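The lookup can also be checked on a tiny example that needs neither the dictionary file nor MeCab. This is only a sketch: the two dictionary entries and their scores are invented, and this variant of `add_pnvalue` takes the dictionary as a second argument and uses `dict.get` with a default in place of the if/else above.

```python
# Hypothetical miniature polarity dictionary; the scores are invented for illustration.
pn_dict = {'良い': 0.9, '悪い': -0.8}

def add_pnvalue(diclist, pn_dict):
    """Return new dicts with a 'PN' key added, leaving the input list untouched."""
    result = []
    for word in diclist:
        # dict.get returns 'notfound' when the base form is not in the dictionary
        pn = pn_dict.get(word['BaseForm'], 'notfound')
        result.append({**word, 'PN': pn})
    return result

words = [{'BaseForm': '良い'}, {'BaseForm': '本'}, {'BaseForm': '悪い'}]
scored = add_pnvalue(words, pn_dict)

# Keep only the numeric polarity values, in order of appearance
found = [w['PN'] for w in scored if w['PN'] != 'notfound']
print(scored)
print(found)
```

Building new dicts instead of modifying the input is a design choice for this sketch; it keeps the original word list reusable if the lookup is run more than once.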
import re
import pandas as pd
import matplotlib.pyplot as plt
import MeCab
%matplotlib inline

# Read the text to analyze
with open('./6020_negative_positive_data/data/aidokushono_insho.txt') as f:
    file = f.read()

# Create the MeCab instance
mecab = MeCab.Tagger('')

# Read the polarity dictionary
pn_df = pd.read_csv('./6020_negative_positive_data/data/pn_ja.dic',
                    sep=':',
                    encoding='utf-8',
                    names=('Word', 'Reading', 'POS', 'PN')
                    )

def get_diclist(file):
    parsed = mecab.parse(file)
    lines = parsed.split('\n')
    # Drop the "EOS" line and the trailing blank line
    lines = lines[0:-2]
    diclist = []
    for word in lines:
        l = re.split('\t|,', word)
        d = {'BaseForm': l[7]}
        diclist.append(d)
    return diclist

# Convert the PN Table to a dict of words and polarity values only
word_list = list(pn_df['Word'])
pn_list = list(pn_df['PN'])
pn_dict = dict(zip(word_list, pn_list))

def add_pnvalue(diclist_old):
    # Attach a polarity value to each word
    diclist_new = []
    for word in diclist_old:
        base = word['BaseForm']
        if base in pn_dict:
            pn = float(pn_dict[base])
        else:
            pn = 'notfound'
        word['PN'] = pn
        diclist_new.append(word)
    # Keep only the polarity values of words found in the PN Table
    pn_point = []
    for word in diclist_new:
        pn = word['PN']
        if pn != 'notfound':
            pn_point.append(pn)
    return pn_point

wordlist = get_diclist(file)
pn_list = add_pnvalue(wordlist)

# Plot the polarity values in order of appearance
plt.plot(pn_list)
# The file name is used as the plot title
plt.title('aidokushono_insho')
plt.show()