The other day, I took part in a Python study session hosted by Team Zet Co., Ltd. The theme this time was "text sentiment analysis using word2vec". To be honest, it was a reckless theme for someone like me who first touched Python a week ago, but hoping to get a feel for how the grammar I'd been studying is actually put to use, I jumped in and signed up just one day before the event.
Anyway, I'll leave the introduction at that and get into the main subject.
word2vec is a neural network model (machine learning) that analyzes words. Simply put, it turns words into vectors and assigns them weights. (For more information, refer to here.)
This time I used the word2vec model published by White Goat Corporation.
How to install word2vec is described here.
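(If that link doesn't help: the word2vec implementation used throughout this article comes with the gensim library, so assuming you have pip, installation is just the following.)

pip install gensim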
First of all, let's make sure that words really are vectorized by word2vec. Type the following code in an environment where word2vec is set up.
sample.py
from gensim.models.word2vec import Word2Vec

# Load the pretrained model (the file name is an example; adjust it to your download)
model = Word2Vec.load("word2vec.gensim.model")
print(len(model.wv["Love"]))
print(model.wv["Love"])
Then the following result is returned:

50
array([ 0.09289702, -0.16302316, -0.08176763, -0.29827002, 0.05170078, 0.07736144, -0.06452437, 0.19822665, -0.11941547, -0.11159643, 0.03224859, 0.03042056, -0.09065174, -0.1677992 , -0.19054233, 0.10354111, 0.02630192, -0.06666993, -0.06296805, 0.00500843, 0.26934028, 0.05273635, 0.0192258 , 0.2924312 , -0.23919497, 0.02317964, -0.21278766, -0.01392282, 0.24962738, 0.11264788, 0.05772769, 0.20941015, -0.01239212, -0.1256235 , -0.19794041, 0.1267719 , -0.12306885, 0.01006295, 0.08548331, -0.08936502, -0.05429656, -0.09757583, 0.10338967, 0.13714872, 0.23966707, 0.02216845, 0.02270923, 0.32569838, -0.0311841 , -0.00150117], dtype=float32)

The "50" is the output of len(): the word "Love" is represented as a 50-dimensional vector, and the array shows the elements it is composed of.
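Incidentally, if you want to check the dimensionality for the whole model rather than counting one word's elements, gensim's KeyedVectors exposes it directly; a one-liner against the same model:

sample.py
# The dimensionality of every word vector in this model
print(model.wv.vector_size)  # -> 50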
Next, let's extract words that are similar to a given word.
sample.py
# Extract words similar to the keyword
sim_do = model.wv.most_similar(positive=["Girlfriend"], topn=30)
# The result is a list of (word, score) pairs, so format it for readability
print(*[" ".join([v, "{:.5f}".format(s)]) for v, s in sim_do], sep="\n")
When you run it, the top 30 come back; the first few look like:

Herself 0.82959
Molly 0.82547
He 0.82406
Sylvia 0.80452
Charlie 0.80336
Lover 0.80197

so you can extract words with similar meanings. The number to the right of each word quantifies how similar that word is to "Girlfriend".
Also, when you want to know how similar two words are, you can do this:
# Quantify how similar w1 and w2 are
similarity = model.wv.similarity(w1="Apple", w2="Strawberry")
print(similarity)

similarity = model.wv.similarity(w1="Apple", w2="Aomori")
print(similarity)

similarity = model.wv.similarity(w1="Apple", w2="Anpanman")
print(similarity)
Then

0.79041845
0.30861858
0.45321244

will be returned. This quantifies how similar the words w1 and w2 are. Say "apple" and I suspect most people (in Japan, anyway) immediately answer "Aomori!", the prefecture famous for its apples; yet the model decided that "Anpanman" is more similar to "Apple" than "Aomori" is, so you can see this model is not perfect yet.
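For reference, model.wv.similarity is nothing more than the cosine similarity of the two word vectors, so you can reproduce it by hand with numpy. A minimal sketch (the cosine helper is just for illustration):

sample.py
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product of the two vectors after normalization
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Should print (almost) the same value as model.wv.similarity(w1="Apple", w2="Strawberry")
print(cosine(model.wv["Apple"], model.wv["Strawberry"]))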
Now then, let's consider the famous proposition:

"King" - "Man" + "Woman" = "Queen" ???
sample.py
# Words in positive pull the result toward them; words in negative push it away
sim_do = model.wv.most_similar(positive=["King", "Woman"], negative=["Man"], topn=5)
print(*[" ".join([v, "{:.5f}".format(s)]) for v, s in sim_do], sep="\n")
The result is…

Princess 0.85313
Bride 0.83918
Beast 0.83155
Witch 0.82982
Maiden 0.82356
Nothing exactly matched "Queen", but we got answers in a very similar neighborhood.
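Under the hood, most_similar is essentially doing the vector arithmetic in the proposition (gensim normalizes the vectors first, so results can differ slightly). As a sketch, you can build the query vector yourself and look it up with gensim's similar_by_vector; note that, unlike most_similar, this variant may also return the input words themselves:

sample.py
# Build the query vector by hand: King - Man + Woman
vec = model.wv["King"] - model.wv["Man"] + model.wv["Woman"]

# Find the words whose vectors are closest to the hand-built query
for word, score in model.wv.similar_by_vector(vec, topn=5):
    print(word, "{:.5f}".format(score))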
By the way, up to this point we have only compared words, but it is also possible to quantify what kind of emotions a whole sentence contains.
sample.py
import numpy as np
from janome.tokenizer import Tokenizer  # Japanese morphological analyzer the original code relies on

# model is the word2vec model loaded earlier
t = Tokenizer()

# Enter your favorite sentence here.
s = ""

# One row per content word: [happy, pleasant, sad, excitement]
x = np.empty((0, 4), float)
for token in t.tokenize(s):
    # Only score content words: nouns and adjectives
    if token.part_of_speech.split(',')[0] in ("noun", "adjective"):
        print(token.surface)
        similarity1 = model.wv.similarity(w1=token.surface, w2="happy")
        similarity2 = model.wv.similarity(w1=token.surface, w2="pleasant")
        similarity3 = model.wv.similarity(w1=token.surface, w2="sad")
        similarity4 = model.wv.similarity(w1=token.surface, w2="excitement")
        x = np.append(x, np.array([[similarity1, similarity2, similarity3, similarity4]]), axis=0)

print("-" * 30)
means = np.mean(x, axis=0)
print(means)
print("happy:{0}".format(means[0]))
print("pleasant:{0}".format(means[1]))
print("sad:{0}".format(means[2]))
print("excitement:{0}".format(means[3]))
Enter your favorite sentence in the variable s. As an example, let's put in a romantic one: "I proposed at a restaurant with a view of the night view." The result is
Night view
Restaurant
propose
------------------------------
[0.29473324 0.44027831 0.27123818 0.20060815]
happy:0.29473323623339337
pleasant:0.4402783115704854
sad:0.27123818174004555
excitement:0.20060815351704755
is what comes out. So this system judges "I proposed at a restaurant with a view of the night view." to be, above all, a "pleasant" sentence. (The larger the number, the stronger that emotion.)
Then, as another example, let's put in a sentence that gives off a thoroughly negative aura: "A pistol murder occurred in a prison at midnight." This time we get
Midnight
prison
Handgun
murder
Incident
------------------------------
[-0.00661952 0.01671012 0.12141706 0.23172273]
happy:-0.006619524117559195
pleasant:0.01671011543367058
sad:0.12141705807298422
excitement:0.2317227303981781
As you can see, the values can in fact be negative. And certainly, I don't feel even a millimeter of happiness in that sentence.
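If you want to reuse this experiment, the loop above folds naturally into a function. A minimal sketch under the same assumptions (Janome for tokenizing, the gensim model from earlier; emotion_scores is just a name I made up):

sample.py
import numpy as np
from janome.tokenizer import Tokenizer

EMOTION_WORDS = ["happy", "pleasant", "sad", "excitement"]

def emotion_scores(model, s):
    # Average each noun/adjective's similarity to the four emotion words
    t = Tokenizer()
    rows = [[model.wv.similarity(w1=token.surface, w2=w) for w in EMOTION_WORDS]
            for token in t.tokenize(s)
            if token.part_of_speech.split(',')[0] in ("noun", "adjective")]
    return dict(zip(EMOTION_WORDS, np.mean(np.array(rows), axis=0)))

print(emotion_scores(model, "I proposed at a restaurant with a view of the night view."))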
It's a wonderful era when we can analyze the sentiment of sentences this easily. I am deeply grateful to Team Zet for giving me such a useful place to learn.