Have you ever been curious about **AV titles**?
One day, a question occurred to me.
"The title of an AV work reflects the characteristics of its actress, right? If so, I should be able to work out my own AV preferences from those characteristics."
Once the idea hit me, it was time to act. Let's do it.
This time, I will test the hypothesis using a technique called a **word cloud**. (I'll ask my favorite, **Mia Nanasawa**, to cooperate.)
A "word cloud" is a single image built from the words that appear frequently in a text, with more frequent words drawn larger. It is one of the quickest and easiest ways to get a feel for a text, because you can see at a glance what it is about.
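As a rough illustration of the idea, a word cloud boils down to counting how often each word occurs and then drawing the frequent words larger. The sketch below covers only the counting step, using Python's standard `collections.Counter`. The sample word list is made up for illustration and is not taken from the actual titles.

```python
from collections import Counter

# Hypothetical, hand-written sample words (not real data from this article)
sample_words = ["idol", "student", "idol", "teacher", "idol", "student"]

# Count how many times each word occurs
frequencies = Counter(sample_words)

# List the words from most to least frequent; a word cloud would draw
# the words at the top of this list in the largest font
for word, count in frequencies.most_common():
    print(word, count)
```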
import requests  # Library for fetching web pages
from bs4 import BeautifulSoup  # Library for parsing the fetched HTML and working with its tags
url = "https://ja.wikipedia.org/wiki/%E4%B8%83%E6%B2%A2%E3%81%BF%E3%81%82"  # Mia Nanasawa's Japanese Wikipedia page
response = requests.get(url)
# apparent_encoding holds the encoding detected from the content (e.g. SHIFT_JIS);
# assigning it to encoding prevents garbled characters
response.encoding = response.apparent_encoding
soup = BeautifulSoup(response.text, "html.parser")  # BeautifulSoup(HTML/XML to parse, parser to use)
# prettify() returns the HTML with indentation
print(soup.prettify())
I was able to get the HTML correctly. Next, I pull out the work titles, which are the bold text inside the table cells.
td_list = soup.find_all("td")  # all table cells on the page
titles = []
for td in td_list:
    bold = td.find("b")  # titles are written in bold inside the cells
    if bold is None:
        continue
    print(bold.text)
    titles.append(bold.text)
The output above contains characters that are not needed for this analysis, such as "!" and "-", so the next step removes them.
import re
# Characters to strip: exclamation marks, whitespace, wave dashes, tildes,
# hyphens, middle dots, and corner brackets. The full-width forms (!, ~) are
# included on the assumption that the duplicated lines originally targeted them.
unwanted = re.compile(r"[!!\s〜~~\-・「」]")
changed_titles1 = []
for title in titles:
    tmp = unwanted.sub("", title)
    # Remove the actress's own name, which appears in Japanese in the titles
    tmp = tmp.replace("七沢みあ", "")
    if tmp:
        changed_titles1.append(tmp)
changed_titles1
The unnecessary characters are now removed. From here, I move on to morphological analysis, that is, splitting the text into words and tagging each one with its part of speech.
import MeCab
text = "".join(changed_titles1)  # join the list into a single string for MeCab
m = MeCab.Tagger("-Ochasen")  # create a Tagger instance for parsing the text
# parse() returns the morphological analysis result, one token per line.
# Keep only the lines tagged as nouns; with the IPA dictionary the tag is 名詞.
noun_lines = [line for line in m.parse(text).splitlines()
              if "名詞" in line.split()[-1]]
for line in noun_lines:
    print(line.split())
# Keep only the surface form (the first column) of each noun
nouns = [line.split()[0] for line in noun_lines]
print(nouns)
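If MeCab is not installed, the noun filter can still be tried offline. The sketch below applies a similar filter to a hand-written string that imitates the tab-separated ChaSen-style output (surface form, reading, base form, part of speech, and so on). It checks the part-of-speech column directly instead of the last whitespace-separated field. The sample text and the helper name `extract_nouns` are assumptions for illustration, and real MeCab output depends on the dictionary in use.

```python
# Hand-written sample imitating ChaSen-style output for "猫が走る"
# (an assumption for illustration; real output depends on the dictionary)
sample_output = (
    "猫\tネコ\t猫\t名詞-一般\t\t\n"
    "が\tガ\tが\t助詞-格助詞-一般\t\t\n"
    "走る\tハシル\t走る\t動詞-自立\t五段・ラ行\t基本形\n"
    "EOS\n"
)

def extract_nouns(parsed):
    """Return surface forms of tokens whose part-of-speech field starts with 名詞 (noun)."""
    nouns = []
    for line in parsed.splitlines():
        fields = line.split("\t")
        # Skip the EOS marker and any line without a part-of-speech column
        if len(fields) < 4:
            continue
        if fields[3].startswith("名詞"):
            nouns.append(fields[0])
    return nouns

print(extract_nouns(sample_output))
```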
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text_new = " ".join(nouns)  # WordCloud expects words separated by spaces
# font_path must point to a font that contains Japanese glyphs
word_cloud = WordCloud(background_color="white",
                       font_path=r"C:\Users\tomoh\Machine learning able\Word cloud\meiryo.ttc",
                       min_font_size=5, prefer_horizontal=1)
word_cloud.generate(text_new)
plt.figure(figsize=(10, 8))
plt.imshow(word_cloud)
plt.axis("off")
plt.show()
The result above **correctly** captures Mia Nanasawa's characteristics.
I can say this because I have watched every one of her videos without missing a single one. (Sorry for bringing up my own experience.)
Looking back, the three things that attracted me most were **tsundere**, **provocation**, and **women's college student**.
If I had a girlfriend, I would want her to have these three traits...
I ran the same analysis on three other actresses.
**Shoko Takahashi** is a well-known actress who debuted as a gravure idol. Her result shows not only "idol" and "gravure" but also, from words such as **"boss"** and **"older sister"**, the character of an **older, dominant type**.
**Recommended for those with a masochistic streak who want to be scolded.**
**Yua Mikami** is a popular actress who came from SKE. Her result shows not only "idol" but also, from the words **"luxury"**, **"big breasts"**, and **"soap"**, the character of a **high-class soapland worker**.
**Recommended for those who don't have the money but want a taste of a high-class soapland.**
**Sakura Miura** is an actress I relied on before I fell for Mia Nanasawa. Her result shows the features **"boobs"**, **"big breasts"**, and **"plain"**. I would guess she is recommended for **those who like plain, busty, anime-otaku women**.
From the results above, the word clouds tell me that I like **"a plain, big-breasted, tsundere female college student"**.
That may very well be right.
Shoko Takahashi and Yua Mikami also match on "big breasts", but since I watch videos of Mia Nanasawa and Sakura Miura more often than theirs, **I consider the hypothesis proven.**
Please give it a try yourself.