There is only one year left until Arashi suspend their activities. Twenty years have passed since they debuted in those see-through costumes. What have these national idols, active in so many fields, wanted to tell their fans in the 20 years since their formation? I'd love to ask them in person, but that isn't possible. So I decided to "visualize the lyrics", and the message they want to send to their fans will be passed on to Arashi fans by ~~the sixth member~~ me.
・Python 3.7.3
・Windows 10
・Uta-Net (https://www.uta-net.com)
・I tried to visualize the lyrics of Kenshi Yonezu with WordCloud.
scraping_arashi.py
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

# Create a table to store the scraped data
list_df = pd.DataFrame(columns=['lyrics'])

for page in range(1, 3):
    # Top-level address of the song pages
    base_url = 'https://www.uta-net.com'

    # Lyrics list page
    url = 'https://www.uta-net.com/artist/3891/0/' + str(page) + '/'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    links = soup.find_all('td', class_='side td1')
    for link in links:
        a = base_url + (link.a.get('href'))

        # Lyrics detail page
        response = requests.get(a)
        soup = BeautifulSoup(response.text, 'lxml')
        song_lyrics = soup.find('div', itemprop='lyrics')
        song_lyric = song_lyrics.text
        song_lyric = song_lyric.replace('\n', '')
        # Wait 1 second so as not to overload the server
        time.sleep(1)

        # Add the acquired lyrics to the table
        # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
        tmp_se = pd.DataFrame([song_lyric], index=list_df.columns).T
        list_df = pd.concat([list_df, tmp_se])

print(list_df)

# Save as CSV (mode='a' appends, so delete list.csv before re-running)
list_df.to_csv('list.csv', mode='a', encoding='cp932')
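The URL handling in the script above can be tried out on its own, without touching the network. The following is a minimal sketch using only the standard library. The helper names `list_page_urls` and `absolute_url` are my own, and the href `/song/12345/` is a made-up value for illustration, not a real song page.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.uta-net.com'
ARTIST_ID = 3891  # artist ID that appears in the scraping script


def list_page_urls(first_page=1, last_page=2):
    """Return the lyrics list page URLs for an inclusive page range."""
    return [f'{BASE_URL}/artist/{ARTIST_ID}/0/{page}/'
            for page in range(first_page, last_page + 1)]


def absolute_url(href):
    """Turn a relative href taken from a list page into an absolute URL."""
    return urljoin(BASE_URL, href)


print(list_page_urls())
print(absolute_url('/song/12345/'))  # hypothetical href, for illustration only
```

Compared with plain string concatenation, `urljoin` also copes with an href that is already absolute, which may be a little safer if the page layout changes.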
morphological_analysis_arashi.py
from janome.tokenizer import Tokenizer
import pandas as pd
import re

# Read the list.csv file
df_file = pd.read_csv('list.csv', encoding='cp932')

song_lyrics = df_file['lyrics'].tolist()

t = Tokenizer()

results = []

for s in song_lyrics:
    tokens = t.tokenize(s)

    r = []

    for tok in tokens:
        # Use the base form when available, otherwise the surface form
        if tok.base_form == '*':
            word = tok.surface
        else:
            word = tok.base_form

        ps = tok.part_of_speech

        # janome reports parts of speech in Japanese:
        # noun, adjective, verb, adverb
        hinshi = ps.split(',')[0]
        if hinshi in ['名詞', '形容詞', '動詞', '副詞']:
            r.append(word)

    rl = (' '.join(r)).strip()
    results.append(rl)

# Remove full-width spaces (U+3000)
result = [i.replace('\u3000', '') for i in results]
print(result)

text_file = 'wakati_list.txt'
with open(text_file, 'w', encoding='utf-8') as fp:
    fp.write("\n".join(result))
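To see what the part-of-speech filter in the script above keeps and drops, here is a small sketch that runs the same logic on hand-written tuples instead of real janome tokens. The function name `keep_content_words` and the sample data are assumptions for illustration. The tuples only imitate the `surface`, `base_form` and `part_of_speech` attributes and are not actual tokenizer output.

```python
# Parts of speech kept by the script: noun, adjective, verb, adverb.
KEEP = {'名詞', '形容詞', '動詞', '副詞'}


def keep_content_words(tokens):
    """tokens: iterable of (surface, base_form, part_of_speech) tuples."""
    kept = []
    for surface, base_form, part_of_speech in tokens:
        # Fall back to the surface form when no base form is available
        word = surface if base_form == '*' else base_form
        # The first comma-separated field is the top-level part of speech
        if part_of_speech.split(',')[0] in KEEP:
            kept.append(word)
    return ' '.join(kept)


# Hand-written sample, not real janome output.
sample = [
    ('未来', '未来', '名詞,一般,*,*'),
    ('へ', 'へ', '助詞,格助詞,一般,*'),
    ('歩い', '歩く', '動詞,自立,*,*'),
    ('て', 'て', '助詞,接続助詞,*,*'),
    ('ARASHI', '*', '名詞,固有名詞,組織,*'),
]
print(keep_content_words(sample))  # 未来 歩く ARASHI
```

Particles are dropped, the inflected verb is reduced to its base form, and a word without a base form falls back to how it appears in the lyrics.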
wordcloud_arashi.py
from wordcloud import WordCloud

with open('wakati_list.txt', encoding='utf-8') as text_file:
    text = text_file.read()

# Japanese font path
fpath = 'C:/Windows/Fonts/YuGothM.ttc'

# Remove words that carry little meaning
# (the stop words must be Japanese so that they match the tokenized lyrics)
stop_words = ['そう', 'ない', 'いる', 'する', 'まま', 'よう', 'てる', 'なる', 'こと', 'もう', 'いい', 'ある', 'ゆく', 'れる']

wordcloud = WordCloud(background_color='white',
    font_path=fpath, width=800, height=600, stopwords=set(stop_words)).generate(text)

# Save the image as wordcloud.png in the same directory as the .py file
wordcloud.to_file('./wordcloud.png')
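The word cloud sizes words roughly by how often they occur, so a plain frequency count is a handy way to sanity-check the picture. The following is a minimal sketch with `collections.Counter`. The helper `top_words` and the sample text are made up for illustration, and the counts are only an approximation of what WordCloud computes, since by default it also counts frequent word pairs.

```python
from collections import Counter


def top_words(text, stop_words, n=3):
    """Count space-separated words, skipping stop words, and return the top n."""
    words = [w for w in text.split() if w not in stop_words]
    return Counter(words).most_common(n)


# Made-up sample in the same space-separated format as wakati_list.txt
sample_text = '未来 僕ら いる 未来 ここ する 未来 僕ら'
print(top_words(sample_text, {'いる', 'する'}))
# [('未来', 3), ('僕ら', 2), ('ここ', 1)]
```

To run it on the real data, read wakati_list.txt into a string and pass it in place of `sample_text`.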
↓ ↓ So, how did it turn out? ↓ ↓
It looks pretty good!
By visualizing the lyrics, I found that words such as "future," "us," "here," and "see," which convey Arashi's warmth, appear frequently (* ´ ▽ ` *).
Let's walk toward the future together. We'll always be by your side. With one year left until they suspend their activities, they'll whip up an A・RA・SHI whirlwind all over Japan. (A message from ~~the sixth member~~ me.)
Fans already understand Arashi's feelings without me having to say it, right?
"We" Arashi fans will support Arashi with all our might until the very end. Good luck, ARASHI. And if it pops, Yea!
I enjoyed learning about scraping, morphological analysis, and how to use WordCloud with Arashi's songs as the subject. This post ran long, so thank you for reading this far. If you find any mistakes, I would be very grateful if you could point them out in the comments.