This time I'm going to do some simple scraping. In the heyday of subscription services, I doubt many people still collect music files locally and go as far as tagging the lyricist, composer, and arranger, but this approach makes the tagging easy, so I'd like to introduce it.
First, the structure of the Tower Records site. When I looked up the XPath of the element that holds the important information, it looked like the following.
//*[@id="RelationArtist_0_1_sub"]/div/div[3]/div[2]/a/text()
This points to the lyrics information for the first track on Disc 1. Reading the numbers from the front, they are the disc number (counted from 0), the track number, and a selector for lyrics (3), composition (4), or arrangement (5). Leave the last number alone.
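As a minimal sketch of how that XPath is put together, the helper below builds it from a disc index, a track number, and a field code. The names `build_xpath` and `FIELD_INDEX` are made up for illustration; the 3/4/5 mapping is the one used in the script further down.

```python
# Index of the div that holds each credit: 3 = lyrics, 4 = composition, 5 = arrangement
FIELD_INDEX = {"W": 3, "C": 4, "A": 5}


def build_xpath(disc_index, track_no, field):
    """Return the XPath for one credit field of one track.

    disc_index is counted from 0 (Disc 1 -> 0); track_no is counted from 1.
    """
    return (
        f'//*[@id="RelationArtist_{disc_index}_{track_no}_sub"]'
        f'/div/div[{FIELD_INDEX[field]}]/div[2]/a/text()'
    )


# Lyrics of the first track on Disc 1
print(build_xpath(0, 1, "W"))
```

Passing an unknown field code raises a `KeyError`, which is easier to notice than silently building a wrong path.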
Let's actually write the code. The libraries used are lxml (scraping), requests (fetching the page), and mutagen (music tags).
tagget.py
from mutagen.flac import FLAC
from lxml import html
import os
import requests
class Net():
    def Tower(self, no, html2, disc, item):
        # Map the item code to the div index on the page: W = lyrics, C = composition, A = arrangement.
        i = {"W": "3", "C": "4", "A": "5"}[item]
        # Specify the location for this disc (counted from 0) and track
        contentr = html2.xpath('//*[@id="RelationArtist_'+str(disc)+'_'+str(no)+'_sub"]/div/div['+i+']/div[2]/a/text()')
        # A field can hold several names, so collect all of them in a loop instead of indexing one by one.
        content = [c.strip('\'').strip() for c in contentr]
        print(content)  # Output which values were acquired
        return content
class Main():
    def Towerget(self, files, url):
        n = Net()
        r = requests.get(url)  # Load the page
        html2 = html.fromstring(r.content)  # Parse the page
        for f in files:
            tag = FLAC(f)  # Load the tags
            no = tag['tracknumber'][0].lstrip("0")  # I store 1-digit track numbers as 0x, so strip the zero to match Tower Records.
            disc = int(tag['discnumber'][0]) - 1  # The number representing the disc starts from 0, so adjust it.
            print(no)
            tag['word'] = n.Tower(no, html2, disc, item="W")  # Lyrics tag
            tag['composer'] = n.Tower(no, html2, disc, item="C")  # Composition tag
            tag['arranger'] = n.Tower(no, html2, disc, item="A")  # Arrangement tag
            print(tag.pprint())  # pprint() returns a string, so print it
            tag.save()  # Save the tags
os.chdir(r"E:\music\Unorganized\Uchikubigokumon Club-Prison fifteen")  # Folder holding the files to tag (raw string, so the backslashes are not treated as escapes)
files0 = os.listdir(os.getcwd())  # Get a list of files in the folder
files = list()
for f in files0:  # The same folder also contains Google Drive management files, jacket photos, etc., so only FLAC files are taken out.
    if f.endswith(".flac"):
        files.append(f)
        print(f)
    else:
        print("not "+f)
m = Main()
url = "https://tower.jp/item/4936516/Prison fifteen"  # The URL of the Tower Records page
m.Towerget(files, url)
It's not very clean code, but it gets the job done for the time being.
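One small cleanup would be to pull the number handling out of `Towerget` into its own function. The sketch below is only an illustration: `to_site_numbers` is a hypothetical name, and the handling of values like "3/12" is an assumption about how some taggers write track numbers, not something this script has run into.

```python
def to_site_numbers(tracknumber, discnumber="1"):
    """Convert tag values such as "03" and "01" into the numbers used in the page's ids."""
    # Assumption: a value may be written as "track/total"; keep only the part before the slash.
    track = int(tracknumber.split("/")[0])
    disc = int(discnumber.split("/")[0])
    # The track number is used without zero padding; the disc index is counted from 0.
    return str(track), disc - 1


print(to_site_numbers("03", "01"))
```

Going through `int` removes the leading zero and also rejects values that are not numbers at all, instead of passing them on into the XPath.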
・ Tracks with no vocals and no lyrics, such as an overture, end up out of sync.
・ Tower Records may not have entered the arrangement information.
・ I would like to get the URL of the Tower Records page automatically (this seems difficult).
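For the case where a field is missing on the page, one possible workaround is to write a tag only when the scraper actually returned a name. This is a rough sketch under assumptions: `apply_credits` is a hypothetical helper, and a plain dict stands in for the mutagen tag object, which accepts the same `tag[key] = list` assignment used in the script above. It does not fix fields that are shifted out of position.

```python
def apply_credits(tag, credits):
    """Copy only the non-empty credit lists into `tag` and return the keys written."""
    written = []
    for key, names in credits.items():
        # Drop blank entries and surrounding whitespace
        cleaned = [n.strip() for n in names if n and n.strip()]
        if cleaned:
            tag[key] = cleaned
            written.append(key)
    return written


# A plain dict stands in for the FLAC tag object here
tag = {"title": ["Example"]}
scraped = {"word": ["A"], "composer": ["A", " B "], "arranger": []}
written = apply_credits(tag, scraped)
print(written)
```

Here the empty `arranger` list is skipped, so any existing value in the file is left untouched rather than overwritten with nothing.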