I like the videos in the "Singing with XX" series, so I suddenly wanted to collect information on the popular videos in that series on YouTube. I don't have a specific purpose in mind, but I thought it would be fun to analyze things like what kind of parody appeals to many people.
So this time I tried to get the video information for the "Singing with XX" series using Python and the YouTube Data API (v3). To be honest, in terms of collecting video information there is nothing particularly novel compared to the sites I referred to (at most, the output fields differ), but since this was my first time using the YouTube Data API, I am writing it up as a memorandum.
Confirmed to work with Google Colaboratory (as of February 23, 2020)
Get an API key for the YouTube Data API (v3). The procedure is covered on the reference sites, so I will omit it here. I did not set any particular restrictions on the API key.
The script acquires the video information included in the search results for a specified query and prints it to standard output. As an example, the code collects information on videos about baseball players.
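As a side note, rather than hard-coding the key into the notebook, you can enter it at runtime. This is just a minimal sketch (not part of the original script) using the standard-library getpass module, which also works in Google Colaboratory:

from getpass import getpass

# Prompt for the API key at runtime instead of writing it into the notebook
YOUTUBE_API_KEY = getpass('Enter your YouTube Data API key: ')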
from googleapiclient.discovery import build  # pip install google-api-python-client
import datetime

YOUTUBE_API_KEY = '<Fill in the API key>'
query = 'I tried to sing with the name of a baseball player'
max_pages = 16   # Number of pages to get
maxResults = 50  # Number of search results per page; the maximum is 50

# Function to get video information
def search_videos(query, max_pages=10, maxResults=50):
    youtube = build('youtube', 'v3', developerKey=YOUTUBE_API_KEY)
    search_request = youtube.search().list(
        part='id',
        q=query,
        type='video',
        maxResults=maxResults,
    )
    i = 0
    while search_request and i < max_pages:
        search_response = search_request.execute()
        video_ids = [item['id']['videoId'] for item in search_response['items']]
        # Fetch snippet and statistics for the videos found on this page
        videos_response = youtube.videos().list(
            part='snippet,statistics',
            id=','.join(video_ids)
        ).execute()
        yield videos_response['items']
        # Move on to the next page of search results
        search_request = youtube.search().list_next(search_request, search_response)
        i += 1

# Extract the desired information from the acquired video information and put it in a dict.
# This time we get the ID, URL, published date and time, uploader's channel ID, video title,
# view count, like count, dislike count, favorite count and comment count,
# and also append the program execution time.
for items_per_page in search_videos(query, max_pages, maxResults):
    for item in items_per_page:
        obj = {}
        obj['id'] = item['id']
        obj['url'] = 'http://youtube.com/watch?v=' + obj['id']
        snippet = item['snippet']
        for key in ['publishedAt', 'channelId', 'title']:
            obj[key] = snippet[key]
        statistics = item['statistics']
        for key in ['viewCount', 'likeCount', 'dislikeCount', 'favoriteCount', 'commentCount']:
            obj[key] = statistics[key] if key in statistics else "NA"
        obj['timestamp'] = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        # Print one CSV-style line per video
        print(",".join(['"' + str(obj[v]) + '"' for v in obj]))
This time we got the video ID, URL, published date and time, uploader's channel ID, video title, view count, like count, dislike count, favorite count, and comment count. I'm not sure what the "favorite count" actually is; since it was included in the data I grabbed it just in case, but it was 0 for every video.
The final output is just standard output (print), and I copied and pasted the result into a Google Spreadsheet.
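If copy-pasting gets tedious, the same rows can be written straight to a CSV file and then imported into the spreadsheet. This is only a minimal sketch: it assumes you collect the obj dictionaries into a list instead of printing them, and the write_csv helper and file name are my own, not part of the original script.

import csv

# Column order matches the keys built into obj in the script above
FIELDS = ['id', 'url', 'publishedAt', 'channelId', 'title',
          'viewCount', 'likeCount', 'dislikeCount',
          'favoriteCount', 'commentCount', 'timestamp']

def write_csv(rows, path='videos.csv'):
    # rows is a list of obj dictionaries collected from search_videos
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)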
When I visually checked the results, there was some noise, such as baseball broadcast videos and unrelated "tried to sing" videos, so those need to be removed manually. Also, if you want a more comprehensive collection, you can run the program with other search words such as "Yakyuta" and add only the videos whose IDs do not overlap with the results acquired so far, as in the sketch below.
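For that deduplication, one simple approach is to keep a set of video IDs already seen and skip anything collected before. This is a minimal sketch that reuses the search_videos generator above; the queries list and the variable names are illustrative.

queries = ['I tried to sing with the name of a baseball player', 'Yakyuta']
seen_ids = set()
all_videos = []

for q in queries:
    for items_per_page in search_videos(q, max_pages, maxResults):
        for item in items_per_page:
            # Keep each video only once, keyed by its video ID
            if item['id'] not in seen_ids:
                seen_ids.add(item['id'])
                all_videos.append(item)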