Recently, I've been investigating whether various kinds of data can be acquired through APIs. I looked into how to get information such as the view and like counts of videos with the YouTube Data API and tried it out, so I'm writing it down here as a memo.
I referred to the following when using the YouTube Data API.
- [Youtube video search with Python](https://ossyaritoori.hatenablog.com/entry/2018/01/22/Python%E3%81%A7Youtube%E5%8B%95%E7%94%BB%E6%A4%9C%E7%B4%A2)
- Overview of YouTube Data API
- YouTube Data API Reference
To use the YouTube Data API, you first need a Google account. Follow the steps below to enable the YouTube Data API and get an API key.
- Access Google Cloud Platform and create a new project
- With the newly created project selected, go from "APIs and Services" to "Dashboard"
- From that screen, go to "API Library", search for "YouTube Data API v3", and open its page
- Enable "YouTube Data API v3" on that page
- On the "Credentials" screen, click the "Create credentials" button to get the API key
Next, install the Google API client library for Python. It can be installed with pip as shown below.
pip install google-api-python-client
You are now ready to use the YouTube Data API.
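Rather than pasting the key directly into a script, one option is to read it from an environment variable. The following is only a minimal sketch: the helper `load_api_key` and the variable name `YOUTUBE_API_KEY` are my own choices for illustration, not something the API requires.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are hypothetical
    conventions used for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The returned string can then be passed as `developerKey` when building the client.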
In the following, we will acquire video information using the YouTube Data API. See the YouTube Data API Reference for the specific information that can be obtained.
I've been hooked on board games lately, so I'd like to get information on board game-related videos in descending order of view count.
from googleapiclient.discovery import build
YOUTUBE_API_KEY = 'Enter your API key'
youtube = build('youtube', 'v3', developerKey=YOUTUBE_API_KEY)
search_response = youtube.search().list(
    part='snippet',
    # Specify the string you want to search for
    q='Board game',
    # Return results in descending order of view count
    order='viewCount',
    type='video',
).execute()
The script above returns video information in JSON format. Let's look at the entry for the most-viewed board game video.
search_response['items'][0]
{'kind': 'youtube#searchResult',
'etag': '"p4VTdlkQv3HQeTEaXgvLePAydmU/0dlj0cjWp5akSv64R8VxJM--3Ok"',
'id': {'kind': 'youtube#video', 'videoId': 'ASusE5qjoAg'},
'snippet': {'publishedAt': '2019-05-31T11:58:15.000Z',
'channelId': 'UCutJqz56653xV2wwSvut_hQ',
'title': '[Aim for commercialization] Counter-literature! Gachinko board game making showdown!',
'description': 'I no longer say "Please subscribe to the channel" or "Thank you for your high evaluation" on Tokai OnAir, but I said after all....',
'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/ASusE5qjoAg/default.jpg',
'width': 120,
'height': 90},
'medium': {'url': 'https://i.ytimg.com/vi/ASusE5qjoAg/mqdefault.jpg',
'width': 320,
'height': 180},
'high': {'url': 'https://i.ytimg.com/vi/ASusE5qjoAg/hqdefault.jpg',
'width': 480,
'height': 360}},
'channelTitle': 'Tokai OnAir',
'liveBroadcastContent': 'none'}}
A Tokai OnAir video came in first. They really are popular... However, the script above returns only 5 results per request by default, and the search results do not include the actual view counts.
So I wrote a function that fetches many videos at once, extracts only the necessary information from the response, and puts it in a data frame. Here it is.
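As a side note, `search().list` accepts a `maxResults` parameter (0 to 50, default 5), so a single request can return more than 5 items. Below is a rough sketch of that idea. The wrapper `search_videos` is a hypothetical helper of mine, and it assumes `youtube` is a client created with `build` as above.

```python
def search_videos(youtube, q, max_results=50):
    """Return (videoId, title) pairs for the most-viewed videos matching q.

    `youtube` is assumed to be a client built with
    googleapiclient.discovery.build; this wrapper is a hypothetical helper,
    not part of the library.
    """
    # search.list only accepts maxResults between 0 and 50
    max_results = max(0, min(max_results, 50))
    response = youtube.search().list(
        part="snippet",
        q=q,
        order="viewCount",
        type="video",
        maxResults=max_results,
    ).execute()
    return [
        (item["id"]["videoId"], item["snippet"]["title"])
        for item in response.get("items", [])
    ]
```

Getting more than 50 results still requires following the next page, which is what the paging approach in this article does.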
import pandas as pd
# Fetch num pages of search results (5 items per page by default)
# The other parameters are passed straight to search().list()
def get_video_info(part, q, order, type, num):
    dic_list = []
    search_request = youtube.search().list(part=part, q=q, order=order, type=type)
    # Only 5 items come back per request by default, so follow the pages one by one
    for i in range(num):
        # list_next() returns None when there are no more pages
        if search_request is None:
            break
        output = search_request.execute()
        dic_list = dic_list + output['items']
        search_request = youtube.search().list_next(search_request, output)
    df = pd.DataFrame(dic_list)
    # Get the unique videoId of each video
    df1 = pd.DataFrame(list(df['id']))['videoId']
    # Keep only the snippet fields we need
    df2 = pd.DataFrame(list(df['snippet']))[['channelTitle','publishedAt','channelId','title','description']]
    ddf = pd.concat([df1, df2], axis=1)
    return ddf
Let's execute the above function. This time, I will try to get 100 board game related videos in descending order of the number of views.
df = get_video_info(part='snippet', q='Board game', order='viewCount', type='video', num=20)
df
In this way, I was able to put information on 100 videos into a data frame.
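The extraction step inside `get_video_info` can also be written one item at a time with plain dictionaries, which may be easier to follow than splitting the nested columns afterwards. This is just an alternative sketch; `flatten_search_item` is a hypothetical helper name, and it keeps the same fields as the function in this article.

```python
def flatten_search_item(item):
    """Turn one search result into a flat dict with the fields used in this article."""
    snippet = item.get("snippet", {})
    return {
        "videoId": item.get("id", {}).get("videoId"),
        "channelTitle": snippet.get("channelTitle"),
        "publishedAt": snippet.get("publishedAt"),
        "channelId": snippet.get("channelId"),
        "title": snippet.get("title"),
        "description": snippet.get("description"),
    }
```

A list of such dicts can be passed directly to `pd.DataFrame`. Missing fields come out as `None` instead of raising an error.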
Next, get the number of views for each video. This requires a different method from the search used earlier. Get the view counts and attach them to the data frame you just created.
# Function that takes a videoId and returns that video's statistics (views, likes, etc.)
def get_statistics(id):
    statistics = youtube.videos().list(part='statistics', id=id).execute()['items'][0]['statistics']
    return statistics
df_static = pd.DataFrame(list(df['videoId'].apply(lambda x: get_statistics(x))))
df_output = pd.concat([df, df_static], axis=1)
df_output
In this way, I was able to get the number of views, likes, comments, and so on for each video.
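Calling the API once per video works, but the `id` parameter of `videos().list` also accepts a comma-separated list of video IDs (up to 50 per request), so the same statistics can be fetched with far fewer requests. The following is a sketch under that assumption. `get_statistics_batch` is a hypothetical helper, and `youtube` is assumed to be the client built earlier.

```python
def get_statistics_batch(youtube, video_ids, batch_size=50):
    """Return {videoId: statistics} using one videos().list call per batch."""
    video_ids = list(video_ids)
    stats = {}
    for start in range(0, len(video_ids), batch_size):
        batch = video_ids[start:start + batch_size]
        response = youtube.videos().list(
            part="statistics",
            id=",".join(batch),  # comma-separated IDs in a single request
        ).execute()
        for item in response.get("items", []):
            stats[item["id"]] = item["statistics"]
    return stats
```

Videos that are no longer available are simply absent from the returned dict, so it is safer to look IDs up with `.get()`.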
Let's do a quick visualization. I graphed the cumulative number of views per channel for the videos in the top 100 by view count. Here is how it turned out.
# The API returns statistics as strings, so convert viewCount to numbers before summing
pd.to_numeric(df_output['viewCount']).groupby(df_output['channelTitle']).sum().sort_values(ascending=False).plot(kind='bar', figsize=(25, 10), fontsize=20)
As expected, a single video from a popular YouTuber can have an outstanding number of views, so the top of the ranking is a series of well-known channels. If you instead graph the number of videos each channel has in the top 100 most-viewed videos, you get a different picture.
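The per-channel video count mentioned above could be computed along these lines. This sketch uses only the standard library, so it works on any iterable of channel names, including the `channelTitle` column of the data frame. The function name `videos_per_channel` is a hypothetical choice of mine.

```python
from collections import Counter


def videos_per_channel(channel_titles):
    """Count how many videos each channel has, most frequent first."""
    return Counter(channel_titles).most_common()
```

For example, `videos_per_channel(df_output['channelTitle'])` returns a list of `(channel, count)` pairs. With pandas, `df_output['channelTitle'].value_counts()` gives the same counts as a Series, which can then be plotted as a bar chart.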
An unfamiliar channel called "Gorgeous Video" came in first. "Gorgeous Video" appears to be the YouTube channel of the entertainer Gorgeous. It seems that he is actively posting board game videos.
You can get all sorts of interesting data like this. I would like to keep playing around with the YouTube Data API.