Collecting video information for the "Singing with XX people" series [Python] [YouTube Data API]

I like the videos in the "Singing with XX" series, and on a whim I wanted to collect information about its popular videos on YouTube. I don't have a specific purpose in mind, but I thought it would be fun to analyze things like what kinds of parodies many people find interesting.

So this time I collected video information for the "Singing with XX" series using Python and the YouTube Data API (v3). To be honest, as far as collecting video information goes, there is nothing particularly novel here compared with the site I referred to (at most the final output fields differ), but since this was my first time using the YouTube Data API, I'm writing it up as a memo.

Reference site

Environment

Confirmed to work with Google Colaboratory (as of February 23, 2020)

Preparation

Obtain an API key for the YouTube Data API (v3). The procedure is covered on the reference site, so I'll omit it here. I did not set any particular restrictions on the API key.
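If you prefer not to paste the key directly into the code, one option (not part of the original post) is to read it from an environment variable. A minimal sketch, assuming the key is stored in a variable named YOUTUBE_API_KEY:

import os

# Read the API key from an environment variable instead of hard-coding it.
# The variable name YOUTUBE_API_KEY is an arbitrary choice for this sketch.
YOUTUBE_API_KEY = os.environ.get('YOUTUBE_API_KEY')
if YOUTUBE_API_KEY is None:
    raise RuntimeError('Set the YOUTUBE_API_KEY environment variable first')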

Code

The script fetches the video information contained in the search results for a given query and writes it to standard output. As an example, the code collects videos found with the query "I tried to sing with the name of a baseball player".

from googleapiclient.discovery import build  # pip install google-api-python-client
import datetime

YOUTUBE_API_KEY = '<Fill in the API key>'

query = 'I tried to sing with the name of a baseball player'
max_pages = 16  # number of result pages to fetch
maxResults = 50  # number of search results per page (the API maximum is 50)

# Generator that yields video information one page of search results at a time
def search_videos(query, max_pages=10, maxResults=50):
    youtube = build('youtube', 'v3', developerKey=YOUTUBE_API_KEY)

    search_request = youtube.search().list(
        part='id',
        q=query,
        type='video',
        maxResults=maxResults,
    )


    i = 0
    while search_request and i < max_pages:
        search_response = search_request.execute()
        video_ids = [item['id']['videoId'] for item in search_response['items']]

        videos_response = youtube.videos().list(
            part='snippet,statistics',
            id=','.join(video_ids)
        ).execute()

        yield videos_response['items']

        search_request = youtube.search().list_next(search_request, search_response)
        i += 1

# Extract the desired fields from each video and print them one row per video.
# Fields: video ID, URL, published date/time, uploader's channel ID, title,
# view count, like count, dislike count, favorite count, and comment count,
# plus a timestamp of when the script was run.
for items_per_page in search_videos(query, max_pages, maxResults):
    for item in items_per_page:
        obj = {}
        obj['id'] = item['id']
        obj['url'] = 'http://youtube.com/watch?v='+obj['id']
        snippet = item['snippet']
        for key in ['publishedAt','channelId','title']:
            obj[key] = snippet[key]
        statistics = item['statistics']
        for key in ['viewCount','likeCount','dislikeCount','favoriteCount','commentCount']:
            obj[key] = statistics.get(key, "NA")  # some counts may be missing for a video
        obj['timestamp'] = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        # All values are strings, so simple quoting works here
        # (titles containing '"' or ',' would need proper CSV escaping)
        print(",".join(['"' + obj[v] + '"' for v in obj]))

This time I collected the video ID, URL, published date and time, uploader's channel ID, video title, view count, like count, dislike count, favorite count, and comment count. I'm not sure what the "favorite count" actually is; it was included in the data, so I grabbed it just in case, but it was 0 for every video.

The final output goes to standard output (print), and I simply copied and pasted it into a Google Spreadsheet. (Screenshot of the pasted result: スクリーンショット 2020-02-23 15.15.42.png)
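Instead of copying and pasting, the rows could also be written directly to a CSV file and then imported into the spreadsheet. Below is a rough sketch (not from the original post) that reuses the search_videos function above; the field list mirrors the script's output, and the file name videos.csv is just an example.

import csv
import datetime

# Columns in the same order as the script prints them
fields = ['id', 'url', 'publishedAt', 'channelId', 'title',
          'viewCount', 'likeCount', 'dislikeCount', 'favoriteCount',
          'commentCount', 'timestamp']

with open('videos.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)  # csv.writer handles quoting of commas and quotes in titles
    writer.writerow(fields)
    for items_per_page in search_videos(query, max_pages, maxResults):
        for item in items_per_page:
            row = {'id': item['id'],
                   'url': 'http://youtube.com/watch?v=' + item['id']}
            row.update({k: item['snippet'][k]
                        for k in ['publishedAt', 'channelId', 'title']})
            row.update({k: item['statistics'].get(k, 'NA')
                        for k in ['viewCount', 'likeCount', 'dislikeCount',
                                  'favoriteCount', 'commentCount']})
            row['timestamp'] = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            writer.writerow([row[k] for k in fields])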

When I checked the results by eye, there was some noise, such as baseball broadcast videos and unrelated "tried to sing" videos, so those need to be removed manually. Also, to collect more comprehensively, you can run the program again with another search word such as "Yakyuta" and add only the videos whose IDs do not overlap with what has already been collected, as in the sketch below.
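One way to do that deduplication (again a sketch, not from the original post) is to keep a set of video IDs that have already been seen and skip repeats while iterating over several queries:

# Merge results from several queries, keeping only the first occurrence of each video ID.
# The query list below is illustrative.
queries = ['I tried to sing with the name of a baseball player', 'Yakyuta']

seen_ids = set()
merged = []
for q in queries:
    for items_per_page in search_videos(q, max_pages, maxResults):
        for item in items_per_page:
            if item['id'] in seen_ids:
                continue  # already collected via an earlier query
            seen_ids.add(item['id'])
            merged.append(item)

print(len(merged), 'unique videos collected')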

That's it for the code. In the future, I think it would be interesting to transcribe the words used in the parody lyrics and analyze what kinds of parodies tend to become popular (transcribing takes too much effort, though, so it probably won't happen anytime soon).
