[Python] Get all comments using YouTube Data API

Introduction

I often watch YouTube. When you scroll through the comments on the YouTube site, the next batch of comments is fetched and displayed a little at a time. A popular video can have more than 1,000 comments, but it is hard to read them all, so in practice you only see about the top 100 (sorted by rating).

I was impressed by the video "[How to win GACKT] Talking about overseas migration, way of life, business, volunteers --Youtube" and wanted to read every comment on it. I thought that with the YouTube Data API I could fetch all the comments and read them at once, so I tried it.

Survey

API key activation

This is my first time using the YouTube Data API. To use it, you need to enable the API and obtain an API key, so I referred to the following site.
Get comments on Youtube Live using Youtube Data API

Google Colaboratory

This time I wanted to make it easy for anyone to fetch comments, so I chose Google Colaboratory, which anyone with a Google account can use right away, and Python as the programming language.

The following site matched exactly what I was looking for.
Get YouTube comments in Python

Get comments

I opened Google Colaboratory, pasted in "getYouTubeComments.py" from the site above, replaced the API key and video ID, and ran it. I was able to get comments, but when I compared them with the comments on the target video, some were missing. Some comments have child comments (replies), and this program did not fetch them.

Get child comments

So how do I get the child comments? The reply count is available, so if it is 1 or more, the child comments need to be fetched. I searched and found the following site. It is written in PHP, but all I needed was to understand how it works.
Get all YouTube comments-Sleepy porridge

Child comments can be obtained by passing the parent comment's ID as the parentId parameter to the comments method.

Comment https://www.googleapis.com/youtube/v3/commentThreads
Child comment https://www.googleapis.com/youtube/v3/comments
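
To make the difference between the two endpoints concrete, here is a minimal sketch that only builds the request URLs and sends nothing. The helper names thread_url and reply_url and the placeholder key and parent ID are made up for illustration; the parameter names are the ones used in the program in this article.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.googleapis.com/youtube/v3/'

def thread_url(api_key, video_id):
    """Build the URL for the top-level (parent) comments of a video."""
    params = {'key': api_key, 'part': 'snippet', 'videoId': video_id}
    return BASE_URL + 'commentThreads?' + urlencode(params)

def reply_url(api_key, parent_id):
    """Build the URL for the child comments (replies) of one parent comment."""
    params = {'key': api_key, 'part': 'snippet', 'parentId': parent_id}
    return BASE_URL + 'comments?' + urlencode(params)

# Placeholder values; nothing is sent over the network here.
print(thread_url('YOUR_API_KEY', 'oeJ_b0iG9lM'))
print(reply_url('YOUR_API_KEY', 'PARENT_COMMENT_ID'))
```

The parent listing is keyed by the video, while the reply listing is keyed by the parent comment, which is why the parent's ID has to be kept while looping over the threads.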

Specification

Unlike the referenced program, this one numbers each comment serially and also outputs the user name and the child comments.

Output is tab-delimited, and line breaks inside a comment are replaced with single-byte spaces. The API can return comment text as HTML or plain text; this program requests plain text. Parent comments are fetched in relevance order (descending rating). Child comments come back in no particular order; even if you specify an order, it will not match the video site. The parent serial number is padded to 4 digits and the child serial number to 3 digits, so if you need to fetch more comments than that, increase the widths in the format strings.

000X	(Comment)	(Like count)	(User name)	(Reply count)
000X-00X	(Child comment)	(Like count)	(User name)
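
As an illustration of this line format, the sketch below builds the two kinds of lines with the digit widths exposed as arguments. The function names format_parent and format_child and the sample values are hypothetical and are not part of the program in this article.

```python
def format_parent(no, text, like_cnt, user_name, reply_cnt, width=4):
    """Return one tab-delimited line for a parent comment."""
    flat = text.replace('\n', ' ')  # keep each comment on a single line
    return f'{no:0{width}d}\t{flat}\t{like_cnt}\t{user_name}\t{reply_cnt}'

def format_child(no, cno, text, like_cnt, user_name, width=4, child_width=3):
    """Return one tab-delimited line for a child comment (reply)."""
    flat = text.replace('\n', ' ')
    return f'{no:0{width}d}-{cno:0{child_width}d}\t{flat}\t{like_cnt}\t{user_name}'

# Made-up sample values, only to show the layout.
print(format_parent(6, 'first line\nsecond line', 622, 'someone', 9))
print(format_child(6, 1, 'a reply', 0, 'another user'))
```

Passing a larger width (for example width=6) is one way to handle videos with more than 9,999 parent comments.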

How to use

Open Google Colaboratory, paste in the "getYouTubeComments.py" program from this article, replace the API key and video ID, and run it.

Enter API_KEY

An API key is issued in the credentials settings when you enable the YouTube Data API. Replace the 'Enter API KEY' placeholder assigned to API_KEY in the program with the key you were issued.

Enter Video ID

For example, for "https://www.youtube.com/watch?v=oeJ_b0iG9lM", the video ID is oeJ_b0iG9lM. Replace the 'Enter Video ID' placeholder assigned to VIDEO_ID in the program with the ID of the target video.
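
If you would rather paste the whole URL than pick out the ID by hand, something like the following sketch could do the extraction. The function name extract_video_id is hypothetical, and it only handles URLs of the watch?v= form shown above, not other link styles.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the value of the `v` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get('v')
    return values[0] if values else None

print(extract_video_id('https://www.youtube.com/watch?v=oeJ_b0iG9lM'))  # prints oeJ_b0iG9lM
```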

Program

getYouTubeComments.py


import requests

URL = 'https://www.googleapis.com/youtube/v3/'
# Enter your API key here
API_KEY = 'Enter API KEY'
# Enter your video ID here
VIDEO_ID = 'Enter Video ID'

def print_video_comment(no, video_id, next_page_token):
  params = {
    'key': API_KEY,
    'part': 'snippet',
    'videoId': video_id,
    'order': 'relevance',
    'textFormat': 'plainText',  # documented values are 'html' and 'plainText'
    'maxResults': 100,
  }
  if next_page_token is not None:
    params['pageToken'] = next_page_token
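  response = requests.get(URL + 'commentThreads', params=params)
  # Stop with a clear HTTP error (bad key, quota exceeded, ...) instead of a KeyError below
  response.raise_for_status()
  resource = response.json()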

  for comment_info in resource['items']:
    # Comment text
    text = comment_info['snippet']['topLevelComment']['snippet']['textDisplay']
    # Like count
    like_cnt = comment_info['snippet']['topLevelComment']['snippet']['likeCount']
    # Reply count
    reply_cnt = comment_info['snippet']['totalReplyCount']
    # User name
    user_name = comment_info['snippet']['topLevelComment']['snippet']['authorDisplayName']
    # ID of the parent comment (used to fetch its replies)
    parentId = comment_info['snippet']['topLevelComment']['id']
    print('{:0=4}\t{}\t{}\t{}\t{}'.format(no, text.replace('\n', ' '), like_cnt, user_name, reply_cnt))
    if reply_cnt > 0:
      cno = 1
      # Replies are paged separately from threads, so start at their first page (no page token)
      print_video_reply(no, cno, video_id, None, parentId)
    no = no + 1

  if 'nextPageToken' in resource:
    print_video_comment(no, video_id, resource["nextPageToken"])

def print_video_reply(no, cno, video_id, next_page_token, id):
  params = {
    'key': API_KEY,
    'part': 'snippet',
    'videoId': video_id,
    'textFormat': 'plainText',  # documented values are 'html' and 'plainText'
    'maxResults': 50,
    'parentId': id,
  }

  if next_page_token is not None:
    params['pageToken'] = next_page_token
  response = requests.get(URL + 'comments', params=params)
  # Stop with a clear HTTP error instead of a KeyError below
  response.raise_for_status()
  resource = response.json()

  for comment_info in resource['items']:
    # Comment text
    text = comment_info['snippet']['textDisplay']
    # Like count
    like_cnt = comment_info['snippet']['likeCount']
    # User name
    user_name = comment_info['snippet']['authorDisplayName']

    print('{:0=4}-{:0=3}\t{}\t{}\t{}'.format(no, cno, text.replace('\n', ' '), like_cnt, user_name))
    cno = cno + 1

  if 'nextPageToken' in resource:
    print_video_reply(no, cno, video_id, resource["nextPageToken"], id)

#Get all comments
video_id = VIDEO_ID
no = 1
print_video_comment(no, video_id, None)
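
The program above follows nextPageToken by calling itself once per page. Python's default recursion limit is around 1,000 calls, so a video with a very large number of comments could in principle run into it. As an alternative, here is a minimal sketch of the same paging done with a loop. This is a swapped-in technique, not the approach used above: iter_items is a hypothetical helper, and fetch_page is assumed to be any callable that takes a params dict and returns the decoded JSON response. A fake fetcher stands in for the API so the sketch runs without a network connection or an API key.

```python
def iter_items(fetch_page, params):
    """Yield every item across all pages, following nextPageToken in a loop."""
    params = dict(params)  # copy so the caller's dict is not modified
    while True:
        resource = fetch_page(params)
        yield from resource.get('items', [])
        token = resource.get('nextPageToken')
        if token is None:
            break  # last page reached
        params['pageToken'] = token

def fake_fetch(params):
    """Stand-in for the real API: serves two fake pages, no network access."""
    if params.get('pageToken') is None:
        return {'items': ['a', 'b'], 'nextPageToken': 'page2'}
    return {'items': ['c']}

print(list(iter_items(fake_fetch, {'maxResults': 100})))  # prints ['a', 'b', 'c']
```

With the real API, fetch_page could be a small wrapper around requests.get(URL + 'commentThreads', params=params) that returns response.json().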

Execution result

Fetching the comments for the video "[How to win GACKT] Talking about overseas migration, way of life, business, volunteers --Youtube" gives output like the following.

Comment acquisition example


0006 Not only the content, but I also studied how to speak and listen. We will do our best to convey to the children that the experience of the challenge is an irreplaceable treasure. 622 A man tried a class 9
0006-001 I am deeply moved to see it in the comment section here. There seems to be a good thing today 0 Akari Hoshino
      ⋮
0006-008 I'm always studying 14 Water.
0006-009 Genuine grass 16
0007 Achan's way of listening is good. I often reply with short words such as "Hey" or "Is it Niigata?", But I'm steadily drawing out the story of GACKT 67 The influence of a drop 0

Finally

Advent Calendar season is coming soon. We are looking for articles for "Visual Basic Advent Calendar 2020" again this year. With the approach described here, it should also be possible to try this in Visual Basic or Excel instead of Python.
