For the group of users who post comments on a specific channel, retrieve which channels each of those users subscribes to. Can the channel's customer base be estimated this way? The working assumption: if many of the commenters subscribe to the same other channel, the customer base is likely interested in that channel's topics. (Whether the customer base can actually be estimated this way has not been verified; if I get to test it, I will post a follow-up.)
import requests
import pandas as pd

def get_comment_info(channel_id, pageToken):
    comment_url = "https://www.googleapis.com/youtube/v3/commentThreads"
    param = {
        'key': 【Browser Key】,  # replace with your API key
        'allThreadsRelatedToChannelId': channel_id,
        'part': 'snippet,replies',  # drop 'replies' if you don't need the reply tree
        'maxResults': '50',
        'pageToken': pageToken,
    }
    req = requests.get(comment_url, params=param)
    return req.json()
rows = []
pageToken = ""
while True:
    req = get_comment_info(channel_id, pageToken)  # channel_id: the target channel
    for comment_thread in req.get("items", []):
        video_id = comment_thread["snippet"]["videoId"]
        top = comment_thread["snippet"]["topLevelComment"]["snippet"]
        rows.append([video_id, top["authorDisplayName"], top["authorChannelId"]["value"]])
        if "replies" in comment_thread and "comments" in comment_thread["replies"]:
            for reply in comment_thread["replies"]["comments"]:
                rows.append([video_id, reply["snippet"]["authorDisplayName"],
                             reply["snippet"]["authorChannelId"]["value"]])
    if "nextPageToken" in req:
        pageToken = req["nextPageToken"]
    else:
        break
comment_df = pd.DataFrame(rows, columns=["video_id", "author_name", "channel_id"])
Note that the example above retrieves only the commenter's display name and channel_id. If you also want the comment text itself, it is available at the following two paths:
comment_thread["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
replies["snippet"]["textDisplay"]
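For illustration, here is how those two paths read against a minimal commentThread-shaped dict (the nesting follows the API response format; all values here are made up):

```python
# A minimal commentThread-shaped dict; values are made-up placeholders
comment_thread = {
    "snippet": {
        "videoId": "vid123",
        "topLevelComment": {
            "snippet": {
                "authorDisplayName": "alice",
                "textDisplay": "Nice video!",
            }
        },
    },
    "replies": {
        "comments": [
            {"snippet": {"authorDisplayName": "bob", "textDisplay": "Agreed"}}
        ]
    },
}

# Top-level comment text
top_text = comment_thread["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
# Text of each reply in the thread
reply_texts = [r["snippet"]["textDisplay"]
               for r in comment_thread["replies"]["comments"]]
```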
Please check the documentation for details or for slightly different usage. For example, here we specify a channel_id to get the comments on every video posted to that channel, but it is also possible to target only a specific video. https://developers.google.com/youtube/v3/docs/commentThreads?hl=ja
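As a sketch of that video-specific variant: per the commentThreads documentation, the videoId parameter replaces allThreadsRelatedToChannelId. The key and video id below are placeholders, and the actual request is left commented out:

```python
API_KEY = "YOUR_API_KEY"    # placeholder: your Browser Key
VIDEO_ID = "SOME_VIDEO_ID"  # placeholder: the target video's id

def build_video_comment_params(video_id, page_token=""):
    # 'videoId' targets a single video instead of a whole channel
    return {
        'key': API_KEY,
        'videoId': video_id,
        'part': 'snippet,replies',
        'maxResults': '50',
        'pageToken': page_token,
    }

# import requests
# req = requests.get("https://www.googleapis.com/youtube/v3/commentThreads",
#                    params=build_video_comment_params(VIDEO_ID)).json()
```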
def get_subscription_info(channel_id, pageToken):
    subscription_url = 'https://www.googleapis.com/youtube/v3/subscriptions'
    param = {
        'key': 【Browser Key】,  # replace with your API key
        'channelId': channel_id,
        'part': 'snippet',
        'maxResults': '50',
        'pageToken': pageToken,
    }
    req = requests.get(subscription_url, params=param)
    return req.json()
# Passing an empty string as pageToken is the same as omitting it
# 1. If you ran the code above, the commenters' ids come from comment_df
channel_id_list = comment_df["channel_id"].unique()

rows = []
for channel_id in channel_id_list:
    pageToken = ""
    while True:
        req = get_subscription_info(channel_id, pageToken)
        for item in req.get("items", []):
            rows.append([channel_id,
                         item["snippet"]["title"],
                         item["snippet"]["resourceId"]["channelId"]])
        # If more items remain than maxResults, nextPageToken is returned
        if "nextPageToken" in req:
            pageToken = req["nextPageToken"]
            print(channel_id, pageToken)
        else:
            break
subscription_df = pd.DataFrame(
    rows, columns=["channel_id", "subscript_channel_name", "subscript_channel_id"])
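To connect back to the question in the opening paragraph, one simple way to sketch the "customer base" estimate is to count how many distinct commenters subscribe to each channel. This rests on the unverified assumption stated at the top, and is shown here on a tiny hand-made stand-in for subscription_df:

```python
import pandas as pd

# Hand-made stand-in for subscription_df (same columns as above)
subs = pd.DataFrame(
    [["user_a", "Ch1", "UC_1"],
     ["user_b", "Ch1", "UC_1"],
     ["user_b", "Ch2", "UC_2"]],
    columns=["channel_id", "subscript_channel_name", "subscript_channel_id"],
)

# Count distinct commenters per subscribed channel
counts = (
    subs.drop_duplicates(["channel_id", "subscript_channel_id"])
        .groupby(["subscript_channel_id", "subscript_channel_name"])
        .size()
        .sort_values(ascending=False)
)
print(counts)  # UC_1 ("Ch1") is shared by 2 of the commenters
```

Channels near the top of counts are subscribed to by many of the commenters, which under the article's assumption hints at the audience's shared interests.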
The Subscriptions API documentation (Japanese) is here: https://developers.google.com/youtube/v3/docs/subscriptions?hl=ja
Its usage is similar to retrieving video ids from a playlistId, which I explained in an earlier post: https://qiita.com/miyatsuki/items/c221b48830db2b0a9eba#12-%E3%83%97%E3%83%AC%E3%82%A4%E3%83%AA%E3%82%B9%E3%83%88id%E3%81%8B%E3%82%89%E5%8B%95%E7%94%BB%E3%81%AEid%E3%82%92%E5%8F%96%E5%BE%97