I tried using the YouTube Data API v3


I tried the YouTube Data API v3 while working through a scraping primer. I got stuck in several places, so I'm leaving notes here.

Implemented code


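import csv
import logging
import os
from typing import Iterator, List

from googleapiclient.discovery import build
from pymongo import MongoClient, ReplaceOne, DESCENDING
from pymongo.collection import Collection

# Read the API key from an environment variable so it is not hard-coded.
# Assumed: the environment variable has the same name as this constant.
YOUTUBE_API_KEY = os.environ['YOUTUBE_API_KEY']
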

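def main():
    """Search YouTube, store the results in MongoDB, and export them to CSV."""
    mongo_client = MongoClient('localhost', 27017)
    collection = mongo_client.youtube.videos  # `videos` collection in the `youtube` database

    query = input('Please specify the search value.:')

    for items_per_page in search_videos(query):
        save_to_momgo(collection, items_per_page)

    # Assumed: the CSV export is called here, since the run described later produced a CSV file.
    save_to_csv(collection)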


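def search_videos(query: str, max_pages: int = 5) -> Iterator[List[dict]]:
    """Yield one list of video resources per page of search results."""
    youtube = build('youtube', 'v3', developerKey=YOUTUBE_API_KEY)

    # Assumed arguments: the minimum this loop needs to work.
    search_request = youtube.search().list(
        part='id',       # only the video ID is read from each search result
        q=query,
        type='video',    # so that every item has item['id']['videoId']
        maxResults=15,   # assumed; the error URL and the log both show 15 videos per page
    )

    i = 0
    while search_request and i < max_pages:
        search_response = search_request.execute()

        video_ids = [item['id']['videoId'] for item in search_response['items']]
        # part and id match the request URL shown in the error message later in this post.
        video_response = youtube.videos().list(
            part='snippet,statistics',
            id=','.join(video_ids),
        ).execute()

        yield video_response['items']

        # Reassign search_request itself; list_next returns None when there are no more pages.
        search_request = youtube.search().list_next(search_request, search_response)
        i += 1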

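def save_to_momgo(collection: Collection, items: List[dict]):
    """Upsert the video resources into MongoDB, keyed by video ID."""
    for item in items:
        item['_id'] = item['id']  # use the video ID as the document's primary key

        # The API returns statistics as strings; convert them so they sort numerically.
        for key, value in item['statistics'].items():
            item['statistics'][key] = int(value)

    operation = [ReplaceOne({'_id': item['_id']}, item, upsert=True) for item in items]
    result = collection.bulk_write(operation)
    logging.info(f'Upserted {result.upserted_count} documents.')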

def save_to_csv(collection: Collection):
    """Write video titles and view counts to a CSV file, most viewed first."""
    with open('top_videos_list.csv', 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, ['title', 'viewCount'])
        for item in collection.find().sort('statistics.viewCount', DESCENDING):
            writer.writerow({'title': item['snippet']['title'], 'viewCount': item['statistics']['viewCount']})

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)  # show the INFO-level "Upserted ..." messages
    main()
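
The paging loop in search_videos only advances if the variable tested by the while condition is the one that receives the next request. Here is a minimal sketch of that loop shape. It uses a hypothetical in-memory stand-in (PAGES, execute, list_next) instead of the real API client, and its list_next takes a simplified single argument, unlike the real one.

```python
from typing import Iterator, List, Optional

# Hypothetical stand-in for a paged API: a "request" is just a page index.
PAGES = [["a", "b"], ["c", "d"], ["e"]]


def execute(request: int) -> List[str]:
    """Return the items of the requested page (stands in for request.execute())."""
    return PAGES[request]


def list_next(request: int) -> Optional[int]:
    """Return the request for the following page, or None when none is left."""
    nxt = request + 1
    return nxt if nxt < len(PAGES) else None


def iter_pages(max_pages: int = 5) -> Iterator[List[str]]:
    """Yield pages until they run out or max_pages is reached."""
    request: Optional[int] = 0
    fetched = 0
    while request is not None and fetched < max_pages:
        yield execute(request)
        # Reassign the same variable the loop condition checks;
        # otherwise the first page would be fetched again and again.
        request = list_next(request)
        fetched += 1


pages = list(iter_pages())
print(pages)
```

The loop stops either when list_next returns None or when max_pages is reached, whichever comes first.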

Sticking point ①

After implementing the code and running it several times, I got the following error.


googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/youtube/v3/videos?part=snippet%2Cstatistics&id=wXeR58bjCak%2CujxXyCrDnU0%2CQhhUVI0sxCc%2CKZz-7KSMjZA%2CWq-WeVQoE6U%2CG-qWwfG9mBE%2CYqUwEPSZQGQ%2CIuopzT_TWPQ%2CmcFspy1WhL8%2Ck9dcl7F6IFY%2C--Z5cvZ4JEw%2C3hidJgc9Zyw%2CdYSmEkcM_8s%2Ch6Hc4RuK8D8%2CRQfN2re3u4w&key=<YOUTUBE_API_KEY>&alt=json returned "The request cannot be completed because you have exceeded your <a href="/youtube/v3/getting-started#quota">quota</a>.">

At first I wasn't sure what this meant, so I looked into the cause.

It turned out to be simply the API's quota limit. I created a new project in GCP, issued a new API key from the Credentials tab, set that key in the environment variable the script reads, and ran the Python file again.


This time it ran without any problems and wrote a CSV file.

(scraping3.7) C:\Users\user\scraping3.7\files>python  save_youtube_videos_matadata.py
Please specify the search value.:Hikakin
INFO:root:Upserted 15 documents.
INFO:root:Upserted 0 documents.
INFO:root:Upserted 0 documents.
INFO:root:Upserted 0 documents.
INFO:root:Upserted 0 documents.
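
A rough estimate shows why a handful of runs can use up the quota. The figures below are assumptions based on commonly documented values (100 units per search.list call, 1 unit per videos.list call, 10,000 units per project per day by default); they may have changed, so check them against the current quota documentation.

```python
# Assumed quota figures; verify against the current YouTube Data API docs.
SEARCH_LIST_COST = 100   # units per search.list call (assumed)
VIDEOS_LIST_COST = 1     # units per videos.list call (assumed)
DAILY_QUOTA = 10_000     # default units per project per day (assumed)


def units_per_run(pages: int) -> int:
    """Each page costs one search.list call plus one videos.list call."""
    return pages * (SEARCH_LIST_COST + VIDEOS_LIST_COST)


def runs_per_day(pages: int) -> int:
    """Number of complete runs that fit in the daily quota."""
    return DAILY_QUOTA // units_per_run(pages)


cost = units_per_run(5)
runs = runs_per_day(5)
print(f'{cost} units per run, about {runs} runs per day')
```

Under these assumptions, a five-page run costs 505 units, so the quota lasts for roughly 19 runs a day. Lowering max_pages is the simplest way to stretch it.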


Sticking point ②

One line in the code above was unclear to me, and I couldn't continue developing until I understood it.

operation = [ReplaceOne({'_id': item['_id']}, item, upsert=True) for item in items]

According to the official documentation, it takes its arguments in the following form.

ReplaceOne(filter, replacement, upsert=False)

For example, take the following call:

ReplaceOne({'city': 'Tokyo'}, {'city':'Gunma'}, upsert=True)

If a document with 'city': 'Tokyo' exists, it is replaced with {'city': 'Gunma'}. If no such document exists, {'city': 'Gunma'} is inserted as a new document.
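
The replace-or-insert behavior can be modeled in plain Python. This is only a sketch of the semantics over a list of dicts, not pymongo itself: the replace_one helper here is hypothetical, it matches on simple field equality only, and unlike MongoDB it does not generate an _id for inserted documents.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def replace_one(docs: List[Document], flt: Document, replacement: Document,
                upsert: bool = False) -> str:
    """Replace the first document matching flt; insert if none matches and upsert is set."""
    for i, doc in enumerate(docs):
        if all(doc.get(key) == value for key, value in flt.items()):
            new_doc = dict(replacement)
            if '_id' in doc:
                # A replacement swaps the whole document but keeps its _id.
                new_doc.setdefault('_id', doc['_id'])
            docs[i] = new_doc
            return 'replaced'
    if upsert:
        docs.append(dict(replacement))
        return 'inserted'
    return 'no-op'


cities = [{'_id': 1, 'city': 'Tokyo'}]
first = replace_one(cities, {'city': 'Tokyo'}, {'city': 'Gunma'}, upsert=True)
second = replace_one(cities, {'city': 'Tokyo'}, {'city': 'Gunma'}, upsert=True)
print(first, second, cities)
```

The first call finds Tokyo and replaces it. The second call no longer finds Tokyo, so with upsert=True it inserts another Gunma document. In the post's code the filter is the video's _id, so running the same search again replaces the existing documents instead of duplicating them.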

Reference book

Python Crawling & Scraping [Augmented and Revised Edition]: Practical Development Guide for Data Collection and Analysis (Amazon)

