A story about a Python beginner trying to get Google search results using the API

Overview

I have only been learning Python for two weeks, but I wanted to collect Google search results for my seminar research, so I built this by following the article "[Get Google search results using Custom Search API](https://qiita.com/zak_y/items/42ca0f1ea14f7046108c#1-api%E3%82%AD%E3%83%BC%E3%81%AE%E5%8F%96%E5%BE%97)".

Much of this overlaps with the reference article, but I would like to document how I put it together.

**Environment** Windows 10, Python 3.7, Anaconda Navigator

**Goal** Collect prior research on my seminar research theme, "What are the determinants that influence the increase and decrease in the number of foreign visitors to Japan?" → Create a file that lists the titles and URLs of the articles found

Procedure

1. Get the API key
2. Enable the Custom Search API
3. Get a Custom Search Engine
4. Install the library
5. Get Google search results with the API & save them as a JSON file
6. Format the file obtained in 5 & save it in TSV format

1. Get the API key

Open the navigation menu of Google Cloud Platform and click "APIs and Services" → "Credentials".

Create an API key from "Create Credentials".

You will need the API key later, so copy it and save it somewhere.

2. Enable Custom Search API

Open the navigation menu of Google Cloud Platform and click "APIs and Services" → "Library".

Select "Custom Search API" from "Other" at the bottom of the page to open the details page. Click "Enable".

3. Get Custom Search Engine

① Go to the Custom Search Engine page and click "Add".


② Fill in the form as follows:
- Enter the URL of any site under "Sites to search" (anything is fine)
- Set the language to "Japanese"
- Enter a name for the search engine
- Click "Create"

③ Select the name of the search engine you just created from the options under "Edit search engine" and edit it. On this page:
- Copy the "Search engine ID" and save it somewhere
- Select Japanese for "Language"
- Delete the site displayed under "Sites to search"
- Turn on "Search the entire web"
- Click "Update"
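
The API key from step 1 and the search engine ID saved here are hard-coded in the script later on. As an alternative, here is a minimal sketch that reads them from environment variables instead. The variable names `GOOGLE_API_KEY` and `CUSTOM_SEARCH_ENGINE_ID` and the helper `load_credentials` are my own assumptions for illustration, not something Google or the reference article requires.

```python
import os


def load_credentials(env=os.environ):
    """Return (api_key, search_engine_id) read from a mapping of environment variables.

    The variable names are assumptions for this sketch, not names
    required by Google.
    """
    try:
        return env["GOOGLE_API_KEY"], env["CUSTOM_SEARCH_ENGINE_ID"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"GOOGLE_API_KEY": "dummy-key", "CUSTOM_SEARCH_ENGINE_ID": "dummy-id"}
api_key, engine_id = load_credentials(demo_env)
print(api_key, engine_id)
```

Calling `load_credentials()` with no argument would read the real environment, which keeps the key out of any file you might share.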

4. Library installation

Install "Google API Python Client" by referring to "Google API Client Library for Python". The package is published on PyPI as `google-api-python-client`.

I created a virtual environment with virtualenv and then installed the library into it.
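
To confirm that the library is visible from the virtual environment before running anything else, a quick check like the following may help. This is only a sketch: `is_installed` is a helper name I made up, and it simply asks Python whether the `googleapiclient` module (the import name used later in `scrape.py`) can be found.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# "googleapiclient" is the import name provided by google-api-python-client
print("googleapiclient installed:", is_installed("googleapiclient"))
```

If this prints `False`, the library was probably installed into a different environment than the one running the script.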

5. Get results with the API & save as a JSON file

I wrote the code and ran it... and got an error!

**Cause** On Japanese Windows, `open` defaults to the cp932 (Shift-JIS) encoding, which cannot represent some of the characters in the search results. Reference article: Causes and workarounds of UnicodeEncodeError (cp932, Shift-JIS encoding) when using Python3 on Windows

**Workaround** Pass `encoding='utf-8'` as an argument to the `open` function.

`scrape.py`


with open(os.path.join(save_response_dir, 'response_' + today + '.json'), mode='w', encoding='utf-8') as response_file:
    response_file.write(jsonstr)
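
The error can be reproduced on any OS by choosing the encoding explicitly. The sketch below is not the original failing code: the sample text and the helper `try_write` are made up for illustration. It writes a string containing a non-breaking space and an emoji, neither of which cp932 can encode, first as cp932 and then as UTF-8.

```python
import os
import tempfile

# Sample text: the non-breaking space (\xa0) and the emoji are not representable in cp932
text = "訪日外国人\xa0研究 \U0001F600"


def try_write(path, text, encoding):
    """Write text with the given encoding; return False on UnicodeEncodeError."""
    try:
        with open(path, mode="w", encoding=encoding) as f:
            f.write(text)
        return True
    except UnicodeEncodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    cp932_ok = try_write(os.path.join(tmp, "cp932.txt"), text, "cp932")
    utf8_ok = try_write(os.path.join(tmp, "utf8.txt"), text, "utf-8")

print("cp932:", cp932_ok, "utf-8:", utf8_ok)
```

UTF-8 can encode every Unicode character, which is why specifying it explicitly avoids the problem regardless of the system default.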

6. Get results with the API & save as a JSON file (second attempt)

With a little tinkering, the final code looks like this:

`scrape.py`


import os
import datetime
import json

from time import sleep
from googleapiclient.discovery import build
                  
GOOGLE_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
CUSTOM_SEARCH_ENGINE_ID = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

DATA_DIR = 'data'


def makeDir(path):
    if not os.path.isdir(path):
        os.mkdir(path)


def getSearchResponse(keyword):
    today = datetime.datetime.today().strftime("%Y%m%d")
    timestamp = datetime.datetime.today().strftime("%Y/%m/%d %H:%M:%S")

    makeDir(DATA_DIR)

    service = build("customsearch", "v1", developerKey=GOOGLE_API_KEY)

    page_limit = 10  # fetch at most 10 pages of 10 results each
    start_index = 1
    response = []
    for n_page in range(0, page_limit):
        try:
            sleep(1)  # pause between requests
            response.append(service.cse().list(
                q=keyword,
                cx=CUSTOM_SEARCH_ENGINE_ID,
                lr='lang_ja',
                num=10,
                start=start_index
            ).execute())
            # Stop cleanly when the response has no further page
            next_page = response[n_page].get("queries", {}).get("nextPage")
            if not next_page:
                break
            start_index = next_page[0].get("startIndex")
        except Exception as e:
            print(e)
            break

    # Save the response in JSON format
    save_response_dir = os.path.join(DATA_DIR, 'response')
    makeDir(save_response_dir)
    out = {'snapshot_ymd': today, 'snapshot_timestamp': timestamp, 'response': response}
    jsonstr = json.dumps(out, ensure_ascii=False)
    with open(os.path.join(save_response_dir, 'response_' + today + '.json'), mode='w', encoding='utf-8') as response_file:
        response_file.write(jsonstr)


if __name__ == '__main__':

    target_keyword = 'Foreign Visitors in Japan Factor Research'

    getSearchResponse(target_keyword)

When I ran it this time, a "response" folder was created under the "data" folder, with a JSON file inside it!
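
The paging in `scrape.py` relies on each response carrying the start index of the next page under `queries` → `nextPage`. Here is a small offline sketch of that lookup, with no API call. The two sample dictionaries are hand-written stand-ins shaped like the fields the script reads, not real API output, and `next_start_index` is a helper name I made up.

```python
def next_start_index(page):
    """Return the start index of the next page, or None if there is none."""
    next_page = page.get("queries", {}).get("nextPage")
    if not next_page:
        return None
    return next_page[0].get("startIndex")


# Hand-written sample pages (assumed shapes, for illustration only)
middle_page = {"queries": {"nextPage": [{"startIndex": 11}]}, "items": []}
last_page = {"queries": {"request": [{"startIndex": 91}]}}

print(next_start_index(middle_page))
print(next_start_index(last_page))
```

Returning `None` for a page without `nextPage` lets the loop end without going through an exception.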

7. Format the file obtained in 6 & save it in TSV format

The code is below.

`prettier.py`


import os
import datetime
import json
import pandas as pd

DATA_DIR = 'data'


def makeDir(path):
    if not os.path.isdir(path):
        os.mkdir(path)


def makeSearchResults():
    today = datetime.datetime.today().strftime("%Y%m%d")

    response_filename = os.path.join(
        DATA_DIR, 'response', 'response_' + today + '.json')
    # Read the JSON file saved by scrape.py; the with-block closes the file
    with open(response_filename, 'r', encoding='utf-8') as response_file:
        response_tmp = json.load(response_file)
    ymd = response_tmp['snapshot_ymd']
    response = response_tmp['response']
    results = []
    cnt = 0
    # Each element of `response` is one page of search results
    for page in response:
        # Pages without an 'items' key are skipped
        for item in page.get('items', []):
            cnt += 1
            # Fall back to an empty snippet if the key is missing
            snippet = item.get('snippet', '').replace('\n', '')
            results.append({'ymd': ymd, 'no': cnt,
                            'display_link': item['displayLink'],
                            'title': item['title'],
                            'link': item['link'],
                            'snippet': snippet})
    save_results_dir = os.path.join(DATA_DIR, 'results')
    makeDir(save_results_dir)
    df_results = pd.DataFrame(results)
    df_results.to_csv(os.path.join(save_results_dir, 'results_' + ymd + '.tsv'), sep='\t',
                      index=False, columns=['ymd', 'no', 'display_link', 'title', 'link', 'snippet'])


if __name__ == '__main__':

    makeSearchResults()

When I ran it, the results were laid out in the order of date, number, site URL, title, article URL, and snippet!
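
To see what the flattening step does without calling the API, here is a minimal sketch that uses a hand-written sample response and the standard `csv` module instead of pandas. The sample data, the date string, and the helper name `flatten` are assumptions for illustration. The column names match the ones in `prettier.py`.

```python
import csv
import io

# Hand-written sample: one page with a single item, and one page with no items
sample_response = [
    {"items": [
        {"displayLink": "example.com", "title": "Title A",
         "link": "https://example.com/a", "snippet": "line one\nline two"},
    ]},
    {},
]


def flatten(pages, ymd):
    """Turn a list of result pages into one flat row per search result."""
    rows = []
    for page in pages:
        for item in page.get("items", []):
            rows.append({
                "ymd": ymd,
                "no": len(rows) + 1,
                "display_link": item["displayLink"],
                "title": item["title"],
                "link": item["link"],
                "snippet": item.get("snippet", "").replace("\n", ""),
            })
    return rows


rows = flatten(sample_response, "20200101")

# Write the rows as tab-separated text into an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Removing the newlines from the snippet keeps each result on a single line of the TSV file.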

The TSV file can also be opened in Excel.

Impressions

The article I referred to this time ([Get Google search results using Custom Search API](https://qiita.com/zak_y/items/42ca0f1ea14f7046108c#1-api%E3%82%AD%E3%83%BC%E3%81%AE%E5%8F%96%E5%BE%97)) was so clear and easy to follow that even a beginner like me could implement it! I still need to properly understand what the code means, but for now I'm happy to have made a program I can use in everyday life. However, the free tier of the Custom Search API (Google Custom Search JSON API) seems to come with various restrictions, so I will need to be careful when I use it again in the future.