When I look something up, I want to pull the title, URL, and summary from the Google search results, because the Google summary tells me roughly what each result is about.
This article automates a Google search, collects the search results, and writes them out as a CSV file. Automating this reduces the time spent searching.
Windows 10 Pro
Python 3.7
Anaconda
The program created this time is based on Chrome version 81, so please make sure your Chrome is on version 81 before running it. ↓ This guide is easy to follow: How to check the version of Google Chrome
pip install selenium
pip install chromedriver_binary
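Since chromedriver_binary must match the installed Chrome (version 81 here), it can help to pin the package version when installing. The exact version string below is an example, not from the original article; check PyPI for the release that matches your Chrome 81 build:

```shell
# Pin chromedriver-binary to a ChromeDriver 81 release
# (the version string is illustrative; pick the one matching your Chrome build)
pip install "chromedriver-binary==81.0.4044.138.0"
```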
import csv
import time  # needed for sleep()
from selenium import webdriver  # drives the web browser automatically (python -m pip install selenium)
import chromedriver_binary  # importing this adds the bundled chromedriver to PATH

def ranking(driver):
    i = 1  # current page number
    title_list = []  # empty list to collect the titles
    link_list = []  # empty list to collect the URLs
    summary_list = []  # empty list to collect the result summaries
    RelatedKeywords = []  # empty list to collect the related keywords
    # Loop until the current page exceeds the maximum number of pages to analyze
    while i <= i_max:
        # Each title and link lives inside an element with class="r"
        class_group = driver.find_elements_by_class_name('r')
        class_group1 = driver.find_elements_by_class_name('s')
        class_group2 = driver.find_elements_by_class_name('nVcaUb')
        # Extract the title and link from each result and append them to the lists
        for elem in class_group:
            title_list.append(elem.find_element_by_class_name('LC20lb').text)  # title (class="LC20lb")
            link_list.append(elem.find_element_by_tag_name('a').get_attribute('href'))  # link (href attribute of the a tag)
        for elem in class_group1:
            summary_list.append(elem.find_element_by_class_name('st').text)  # summary (class="st")
        for elem in class_group2:
            RelatedKeywords.append(elem.text)  # related keyword text
        # There is only one "Next" button, but find_elements (plural) is used deliberately:
        # an empty list means we are on the last page.
        if driver.find_elements_by_id('pnnext') == []:
            i = i_max + 1
        else:
            # The URL of the next page is in the href attribute of id="pnnext"
            next_page = driver.find_element_by_id('pnnext').get_attribute('href')
            driver.get(next_page)  # move to the next page
            i = i + 1  # update i
            time.sleep(3)  # wait 3 seconds
    return title_list, link_list, summary_list, RelatedKeywords  # return the lists of titles, links, summaries, and related keywords

driver = webdriver.Chrome()  # launch Chrome
driver.get('https://www.google.com/')  # open Google
i_max = 5  # maximum number of pages to analyze
search = driver.find_element_by_name('q')  # locate the search box (name='q') in the HTML
search.send_keys('Scraping automation"Python"')  # type the search words
search.submit()  # run the search
time.sleep(1.5)  # wait 1.5 seconds
# Run the ranking function to get the lists of titles and URLs
title, link, summary, RelatedKeywords = ranking(driver)
csv_list = [["Rank", "Title", "Summary", "Link", "Related keywords"]]
for i in range(len(title)):
    # Guard against results that have no summary, so the lists can differ in length
    add_list = [i + 1, title[i], summary[i] if i < len(summary) else '', link[i]]
    csv_list.append(add_list)
# Save the results to a CSV file
with open('Search_word.csv', 'w', encoding='utf-8_sig') as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)
driver.quit()
In total, I was able to create this in about four hours. I am satisfied: I can now operate the browser automatically with Selenium and retrieve the result titles as well. It will come in handy when writing a blog!
As a next step, prepare a CSV list of search words and write a program that reads them from it. Combined with the program created this time, that makes it easy to search with multiple search words, and the collected data can then be stored in a spreadsheet.
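As a sketch of that extension, the search words could be read from a CSV file like this. The file name `keywords.csv` and the helper name `load_keywords` are assumptions for illustration, not part of the original program:

```python
import csv

def load_keywords(path):
    """Read search words from the first column of a CSV file (hypothetical helper)."""
    with open(path, encoding='utf-8_sig') as f:
        return [row[0] for row in csv.reader(f) if row]

# Combined with the program above, each word could then be searched in turn:
# for word in load_keywords('keywords.csv'):
#     search = driver.find_element_by_name('q')
#     search.clear()
#     search.send_keys(word)
#     search.submit()
```

Using `utf-8_sig` here matches the encoding the program already uses for its output file, so a spreadsheet-exported keyword list with a BOM is read cleanly.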