When I look something up, I want to pull the title, URL, and summary from the Google search results, because the Google summary tells me roughly what each result is about.
This article automates a Google search, collects the search results, and writes them out as a CSV file. Automating this reduces the time spent searching.
Windows 10 Pro
Python 3.7
Anaconda
The program created this time is based on Chrome version 81, so please make sure your Chrome is on version 81 before running it. ↓ This guide is easy to follow: How to check the version of Google Chrome
pip install selenium
pip install chromedriver_binary
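Since chromedriver_binary must match the installed Chrome (version 81 here), it can help to pin the package version when installing. The exact version string below is an example, not from the original article; check PyPI for the release that matches your Chrome 81 build:

```shell
# Pin chromedriver-binary to a ChromeDriver 81 release
# (the version string is illustrative; pick the one matching your Chrome build)
pip install "chromedriver-binary==81.0.4044.138.0"
```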
import csv
import time  # needed for sleep()
from selenium import webdriver  # drives the web browser automatically (python -m pip install selenium)
import chromedriver_binary  # importing this adds the bundled chromedriver to PATH

def ranking(driver):
    i = 1  # current page number
    title_list = []  # empty list to collect the titles
    link_list = []  # empty list to collect the URLs
    summary_list = []  # empty list to collect the result summaries
    RelatedKeywords = []  # empty list to collect the related keywords
    # Loop until the current page exceeds the maximum number of pages to analyze
    while i <= i_max:
        # Each title and link lives inside an element with class="r"
        class_group = driver.find_elements_by_class_name('r')
        class_group1 = driver.find_elements_by_class_name('s')
        class_group2 = driver.find_elements_by_class_name('nVcaUb')
        # Extract the title and link from each result and append them to the lists
        for elem in class_group:
            title_list.append(elem.find_element_by_class_name('LC20lb').text)  # title (class="LC20lb")
            link_list.append(elem.find_element_by_tag_name('a').get_attribute('href'))  # link (href attribute of the a tag)
        for elem in class_group1:
            summary_list.append(elem.find_element_by_class_name('st').text)  # summary (class="st")
        for elem in class_group2:
            RelatedKeywords.append(elem.text)  # related keyword text
        # There is only one "Next" button, but find_elements (plural) is used deliberately:
        # an empty list means we are on the last page.
        if driver.find_elements_by_id('pnnext') == []:
            i = i_max + 1
        else:
            # The URL of the next page is in the href attribute of id="pnnext"
            next_page = driver.find_element_by_id('pnnext').get_attribute('href')
            driver.get(next_page)  # move to the next page
            i = i + 1  # update i
            time.sleep(3)  # wait 3 seconds
    return title_list, link_list, summary_list, RelatedKeywords  # return the lists of titles, links, summaries, and related keywords

driver = webdriver.Chrome()  # launch Chrome
driver.get('https://www.google.com/')  # open Google
i_max = 5  # maximum number of pages to analyze
search = driver.find_element_by_name('q')  # locate the search box (name='q') in the HTML
search.send_keys('Scraping automation"Python"')  # type the search words
search.submit()  # run the search
time.sleep(1.5)  # wait 1.5 seconds
# Run the ranking function to get the lists of titles and URLs
title, link, summary, RelatedKeywords = ranking(driver)
csv_list = [["Rank", "Title", "Summary", "Link", "Related keywords"]]
for i in range(len(title)):
    # Guard against results that have no summary, so the lists can differ in length
    add_list = [i + 1, title[i], summary[i] if i < len(summary) else '', link[i]]
    csv_list.append(add_list)
# Save the results to a CSV file
with open('Search_word.csv', 'w', encoding='utf-8_sig') as f:
    writecsv = csv.writer(f, lineterminator='\n')
    writecsv.writerows(csv_list)
driver.quit()
In total, I was able to create this in about four hours. I am satisfied: I can now operate the browser automatically with Selenium and retrieve the result titles as well. It will come in handy when writing a blog!
As a next step, prepare a CSV list of search words and write a program that reads them from it. Combined with the program created this time, that makes it easy to search with multiple search words, and the collected data can then be stored in a spreadsheet.
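As a sketch of that extension, the search words could be read from a CSV file like this. The file name `keywords.csv` and the helper name `load_keywords` are assumptions for illustration, not part of the original program:

```python
import csv

def load_keywords(path):
    """Read search words from the first column of a CSV file (hypothetical helper)."""
    with open(path, encoding='utf-8_sig') as f:
        return [row[0] for row in csv.reader(f) if row]

# Combined with the program above, each word could then be searched in turn:
# for word in load_keywords('keywords.csv'):
#     search = driver.find_element_by_name('q')
#     search.clear()
#     search.send_keys(word)
#     search.submit()
```

Using `utf-8_sig` here matches the encoding the program already uses for its output file, so a spreadsheet-exported keyword list with a BOM is read cleanly.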