I wrote this script to meet the requirement of collecting all 55 pages, with 24 product images per page. Selenium is really convenient. Since all I needed for now was the product name, product URL, and product image src as text, the script stays very concise.
seleniumer.py
import os
import time
from selenium import webdriver

DRIVER_PATH = os.path.join(os.path.dirname(__file__), 'chromedriver')
browser = webdriver.Chrome(DRIVER_PATH)

url = 'https://www.XXXXXXXX'
browser.get(url)

# 55 pages plus one extra iteration: the final "next" click fails,
# which is caught below and ends the crawl.
for page in range(56):
    try:
        img_list = []
        urls_list = []
        name_list = []
        # Product image src attributes on the current page.
        # (Note: Selenium 4 removed these helpers in favor of
        # browser.find_elements(By.XPATH, ...).)
        for img in browser.find_elements_by_xpath('//*[@id="find-results"]/div/div/div/a[1]/img[1]'):
            img_list.append(img.get_attribute('src'))
        # Product page URLs.
        for a in browser.find_elements_by_xpath('//*[@id="find-results"]/div/div/div/a[1]'):
            urls_list.append(a.get_attribute('href'))
        # Product names.
        for t in browser.find_elements_by_xpath('//*[@id="find-results"]/div/div/div/a/div/span[1]/span'):
            name_list.append(t.text)
        for img_src, urls_href, name_title in zip(img_list, urls_list, name_list):
            print(name_title, urls_href, img_src,
                  "\n+++++++++++++++++++++++++++++++++++++++++++++++++++")
        # Go to the next page and give it time to load.
        link_elem = browser.find_element_by_class_name('control-page-next-button')
        link_elem.click()
        time.sleep(3)
    except Exception:
        print('not found!')
        break

browser.close()
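The script above only prints the image src values. Here is a minimal sketch of actually saving the collected images to disk, assuming each src is a direct image URL; the `images` directory and both helper names are my own, not from the original post:

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def filename_for(img_src, out_dir='images'):
    # Derive a local file name from the last path segment of the URL,
    # falling back to a placeholder name when the path has no segment.
    name = os.path.basename(urlparse(img_src).path) or 'image.jpg'
    return os.path.join(out_dir, name)

def download_images(img_list, out_dir='images'):
    # Fetch every collected src and save it under out_dir.
    os.makedirs(out_dir, exist_ok=True)
    for src in img_list:
        urlretrieve(src, filename_for(src, out_dir))
```

After the scraping loop, `download_images(img_list)` would save the last page's images; to keep every page, accumulate the per-page lists instead of resetting them.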
To find the XPath to pass to browser.find_elements_by_xpath, you can copy it straight from Chrome DevTools (right-click the element, then Copy → Copy XPath). I was impressed by how useful that feature is.
It seems that the find_elements() methods return lists, so you have to iterate over them with for.
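One caveat when combining three separate find_elements() results: zip() silently truncates to the shortest list, so if one XPath matches fewer elements than the others, products are dropped without any warning. A small guard (my own helper, not from the original post) makes such a mismatch visible:

```python
def zip_checked(*lists):
    # Like zip(), but raise instead of silently dropping items
    # when the input lists have different lengths.
    lengths = {len(lst) for lst in lists}
    if len(lengths) > 1:
        raise ValueError('list lengths differ: %s' % sorted(lengths))
    return list(zip(*lists))
```

In the script this would replace the plain zip, e.g. `for img_src, urls_href, name_title in zip_checked(img_list, urls_list, name_list):`, turning a silent data loss into an immediate error.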