It's a bit late, but I also tried scraping my card company's site with a Raspberry Pi. Running Selenium on a Raspberry Pi Zero caused frequent timeouts, so I set up the scraping to be as lightweight as possible. Manage your passwords at your own risk.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup as BS
options = Options()
options.add_argument("--no-sandbox")
options.add_argument("--disable-extensions")
options.add_argument("--headless")
options.add_argument('--disable-gpu')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--allow-running-insecure-content')
options.add_argument('--disable-web-security')
options.add_argument('--disable-desktop-notifications')
options.add_argument('--lang=ja')
options.add_argument('--blink-settings=imagesEnabled=false')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--proxy-server="direct://"')
options.add_argument('--proxy-bypass-list=*')
options.add_argument('--start-maximized')
driver = webdriver.Chrome(options=options)
driver.get("http://~~~")
Also, to further reduce the amount of the page that has to load, the script waits only until the class needed for scraping appears, using a generous maximum timeout:
wait = WebDriverWait(driver, 300)
element = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "fluid")))
After that, the page source is read and processed with BeautifulSoup:
res = driver.page_source.encode('utf-8')
print("loading")
soup=BS(res,"html.parser")
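Once the source is parsed, the data can be pulled out of the tree. A minimal sketch of that step, assuming hypothetical class names (`fluid` rows containing `date` and `amount` spans); the HTML below is stand-in data, not the real card site, which in practice would come from `driver.page_source`:

```python
from bs4 import BeautifulSoup as BS

# Stand-in HTML; in the real script this comes from driver.page_source
html = """
<div class="fluid">
  <span class="date">2020/01/15</span><span class="amount">1,200</span>
</div>
<div class="fluid">
  <span class="date">2020/01/18</span><span class="amount">3,400</span>
</div>
"""

soup = BS(html, "html.parser")
rows = []
for row in soup.find_all("div", class_="fluid"):
    # Hypothetical sub-elements; adjust the class names to the actual page
    date = row.find("span", class_="date").get_text(strip=True)
    amount = row.find("span", class_="amount").get_text(strip=True)
    rows.append((date, amount))

print(rows)
```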
With this, I managed to avoid the timeouts.
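Even with a long timeout, a slow Pi Zero can still occasionally fail the wait. A generic retry wrapper (my own sketch, not part of the original post) can paper over that; here `flaky_load` is a stand-in for the `driver.get` plus `wait.until` sequence:

```python
import time

def retry(fn, attempts=3, delay=5):
    """Call fn, retrying on any exception up to `attempts` times."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)

# Usage with a flaky stand-in for the page-load step
calls = {"n": 0}
def flaky_load():
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("page load timed out")
    return "loaded"

result = retry(flaky_load, attempts=3, delay=0)
print(result)  # → loaded
```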