If it is a site written in JS etc., you may not be able to scrape it with Beautiful Soup. Selenium can be used in such cases.
(For Mac)
On the Download Page (https://chromedriver.chromium.org/downloads),
From the following part, download the chrome driver that matches the version examined above. (Select the OS at the page link destination.)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
url="~~~~~~"#URL you want to open here
options = Options()
options.add_argument('--headless') #Enable headless mode
Driver_path="~~~~~~" #Specify the location where you put the downloaded chrome driver
driver = webdriver.Chrome(Driver_path,options=options)
driver.get(url)
time.sleep(2)
html = driver.page_source.encode('utf-8')
soup = BeautifulSoup(html, 'lxml')
#After this, you can use it normally according to the grammar of Beautiful Soup.
By adding option, it is prevented that the page is opened every time driver.get is executed. (This will speed up the process a bit.)
Three settings to make for stable operation of Selenium (also supports Headless mode)
Recommended Posts