While scraping with Selenium and headless Chrome, I came across a site that throws a NoSuchElementException as soon as I enable headless mode, even though the same information can be retrieved in headed mode. There were few Japanese articles about workarounds, so I am posting mine.
- Scraping works in headed mode.
- A NoSuchElementException occurs as soon as the headless option is added (a minimal reproduction follows).
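For reference, the failing setup looks roughly like this. This is only a sketch: the URL and the class-name selector are placeholders, not the actual site.

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

driver.get('https://www.example.com/')  # placeholder URL
# Raises NoSuchElementException in headless mode,
# even though the same lookup succeeds in headed mode.
element = driver.find_element(By.CLASS_NAME, 'item')  # placeholder selector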
It looked like the element simply was not present in the page, so I checked the HTML the site actually returned with driver.page_source.
scraping.py
print(driver.page_source)
The returned HTML contains the words "Access Denied": access from headless Chrome is being refused.
<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>

You don't have permission to access "http://www.xxxxxxx/" on this server.<p>
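If you want a script to detect this case instead of spotting it by eye, a simple guard works; the marker string comes straight from the response above.

# Bail out early if the server refused the headless request.
if 'Access Denied' in driver.page_source:
    raise RuntimeError('Blocked by the server; check the user agent.')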
After some digging, it turns out ChromeOptions accepts a user-agent argument that makes the request look like it comes from a regular browser. Headless Chrome's default user agent contains the string "HeadlessChrome", which is presumably what the site is blocking. Adding the argument to the ChromeDriver options lets the elements load normally.
scraping.py
from selenium import webdriver

options = webdriver.ChromeOptions()
options.binary_location = '/usr/bin/google-chrome'
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--lang=ja-JP')
# Pretend the request comes from a regular (non-headless) Chrome (added)
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36')

driver = webdriver.Chrome(options=options)
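To confirm the override took effect, you can ask the browser for its own user agent with Selenium's execute_script; this is just a quick sanity check.

# Should print the spoofed string, with no "HeadlessChrome" in it.
print(driver.execute_script('return navigator.userAgent'))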
That's all.