[Python + Selenium] Tips for scraping

About this article

I used to do scraping at work, so this is a memo of the tricks I picked up at that time.

Tools

Python 3 (3.6.2), Selenium, ChromeDriver (85.0.4183.87)

1. Run JavaScript (display hidden elements)

You can run JavaScript from Selenium with the execute_script method.

For example, you can change the text color by calling the JavaScript setAttribute method, as shown below.

python


from selenium import webdriver
driver = webdriver.Chrome()

driver.get("https://www.hogefuga")

element = driver.find_element_by_xpath("//div[@class='fuga']/span")
driver.execute_script("arguments[0].setAttribute('style','color: red;')", element)

You can also use this method to display elements that have display: none; applied. For example, if the element is hidden by a class that sets display: none;, you can show it by removing that class name from the element.

python


# Case where display: none; is applied by the close class
element = driver.find_element_by_xpath("//div[@class='hoge close']/span")
driver.execute_script("arguments[0].setAttribute('class','hoge')", element)
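
Overwriting the whole class attribute works when you know the full class list in advance. As a hedged alternative, the sketch below removes only the hiding class through the DOM classList API, so any other classes on the element are kept. The helper name show_hidden_element and the default class name close are assumptions for illustration. It relies on execute_script passing its extra arguments to the script as arguments[0], arguments[1], and so on.

```python
def show_hidden_element(driver, element, hidden_class="close"):
    """Remove only `hidden_class` from the element, keeping its other classes.

    `driver` is a Selenium WebDriver and `element` a WebElement.
    The helper name and the default class name are hypothetical.
    """
    script = "arguments[0].classList.remove(arguments[1]);"
    # execute_script forwards the extra arguments to the script in order
    driver.execute_script(script, element, hidden_class)


# Usage sketch, with the element found as in the example above:
# show_hidden_element(driver, element)
```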

2. Get the element in the iframe

You need **switch_to_frame()** to get elements inside an iframe.

python


driver.switch_to_frame(driver.find_element_by_xpath("//div[@class='hoge']/iframe"))

Now you can get elements inside the iframe. Elements outside the iframe, however, can no longer be retrieved. To get elements of the original page again, switch back with **switch_to_default_content()**.

python


driver.switch_to_default_content()
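
Because it is easy to forget the switch back, the two calls can be paired in a context manager. This is only a sketch: inside_frame is a hypothetical helper name, and it uses the driver.switch_to.frame() / driver.switch_to.default_content() spelling, which to my understanding replaces the deprecated underscore forms in newer Selenium releases.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_element):
    """Switch into `frame_element`, then always switch back to the top document.

    `inside_frame` is a hypothetical helper name, not part of Selenium.
    """
    driver.switch_to.frame(frame_element)
    try:
        yield driver
    finally:
        # Runs even if the body raises, so later lookups see the main page again
        driver.switch_to.default_content()


# Usage sketch, with the frame looked up as in the example above:
# frame = driver.find_element_by_xpath("//div[@class='hoge']/iframe")
# with inside_frame(driver, frame):
#     ...  # find elements inside the iframe here
```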

3. Switch the operation target of Selenium to another window

When scraping with Selenium, clicking a link may open another window. If you want to operate on that window, use **switch_to_window()** to switch Selenium's operation target to it.

python


# Open another window
driver.find_element_by_xpath("//div[@class='hoge']/a").click()

# Window that was open from the beginning
window_before = driver.window_handles[0]
# Newly opened window
window_after = driver.window_handles[1]

# Switch Selenium's operation target to the newly opened window
driver.switch_to_window(window_after)

# Switch Selenium's operation target back to the original window
driver.switch_to_window(window_before)
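
Indexing window_handles assumes the new window always sits at position 1, but the WebDriver specification does not guarantee the order of the handles. A hedged alternative is to record the handles before the click and pick the one that appeared afterwards. The helper name find_new_window is hypothetical, and the sketch assumes exactly one window is opened.

```python
def find_new_window(handles_before, handles_after):
    """Return the single handle present in `handles_after` but not before.

    Both arguments are lists as returned by driver.window_handles.
    `find_new_window` is a hypothetical helper name.
    """
    known = set(handles_before)
    new_handles = [h for h in handles_after if h not in known]
    if len(new_handles) != 1:
        raise RuntimeError(
            "expected exactly one new window, found {}".format(len(new_handles))
        )
    return new_handles[0]


# Usage sketch:
# handles_before = driver.window_handles
# driver.find_element_by_xpath("//div[@class='hoge']/a").click()
# window_after = find_new_window(handles_before, driver.window_handles)
# driver.switch_to_window(window_after)
```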

4. Click the radio button

python


# Get the radio button element
element = driver.find_element_by_id("fugafuga")

# Click it through JavaScript
driver.execute_script("arguments[0].click();", element)
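
Clicking through execute_script sends the click from inside the page, so it can also work where Selenium's own element.click() raises an error because it considers the element not interactable, for example an input hidden behind a styled label. That motivation is an assumption; the code above does not state it. The sketch below wraps the same script and skips the click when the button is already selected. select_radio is a hypothetical helper name.

```python
def select_radio(driver, element):
    """Select a radio button via JavaScript unless it is already selected.

    Returns True if a click was sent, False if nothing had to be done.
    `select_radio` is a hypothetical helper name.
    """
    if element.is_selected():
        return False
    driver.execute_script("arguments[0].click();", element)
    return True


# Usage sketch:
# element = driver.find_element_by_id("fugafuga")
# select_radio(driver, element)
```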

5. Download the file from the web page with Headless Chrome

For security reasons, Headless Chrome does not seem to allow file downloads by default. It therefore seems necessary to enable downloads explicitly by sending a Page.setDownloadBehavior command to the driver in a POST request.

python


from selenium import webdriver
from selenium.webdriver.chrome.options import Options

DOWNLOAD_URL = "https://www.hogefuga/file/download"
download_dir = "/home/download"  # Location of downloaded files

def enable_download(driver, download_dir):
    # Register the send_command endpoint, then allow downloads into download_dir
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    driver.execute("send_command", params)

def setting_chrome_options():
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--no-sandbox')
    return chrome_options

driver = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver", options=setting_chrome_options())
enable_download(driver, download_dir)
driver.get(DOWNLOAD_URL)
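
driver.get() returns before the file is fully written, so a script that quits right away may leave a partial file behind. The sketch below polls the download directory until a finished file appears. It assumes the directory starts empty and that Chrome marks unfinished downloads with a .crdownload suffix. The function name wait_for_download and its timeout values are hypothetical.

```python
import os
import time


def wait_for_download(download_dir, timeout=30.0, poll_interval=0.5):
    """Wait until `download_dir` holds at least one finished file.

    The download counts as finished once the directory is non-empty and
    no file name ends with `.crdownload`. Returns the sorted file names,
    or raises TimeoutError if that state is not reached in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        in_progress = [n for n in names if n.endswith(".crdownload")]
        if names and not in_progress:
            return sorted(names)
        if time.monotonic() >= deadline:
            raise TimeoutError("no finished download in {!r}".format(download_dir))
        time.sleep(poll_interval)


# Usage sketch, after driver.get(DOWNLOAD_URL):
# files = wait_for_download(download_dir)
# driver.quit()
```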
