[Ruby] The execute_script method for web scraping with Selenium


This article summarizes the execute_script method, which is useful for web scraping with Selenium. execute_script runs JavaScript that you supply against the currently loaded web page.

Table of contents

  1. find_element and find_elements (when execute_script is not used)
  2. execute_script
  3. Make it a method and make it easier to use
  4. Summary

1. find_element and find_elements (when execute_script is not used)

When scraping with Selenium, you usually locate elements with the find_element and find_elements methods and then act on them, for example by calling the click method.

When using find_element and find_elements
# Click action (example: an input element with class "submit_btn" and value "Send")
@driver.find_element(:class, 'submit_btn').click        # Specified by class
@driver.find_element(:css, "input[value='Send']").click # Specified by CSS

# Enter a value
@driver.find_element(:name, 'favorite_food').send_keys('ramen')

# Find out the number of elements
# (the :class locator takes the bare class name, without a leading dot)
@driver.find_elements(:class, 'btn').size # Check the number of buttons

2. execute_script

The find_element(s) approach above is a bit tedious, because you have to rerun the Ruby script every time you want to check whether an element is specified correctly. So let's implement the same operations with execute_script as follows.

When using execute_script
#Click action
@driver.execute_script(%{document.querySelector('.submit_btn').click();}) #Specified by class
@driver.execute_script(%{document.querySelector("input[value='Send']").click();}) #Specified by CSS

#Enter value
@driver.execute_script(%{document.querySelector("[name='favorite_food']").value = 'ramen';})

#Find out the number of elements
@driver.execute_script(%{return document.querySelectorAll('.btn').length;}) #Check the number of buttons

Write the JavaScript you want to run directly in the argument of execute_script. Because it is plain JavaScript, you can try it out in a web browser while you develop. In Chrome, you can run it just by typing it into the Console tab of the developer tools. This drastically reduces the number of round trips between the editor and the terminal, which makes development more efficient.
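As in the element-count example, execute_script only hands a value back to Ruby when the script contains an explicit return. The following is a minimal sketch of using that to collect the labels of several elements in one call. The helper name button_labels is hypothetical, the driver parameter is assumed to be a Selenium::WebDriver session, and the '.btn' class is carried over from the examples above.

```ruby
# Hypothetical helper (not from the original article).
# `driver` is assumed to be a Selenium::WebDriver session, i.e. any object
# that responds to execute_script.
# Returns the labels of all elements with class "btn".
def button_labels(driver)
  driver.execute_script(<<~JS)
    return Array.from(document.querySelectorAll('.btn')).map(function (el) {
      // <input> buttons keep their label in value, other elements in textContent
      return (el.value || el.textContent || '').trim();
    });
  JS
end
```

With a real session this would be called as button_labels(@driver). The JavaScript array should come back as a Ruby Array of strings, so it can be processed with the usual Ruby methods.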

3. Make it a method and make it easier to use

It is also worth refactoring these calls into methods to reduce redundancy and improve readability. In this case, pass the CSS selector and the value you want to set as arguments.

When a method is defined for processing by JavaScript

# Click action method
def query_click(css_selector)
  @driver.execute_script(%{document.querySelector("#{css_selector}").click();})
end

# Method to input a value
def value_input(css_selector, value)
  @driver.execute_script(%{document.querySelector("#{css_selector}").value = "#{value}";})
end

# Method to check the number of elements
def query_count(css_selector)
  @driver.execute_script(%{return document.querySelectorAll("#{css_selector}").length;})
end

query_click('.submit_btn')         #Specify by class and click
query_click("input[value='Send']") #Specify with css and click

value_input("[name='favorite_food']", 'ramen') #Enter value

query_count('.btn') #Find out the number of buttons
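One caveat with the methods above: they interpolate the selector and value into a double-quoted JavaScript string, so a selector such as input[value="Send"] would break the script. execute_script also accepts extra arguments after the script, which are available inside the JavaScript as arguments[0], arguments[1], and so on. The following is a rough sketch of that variant. The method names safe_query_click and safe_value_input are hypothetical, and the driver is passed in explicitly instead of using @driver so the methods can be tried in isolation.

```ruby
# Hypothetical helpers (not from the original article).
# `driver` is assumed to be a Selenium::WebDriver session, i.e. any object
# that responds to execute_script(script, *args).

# Clicks the first element matching the selector.
# Returns true if an element was found and clicked, false otherwise.
def safe_query_click(driver, css_selector)
  driver.execute_script(<<~JS, css_selector)
    var el = document.querySelector(arguments[0]);
    if (!el) { return false; }
    el.click();
    return true;
  JS
end

# Sets the value of the first element matching the selector.
# Returns true if an element was found, false otherwise.
def safe_value_input(driver, css_selector, value)
  driver.execute_script(<<~JS, css_selector, value)
    var el = document.querySelector(arguments[0]);
    if (!el) { return false; }
    el.value = arguments[1];
    return true;
  JS
end
```

Because the selector and value never become part of the script text, quotes inside them need no escaping. The null check also turns a missing element into a false return value instead of a JavaScript error.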

4. Summary

This article summarized how to make Selenium run JavaScript with execute_script. A script run this way only acts on the page that is currently loaded, so when you need to repeat processing across page transitions, find_element may be the better choice. If you can use both approaches, your web scraping will become more efficient. Please take advantage of it.
