[Ruby] 5 errors that often occur when scraping with Selenium and how to deal with them


Here are five errors you may run into when using Selenium, a tool that is useful for web scraping, along with how I dealt with each one.

Introduction

Selenium is a gem useful for web scraping with complex operations.

Gems that can be used for web scraping with Ruby:

  • Nokogiri (basic)
  • Mechanize
  • Selenium (can do many things by operating a real browser)

With Nokogiri and Mechanize, you specify HTML tags and CSS selectors to get the page contents. With Selenium, you open a browser programmatically and operate it the way a user would, so you can handle more complicated cases such as:

  • When you need to log in
  • When the page is rendered with JavaScript
  • When you want to enter data

It is convenient when you cannot get the information you need with HTML tags and CSS selectors alone.

How to use Selenium

Many people have already written up how to use Selenium, so please refer to the articles below.

Web scraping using Chrome with [Ruby] selenium

Selenium cheat sheet [Ruby]

# Load the library required to use Selenium
require 'selenium-webdriver'

# Start Selenium
driver = Selenium::WebDriver.for :chrome

5 common errors when using Selenium

1. The browser and the tool that drives it have different versions

Selenium::WebDriver::Error::SessionNotCreatedError (session not created: This version of ChromeDriver only supports Chrome version 75)

In Selenium, you specify which browser to use and then operate it programmatically.

# Start Selenium
driver = Selenium::WebDriver.for :chrome

If you want to use Chrome, install the ChromeDriver version that matches your Chrome version. The browser itself, such as Firefox or Chrome, must also be installed in the development environment.
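The version check can be sketched roughly as follows. The helper names major_version and versions_match? are hypothetical, the sample strings only imitate the format printed by commands such as chromedriver --version, and the sketch assumes that matching the major version is enough.

```ruby
# Hypothetical helper: pull the major version number out of text such as
# "ChromeDriver 75.0.3770.90 (...)" or "Google Chrome 76.0.3809.100".
def major_version(version_text)
  major = version_text[/(\d+)\.\d+\.\d+/, 1]
  major && major.to_i
end

# Hypothetical helper: true only when both strings contain a version
# and the major versions are equal (assumption: major version is enough).
def versions_match?(chrome_text, chromedriver_text)
  chrome = major_version(chrome_text)
  driver = major_version(chromedriver_text)
  !chrome.nil? && chrome == driver
end

# Sample strings for illustration only.
puts versions_match?('Google Chrome 75.0.3770.100', 'ChromeDriver 75.0.3770.90')
puts versions_match?('Google Chrome 76.0.3809.100', 'ChromeDriver 75.0.3770.90')
```

If the second case describes your machine, installing the ChromeDriver release for your Chrome major version is the usual fix.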

How to check the Google Chrome version: open Google Chrome settings > Help > About Google Chrome.

How to install ChromeDriver: see "Web scraping using Chrome with [Ruby] selenium".

To deploy and use Selenium on Heroku, you also need to install Google Chrome and ChromeDriver on Heroku, and add an option to run Chrome headless.

Putting Chrome and ChromeDriver on Heroku:

[Free] Periodically run Chrome headless + selenium on heroku

Take a screenshot of the web page using headless chrome with Heroku rails app
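As a minimal sketch of the headless option mentioned above: the helper add_headless_args is hypothetical, and the exact set of Chrome switches a given Heroku setup needs is an assumption, so treat the list as a starting point rather than a recipe.

```ruby
# Chrome command-line switches often used when no display is available.
# Assumption: which of these a given Heroku setup needs may differ.
HEADLESS_CHROME_ARGS = %w[--headless --no-sandbox --disable-gpu].freeze

# Hypothetical helper: adds each switch to any object that responds to
# add_argument, such as Selenium::WebDriver::Chrome::Options.
def add_headless_args(options, args = HEADLESS_CHROME_ARGS)
  args.each { |arg| options.add_argument(arg) }
  options
end

# Usage with Selenium (needs the selenium-webdriver gem and Chrome):
#   require 'selenium-webdriver'
#   options = add_headless_args(Selenium::WebDriver::Chrome::Options.new)
#   driver = Selenium::WebDriver.for :chrome, options: options
```

Keeping the switches in one list makes it easy to use the same script locally with a visible browser by simply skipping the helper.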

2. I couldn’t open the web page

Selenium::WebDriver::Error::InvalidArgumentError (invalid argument: 'url' must be a string)

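Bad example: passing a variable

# Open URL
@url = 'https://www...'
driver.get(@url)
driver.get("#{@url}")

I assumed a Ruby variable would be fine inside the (), but Selenium complained that the URL must be a string. This error is raised when the value passed to get is not a String (for example, when the variable is nil), so it is worth checking what the variable actually holds at that point.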

OK example: write the string literal directly

# Open URL
driver.get('https://www...')
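If you still want to pass a variable, one option is to check the value before handing it to Selenium. The sketch below uses a hypothetical helper, ensure_url_string, which is not part of Selenium, and assumes the usual cause is a nil or otherwise non-String value.

```ruby
# Hypothetical helper (not part of Selenium): returns the URL unchanged
# if it is a non-blank String, otherwise raises with a readable message.
def ensure_url_string(url)
  unless url.is_a?(String) && !url.strip.empty?
    raise ArgumentError, "expected a URL string, got #{url.inspect}"
  end

  url
end

# Usage with a driver created as in the examples above:
#   driver.get(ensure_url_string(@url))
```

The message then shows the actual value (for example nil), which is usually easier to trace than the error coming back from the browser driver.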

3. I couldn’t get an element on the page

Selenium::WebDriver::Error::NoSuchElementError: no such element: Unable to locate element: {"method":"id","selector":"#entryBtn"}

This error occurs when no element on the page matches what you specified. Check that the ID, class name, and so on are specified correctly.

For Nokogiri: specify the element with a CSS selector

Specify the element you want to get with a CSS selector.

# Load the libraries required to use Nokogiri
require 'nokogiri'
require 'open-uri'

# Get page elements using Nokogiri
html = Nokogiri::HTML(URI.open('https://www.google.co.jp/'))
logo = html.css('#hplogo')

For Selenium: specify the element by locator type + value (ID, HTML tag name, CSS class name, etc.)

Specify the element you want to get with a locator type and the value to look for.

# Load the library required to use Selenium
require 'selenium-webdriver'

# Start Selenium, open the page, and get the page element
driver = Selenium::WebDriver.for :chrome
driver.get('https://www.google.co.jp/')
logo = driver.find_element(:id, 'hplogo')

With Selenium's :id locator, you don’t need the “#”.
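When an element may legitimately be missing, one option is find_elements, which returns an empty array instead of raising. A minimal sketch, where find_or_nil is a hypothetical helper name and driver is any object that behaves like a Selenium driver:

```ruby
# Hypothetical helper: returns the first matching element, or nil when
# nothing matches. Relies on find_elements returning an empty array
# rather than raising NoSuchElementError.
def find_or_nil(driver, how, what)
  driver.find_elements(how, what).first
end

# Usage with a driver created as in the examples above:
#   button = find_or_nil(driver, :id, 'entryBtn')
#   puts 'entryBtn is not on this page' if button.nil?
```

This only avoids the exception; if the element is missing because of a typo in the ID or class name, the locator still has to be corrected.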

4. You are operating on an element that is no longer on the page

Selenium::WebDriver::Error::StaleElementReferenceError (stale element reference: element is not attached to the page document)

This error occurred when I went back in the browser and then tried to operate on an element that had been obtained from the page before navigating.

Bad example: the element references are stale from the second iteration onward

# Load the library required to use Selenium
require 'selenium-webdriver'

# Start Selenium
driver = Selenium::WebDriver.for :chrome

# Get the event items from the event list page
events = driver.find_elements(:class, 'eventItem')

# Go to each event details page
events.size.times do |i|
  # Click the button to go to the event details page
  events[i].find_element(:class, 'entryBtn').click
  # → An error occurs in the second and subsequent iterations

  # Go back to the previous page
  driver.navigate.back
end

OK example: find the elements again inside the loop

# Load the library required to use Selenium
require 'selenium-webdriver'

# Start Selenium
driver = Selenium::WebDriver.for :chrome

# Get the event items from the event list page
events = driver.find_elements(:class, 'eventItem')

# Go to each event details page
events.size.times do |i|
  # The element references become stale after navigating,
  # so find the elements again on every iteration.
  events_in_loop = driver.find_elements(:class, 'eventItem')

  # Click the button to go to the event details page
  events_in_loop[i].find_element(:class, 'entryBtn').click

  # Go back to the previous page
  driver.navigate.back
end

When you navigate between pages inside a loop, element references obtained before the navigation are no longer valid, so you need to find the elements again inside the loop.

Reference: get StaleElementReferenceException error while using driver.navigate().back() in a loop in selenium
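The same idea can be wrapped in a small helper so the re-lookup is not forgotten. This is only a sketch: each_fresh_element is a hypothetical name, and it assumes the list page shows the same elements in the same order every time you come back to it.

```ruby
# Hypothetical helper: looks the elements up again on every iteration and
# yields the i-th freshly located element together with its index.
# Assumption: the page lists the same items in the same order each time.
def each_fresh_element(driver, how, what)
  count = driver.find_elements(how, what).size
  count.times do |i|
    fresh = driver.find_elements(how, what)
    break if i >= fresh.size # the list got shorter; stop instead of failing

    yield fresh[i], i
  end
end

# Usage with a driver created as in the examples above:
#   each_fresh_element(driver, :class, 'eventItem') do |event, _i|
#     event.find_element(:class, 'entryBtn').click
#     driver.navigate.back
#   end
```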

5. Ruby commands cannot be run

This is an environment setup error, the kind that makes you feel “it’s all over”.

 `require': incompatible library version - /Users/cathy/Desktop/work/vagrant/Test/vendor/bundle/ruby/2.5.0/gems/pg-0.19.0/lib/pg_ext.bundle (LoadError)

When I uninstalled the gem, I got a different error and was stuck again…

/Users/cathy/.rbenv/versions/2.5.1/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- rubygems/core_ext/kernel_warn (LoadError)

Could it be that a require 'selenium-webdriver' line is still written somewhere even though the Selenium gem is gone?

gem file cannot load such file

I tried various things but could not solve it, so in the end I got it working by upgrading the Ruby version.

Place I stumbled when I upgraded Ruby