[Ruby] 5 errors that tend to occur when scraping with Selenium and how to deal with them

Here are 5 errors that often occur when using Selenium, a tool that is useful for web scraping, and how to deal with each of them.

Introduction

Selenium is a gem that is useful for web scraping with complex operations.

Gems that can be used for web scraping in Ruby

- Nokogiri (basic)
- Mechanize (easy)
- Selenium (can do many things by operating a browser)

With Nokogiri and Mechanize, you get the content of a page by specifying HTML tags and CSS selectors. With Selenium, you open a browser from your program and operate it as a user would, so you can handle more complicated processing.

- When you need to log in
- When the page is rendered with JavaScript
- When you want to enter data

Selenium is useful in cases like these, where you cannot get the necessary information using only HTML tags and CSS selectors.

How to use Selenium

Many people have already written up how to use Selenium, so please refer to the articles below.

[Ruby] Web scraping with Chrome on selenium

Selenium cheat sheet [Ruby]

# Load the library needed to use Selenium
require 'selenium-webdriver'

# Start Selenium
driver = Selenium::WebDriver.for :chrome

5 common errors when using Selenium

1. The version of the driver that runs the browser does not match the browser version

Selenium::WebDriver::Error::SessionNotCreatedError (session not created: This version of ChromeDriver only supports Chrome version 75)

In Selenium, you specify the browser type and then operate that browser programmatically.

# Start Selenium
driver = Selenium::WebDriver.for :chrome

The browser you want to use, such as Firefox or Chrome, must be installed in your development environment. If you use Chrome, install the ChromeDriver version that matches your Chrome version.

How to check the version of Google Chrome
In Google Chrome, open Settings > Help > About Google Chrome and check the version shown there.
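The comparison behind this error can be sketched in plain Ruby. This is a minimal sketch, not part of Selenium: `major_version` and `same_major_version?` are hypothetical helper names, and the two version strings are illustrative values of the kind printed by `google-chrome --version` and `chromedriver --version`.

```ruby
# Extract the major version (the number before the first dot) from a
# version string such as "Google Chrome 76.0.3809.100".
def major_version(version_text)
  match = version_text.match(/(\d+)\.\d+/)
  raise ArgumentError, "no version number in #{version_text.inspect}" unless match

  match[1].to_i
end

# True when Chrome and ChromeDriver share the same major version.
def same_major_version?(chrome_text, driver_text)
  major_version(chrome_text) == major_version(driver_text)
end

# Illustrative strings (assumptions, not real output from this machine).
chrome = 'Google Chrome 76.0.3809.100'
driver = 'ChromeDriver 75.0.3770.90'

if same_major_version?(chrome, driver)
  puts 'versions match'
else
  puts 'version mismatch: install the ChromeDriver that matches Chrome'
end
```

Running a check like this before starting the driver turns the SessionNotCreatedError into a message that says which side needs updating.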

How to install ChromeDriver
[Ruby] Web scraping with Chrome on selenium

If you want to deploy and run it on Heroku, you need to install Google Chrome and ChromeDriver on Heroku as well, and add an option to run Chrome headless. The following articles cover setting up Chrome and ChromeDriver on Heroku.

[Free] Chrome headless + selenium regularly run on heroku

Take a screenshot of a web page using headless chrome on Heroku's rails app
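As a rough illustration of the headless option mentioned above, the sketch below builds a list of Chrome command-line switches. `chrome_arguments` is a hypothetical helper, and which switches a particular host needs (for example `--no-sandbox`) is an assumption to verify in that environment. The Selenium calls are shown only as comments so that the snippet runs without a browser.

```ruby
# Switches often used when Chrome runs on a server without a display.
# Treat this list as an assumption, not as a requirement of Heroku.
def headless_switches
  ['--headless', '--no-sandbox', '--disable-gpu']
end

# Build the argument list, adding the headless switches only when asked.
def chrome_arguments(headless:, extra: [])
  base = headless ? headless_switches : []
  (base + extra).uniq
end

puts chrome_arguments(headless: true, extra: ['--window-size=1280,800']).join(' ')

# With the selenium-webdriver gem installed, the list could be applied like this:
#   options = Selenium::WebDriver::Chrome::Options.new
#   chrome_arguments(headless: true).each { |arg| options.add_argument(arg) }
#   driver = Selenium::WebDriver.for :chrome, options: options
```

Keeping the switches in one place makes it easy to run with a visible browser locally and headless on the server.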

2. The web page could not be opened

Selenium::WebDriver::Error::InvalidArgumentError (invalid argument: 'url' must be a string)

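Bad example: passing a variable

# Open the URL
@url = 'https://www...'
driver.get(@url)
driver.get("#{@url}")

Does it fail when the argument inside () is a Ruby variable? The error told me to pass a string. Note that driver.get does accept a variable that holds a String. This error means the value was not a String at the point of the call, most likely nil, which is what an instance variable evaluates to when it has not been set in that scope.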

OK example: write the string directly

# Open the URL
driver.get('https://www...')
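One way to catch this earlier is to validate the value before it reaches driver.get. The following is a minimal sketch: `validated_url` and `open_page` are hypothetical helpers, and `driver` is only assumed to respond to `get`, as a Selenium driver does.

```ruby
# Raise early, with a clearer message, when the URL is not a usable String.
def validated_url(url)
  unless url.is_a?(String) && !url.strip.empty?
    raise ArgumentError, "expected a URL string, got #{url.inspect}"
  end

  url
end

# `driver` is assumed to respond to #get, as a Selenium driver does.
def open_page(driver, url)
  driver.get(validated_url(url))
end

# An instance variable that was never assigned evaluates to nil,
# which is one way a non-string value can reach driver.get.
begin
  validated_url(@never_assigned)
rescue ArgumentError => e
  puts e.message
end
```

With this guard, a nil URL fails in your own code with the offending value in the message, instead of inside the driver.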

3. An element in the page could not be found

Selenium::WebDriver::Error::NoSuchElementError: no such element: Unable to locate element: {"method":"id","selector":"#entryBtn"}

This error occurs when no element on the page matches what you specified. Check that the id, class name, and so on are specified correctly.

For Nokogiri: specify the element with a CSS selector

Specify the element you want to get with a CSS selector.

# Load the libraries needed to use Nokogiri
require 'nokogiri'
require 'open-uri'

# Get page elements using Nokogiri
# (URI.open is used because Kernel#open no longer opens URLs in Ruby 3.0 and later)
html = Nokogiri::HTML(URI.open('https://www.google.co.jp/'))
logo = html.css('#hplogo')

For Selenium: specify the element by locator type + value (id, CSS class name, HTML tag, etc.)

Specify the element you want to get by locator type and value.

# Load the library needed to use Selenium
require 'selenium-webdriver'

# Start Selenium, open the page, and get a page element
driver = Selenium::WebDriver.for :chrome
driver.get('https://www.google.co.jp/')
driver.find_element(:id, 'hplogo')

With Selenium's :id locator, you pass the id itself, without the leading "#".
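If you are used to CSS selectors from Nokogiri, a small conversion helper can prevent this mistake. The sketch below is an illustration only: `locator_for` is a hypothetical helper, it recognizes just the bare "#id" and ".class" forms, and it passes everything else through as a CSS selector.

```ruby
# Convert a simple CSS-style selector into a [how, what] pair for find_element.
def locator_for(selector)
  case selector
  when /\A#([\w\-]+)\z/  then [:id, Regexp.last_match(1)]
  when /\A\.([\w\-]+)\z/ then [:class, Regexp.last_match(1)]
  else [:css, selector]
  end
end

p locator_for('#entryBtn')
p locator_for('.eventItem')
p locator_for('div.eventItem a')

# With a running driver, the pair can be splatted into find_element:
#   driver.find_element(*locator_for('#entryBtn'))
```

Selenium also accepts full CSS selectors through the :css locator, so `driver.find_element(:css, '#entryBtn')` is another way to keep the "#".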

4. You are operating on an element that is no longer on the page

Selenium::WebDriver::Error::StaleElementReferenceError (stale element reference: element is not attached to the page document)

This error occurs when, after going back in the browser, you try to operate on an element that was obtained before leaving the page.

Bad example: the elements obtained before the loop are stale from the second iteration onward

# Load the library needed to use Selenium
require 'selenium-webdriver'

# Start Selenium
driver = Selenium::WebDriver.for :chrome

# Get the event items on the event list page
events = driver.find_elements(:class, 'eventItem')

# Go to each event details page
events.size.times do |i|
  # Click the button that leads to the event details page
  events[i].find_element(:class, 'entryBtn').click
  # → An error occurs in the second and subsequent iterations

  # Go back to the previous page
  driver.navigate.back
end

OK example: find the elements again inside the loop

# Load the library needed to use Selenium
require 'selenium-webdriver'

# Start Selenium
driver = Selenium::WebDriver.for :chrome

# Get the event items on the event list page
events = driver.find_elements(:class, 'eventItem')

# Go to each event details page
events.size.times do |i|
  # The elements found earlier are stale after navigating back, so find them again
  events_in_loop = driver.find_elements(:class, 'eventItem')

  # Click the button that leads to the event details page
  events_in_loop[i].find_element(:class, 'entryBtn').click

  # Go back to the previous page
  driver.navigate.back
end

After you leave the page and come back, the element references obtained earlier are no longer valid, so you need to find the elements again inside the loop.
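As an alternative to finding the elements again on every iteration, you can collect the destination URLs first and then visit them one by one, so that no element reference has to survive a page change. This is a sketch under assumptions: it only works if the 'entryBtn' element is a link that carries an href attribute, and `detail_urls` and `each_detail_page` are hypothetical helper names.

```ruby
# Collect the href of every detail link before leaving the list page.
# `driver` is assumed to be a Selenium driver that is already on the list page.
def detail_urls(driver, button_class = 'entryBtn')
  driver.find_elements(:class, button_class)
        .map { |button| button.attribute('href') }
        .compact
end

# Visit each collected URL and yield the driver so the caller can scrape the page.
def each_detail_page(driver, urls)
  urls.each do |url|
    driver.get(url)
    yield driver
  end
end

# Usage with a running driver:
#   each_detail_page(driver, detail_urls(driver)) { |d| puts d.title }
```

If the button triggers JavaScript instead of being a plain link, this approach does not apply and re-finding the elements inside the loop remains the safer choice.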

Reference: get StaleElementReferenceException error while using driver.navigate().back() in a loop in selenium

5. Ruby commands no longer run

This is an environment setup error that makes you feel like it is all over.

 `require': incompatible library version - /Users/cathy/Desktop/work/vagrant/Test/vendor/bundle/ruby/2.5.0/gems/pg-0.19.0/lib/pg_ext.bundle (LoadError)

When I uninstalled the gem, I got a different error and could not get out of it...

/Users/cathy/.rbenv/versions/2.5.1/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require': cannot load such file -- rubygems/core_ext/kernel_warn (LoadError)

The cause seems to be that a require 'selenium-webdriver' line remained even though the Selenium gem had been removed.

gem file cannot load such file
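To find out which library fails to load without aborting the whole script, the require call can be wrapped. This is a minimal sketch: `library_available?` is a hypothetical helper, and the suggestion in the message is a general hint, not the fix that worked in this article.

```ruby
# Try to load a library and report whether it succeeded, instead of
# letting LoadError abort the whole script.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

%w[set selenium-webdriver].each do |name|
  status = library_available?(name) ? 'available' : 'missing (check the Gemfile and run bundle install)'
  puts "#{name}: #{status}"
end
```

An "incompatible library version" failure is also raised as a LoadError, so the same check reports gems whose native extension was built for a different Ruby.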

I tried various things but could not solve it, so in the end I got it working by upgrading Ruby.

I stumbled when I upgraded Ruby
