[Rails] How to implement scraping

Now that I've learned how to implement scraping, I'm writing it up as a learning output.

After reading this article, you should understand what scraping is and what you need to do to implement it.

What is scraping? It means extracting information from a website and processing it to generate new information. For example, visiting various restaurant sites and compiling a price list.

If you want to know more details, please search on Google.

Now I will explain how to implement scraping.

How to implement scraping

1 Install the "mechanize" gem

Add the following line to your Gemfile:

gem 'mechanize'

Then run bundle install in the terminal.

2 Create an instance of the Mechanize class

agent = Mechanize.new # Create an instance of the Mechanize class and assign it to the variable agent

3 Get the website's HTML

Use the instance method "get" of the Mechanize class to fetch the HTML of the website you want to scrape.

page = agent.get("https://www.google.com/?hl=ja")

4 Use the search method to search for HTML elements

Call the search method on the object returned by the get method, which holds the page information. It searches the retrieved HTML for the elements you specify. Even if only one element matches, the return value is an array-like collection of elements.

agent = Mechanize.new
page = agent.get("https://www.google.com/?hl=ja")
elements = page.search('h1')

↑ This gets the h1 elements in https://www.google.com/?hl=ja.

5 inner_text method

If you want to get the text of the HTML elements returned by the search method, use the inner_text method.

agent = Mechanize.new
page = agent.get("URL of the website you want to scrape")
elements = page.search('h2 a') # Search for a elements under h2 elements

elements.each do |ele|
  puts ele.inner_text
end

6 get_attribute method

If you want to get the value of an HTML attribute, use the get_attribute method. For example, an a element has an "href" attribute whose value is the URL of the link destination. Writing get_attribute(attribute) returns the value of the attribute named by the argument.

agent = Mechanize.new
page = agent.get("URL of the website you want to scrape")
elements = page.search('h2 a') # Search for a elements under h2 elements

elements.each do |ele|
  puts ele.get_attribute('href') # puts ele[:href] also works
end

Summary of scraping

● Create an instance of the Mechanize class.
● Get the HTML of the website with the Mechanize instance method get(URL of the website you want information from).
● Use the search method to select the tag elements that hold the data you want.
● Use the inner_text and get_attribute methods on the selected elements to extract the information you want.
