[Nokogiri] Let's handle RSS news in Ruby!

Hello! In this post I parse an RSS news feed with the Nokogiri gem and summarize how to work with it in Ruby.

Once you can handle RSS news in Ruby, you can build your own curated media site, so let's give it a try.

What to make this time

I will automatically fetch the article titles from the following RSS feed, which distributes game information, and put them in an array. https://automaton-media.com/feed/

For now I will keep it simple. (If I feel like it, I may write a bigger article such as "Make curation media with Rails!")

A brief description of the XML data

Before dealing with RSS news data, let's take a brief look at XML.

[Figure: annotated example of the game RSS feed]

XML declaration

The part written as `<?xml version="1.0" ...?>` at the beginning is the XML declaration. It indicates that this file is an XML file, and when present it must appear at the very start of the file.

Definition of channel

The block following the XML declaration that begins with `<channel>` defines the channel for this RSS feed, including its name.

Image definition

This part sets the channel logo and related details.

Install Nokogiri

First, let's install Nokogiri.

gem install nokogiri

For use with Rails, add nokogiri to your Gemfile.

gem 'nokogiri'

After adding it to your Gemfile, run bundle install.

bundle install

Program creation

Now that Nokogiri is installed, let's actually create a program.

Require the library to be used

First, create a file called nokogiri.rb and add the following two lines at the beginning.

nokogiri.rb


require 'open-uri' # Adds an open method that fetches the data at a URL passed as its argument.
require 'nokogiri' # Parses the fetched data so that it can be searched from Ruby.

Read news articles with the open method

nokogiri.rb


require 'open-uri'
require 'nokogiri'

url = 'https://automaton-media.com/feed/' # The feed to read this time.

charset = nil # Starts as nil; it will hold the character encoding reported for the feed.
# Fetch the data with URI.open (plain open no longer accepts URLs as of Ruby 3.0) and pass it to the block.
titles = URI.open(url) do |file|
  charset = file.charset # Remember the charset of the fetched data.
end
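Network requests can fail, so in a longer-running program you may want to wrap the fetch. The following is only a sketch: fetch_feed is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption, not something the feed requires.

```ruby
require 'open-uri'
require 'timeout'

# Hypothetical helper: returns the feed body as a String, or nil if the fetch fails.
def fetch_feed(url, timeout_seconds: 10)
  URI.open(url, open_timeout: timeout_seconds, read_timeout: timeout_seconds) do |file|
    file.read # The block's value becomes the return value of URI.open.
  end
rescue OpenURI::HTTPError, SocketError, SystemCallError, Timeout::Error => e
  # HTTP error status, DNS failure, refused connection, or timeout.
  warn "Could not fetch #{url}: #{e.message}"
  nil
end

# Usage (needs network access, so it is left commented out here):
# body = fetch_feed('https://automaton-media.com/feed/')
# puts body.bytesize if body
```

Returning nil on failure keeps the caller simple; raising a custom error instead would be an equally reasonable choice.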

Let's search the news fetched by the open method with Nokogiri

nokogiri.rb


require 'open-uri'
require 'nokogiri'

url = 'https://automaton-media.com/feed/'

charset = nil
titles = URI.open(url) do |file| # Plain open no longer accepts URLs as of Ruby 3.0.
  charset = file.charset
  doc = Nokogiri::XML(file) # Parse the fetched data into a Nokogiri document.
  channel = doc.at_xpath('//channel') # Get the channel element.
  title_nodes = channel.xpath('.//title') # Get every title under the channel; the leading "." keeps the search inside it.
  title_nodes.map { |node| node.text } # Collect only the text of each title Node into an array.
end

puts titles # Print the titles.

**Nokogiri method description**

- at_xpath: Returns the first element that matches the specified XPath, or nil if nothing matches. (A single element is called a Node.)
- xpath: Returns all elements that match the specified XPath. (The collection is called a NodeSet.)

Let's run

Let's execute the created file.

ruby nokogiri.rb

If everything worked, the news titles collected in the array are printed as shown below.

[Screenshot: news titles printed by nokogiri.rb]

Frequently used Nokogiri search methods

Which Nokogiri search method to use depends on the feed you want to extract from, and you can look each one up as needed, but the most frequently used methods are summarized below.

at

doc.at('//title') # Returns the first matching Node; accepts either an XPath or a CSS selector.

at_xpath

doc.at_xpath('//title') # Searches by XPath and returns the first matching Node.

xpath

doc.xpath('//title') # Searches by XPath and returns a NodeSet of every match.

at_css

doc.at_css('title') # Searches by CSS selector and returns the first matching Node.

css

doc.css('title') # Searches by CSS selector and returns a NodeSet of every match.

Impressions

This time I only collected RSS news titles into an array with Ruby, but the same data could also be stored in a database or sent as notifications to Slack or LINE. It might be fun to build a summary site for yourself.
