[Ruby] Get Qiita trend articles by web scraping

I check Qiita's trending articles every day and read any whose titles interest me. When I thought about it, though, I realized I don't actually need to visit the site to do that.

All I need is each article's title and URL, so I tried to fetch both with a single command.

Procedure

  1. Write scraping code
  2. Run in terminal

Write scraping code

qiita.rb


require 'open-uri'
require 'nokogiri'

url = 'https://qiita.com/'
# Kernel#open no longer opens URLs as of Ruby 3.0, so call URI.open explicitly
html = URI.open(url) { |f| f.read }
doc = Nokogiri::HTML.parse(html, nil, 'utf-8')

# This class name looks auto-generated, so it may change when the site is updated
articles = doc.xpath("//a[@class='css-qrra2n']")
articles.each do |article|
  puts # Blank line between articles to make the output easier to read
  puts article.text # Article title
  puts article['href'] # Article URL
end

Run in terminal

$ ruby qiita.rb
Article title 1
https://qiita.com/user_name1/items/aaaaaaaa

Article title 2
https://qiita.com/user_name2/items/bbbbbbbb

Article title 3
https://qiita.com/user_name3/items/cccccccc

(Omitted below)
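
The output above shows full URLs. If the page ever returned relative href values instead (an assumption, not something observed here), they could be resolved against the site root with URI.join from the standard library. The absolute_url helper and BASE_URL constant below are hypothetical names used only for this sketch.

```ruby
require 'uri'

BASE_URL = 'https://qiita.com/'

# Resolves an href against the base URL.
# An href that is already absolute is returned unchanged.
def absolute_url(href, base = BASE_URL)
  URI.join(base, href).to_s
end

puts absolute_url('/user_name1/items/aaaaaaaa')
puts absolute_url('https://qiita.com/user_name2/items/bbbbbbbb')
```

In the scraping loop, this would mean printing absolute_url(article['href']) rather than the raw attribute value.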

Afterword

For now, I made something that prints the results to the terminal. This was my first attempt at web scraping, and I was surprised by how easy it was. There are other sites I check every day, so I would like to try this on them as well.

References

- Getting article information in ruby scraping
- Create a scraping source in 20 minutes using Ruby
- Required for crawler creation! XPATH notation summary
