I check Qiita's trending articles every day and read the ones whose titles catch my interest. But when I thought about it, I realized I don't actually need to visit the site to do that.
All I need is each article's title and URL, so I tried getting both with a single command.
qiita.rb
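require 'open-uri'
require 'nokogiri'

url = 'https://qiita.com/'
# Since Ruby 3.0, Kernel#open no longer fetches URLs, so call URI.open (from open-uri)
html = URI.open(url, &:read)
doc = Nokogiri::HTML.parse(html, nil, 'utf-8')

# 'css-qrra2n' was the class on the title links when this was written. It looks
# auto-generated, so re-check it in the browser's dev tools if nothing is printed
articles = doc.xpath("//a[@class='css-qrra2n']")
articles.each do |article|
  puts                   # Blank line before each article to make the list easier to read
  puts article.text      # Article title
  puts article['href']   # Article URL
end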
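Because `css-qrra2n` looks like an auto-generated class name, it may change whenever Qiita updates its front end, and the XPath would then quietly match the wrong links or none at all. One possible safeguard is a second check on the URL itself. The sketch below keeps only pairs whose URL has the `https://qiita.com/<user>/items/<id>` shape seen in the sample output; the pattern and the `article_link?` / `select_articles` helpers are hypothetical, not part of Qiita or Nokogiri, and the pairs are assumed to come from the text and `href` of each matched link.

```ruby
# Hypothetical pattern for a Qiita article URL: https://qiita.com/<user>/items/<id>
# (an assumption based on the URLs in the sample output, not a documented format)
ARTICLE_URL = %r{\Ahttps://qiita\.com/[^/]+/items/[^/?#]+\z}

# True when the URL looks like a single article page
def article_link?(url)
  ARTICLE_URL.match?(url.to_s)
end

# Keep only [title, url] pairs that point at articles, trimming stray whitespace in titles
def select_articles(pairs)
  pairs.select { |_title, url| article_link?(url) }
       .map { |title, url| [title.strip, url] }
end

links = [
  ['Article title 1', 'https://qiita.com/user_name1/items/aaaaaaaa'],
  ['Trend', 'https://qiita.com/trend'],
  ['  Article title 2 ', 'https://qiita.com/user_name2/items/bbbbbbbb']
]
select_articles(links).each { |title, url| puts title, url }
```

With this filter in place, a changed class name shows up as an empty list rather than a list of unrelated links.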
$ ruby qiita.rb

Article title 1
https://qiita.com/user_name1/items/aaaaaaaa

Article title 2
https://qiita.com/user_name2/items/bbbbbbbb

Article title 3
https://qiita.com/user_name3/items/cccccccc
(Omitted below)
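To check the output layout without hitting the network every time, the printing step can be separated from the fetching step. The sketch below builds the same text the loop prints (a blank line, then the title, then the URL, for each article) as one string. `format_articles` is a hypothetical helper name, and the sample pairs are the placeholder values from the output above.

```ruby
# Build the text the scraping loop prints: "\n", title, URL for each article.
# Hypothetical helper; it takes [title, url] pairs instead of Nokogiri nodes.
def format_articles(pairs)
  pairs.map { |title, url| "\n#{title}\n#{url}\n" }.join
end

sample = [
  ['Article title 1', 'https://qiita.com/user_name1/items/aaaaaaaa'],
  ['Article title 2', 'https://qiita.com/user_name2/items/bbbbbbbb']
]
print format_articles(sample)
```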
For now, I've made something that prints the list to the terminal. This was my first attempt at web scraping, and I was surprised at how easy it was. There are other sites I check every day, so I'd like to try them as well.
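When applying the same approach to other sites, one difference to watch for is that `href` values are often relative paths rather than the absolute URLs in the Qiita output. A minimal sketch, assuming the page URL is known: Ruby's standard `URI.join` resolves an `href` against it and leaves absolute URLs unchanged. The `absolute_url` name and the `example.com` URLs are illustrative placeholders.

```ruby
require 'uri'

# Resolve an href (relative or absolute) against the URL of the page it came from.
# Hypothetical helper; URI.join raises URI::InvalidURIError on malformed hrefs
# (for example, ones containing unescaped spaces).
def absolute_url(page_url, href)
  URI.join(page_url, href).to_s
end

puts absolute_url('https://example.com/news/', '/items/123')   # root-relative path
puts absolute_url('https://example.com/news/', 'page2.html')   # path-relative
puts absolute_url('https://qiita.com/', 'https://qiita.com/user_name1/items/aaaaaaaa')
```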