This time, I will build a tool that graphs the download counts of my photos.
**Details**
Photo AC is a site where creators post copyright-free images. It shows how many times your photos were downloaded the previous day, but once the next day comes, you can no longer see the figure from two days ago. So I want to fetch the numbers once a day, store them in a database, and eventually turn them into a graph.
https://www.photo-ac.com/
(Do this at your own risk; if your account gets closed over it, don't blame me lol)
Scraping
First, I'll implement the part that fetches the data from the site in a single run.
Unlike last time, I'll use Mechanize, because this site requires logging in.
Gemfile

```ruby
source 'https://rubygems.org/'
gem 'nokogiri'
gem 'mechanize'
```
With the Gemfile ready, run:

```shell
bundle install --path .bundle
```
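I pass `--path` because without it the gems are installed into your system-wide Ruby environment, which gets troublesome later. On Bundler 2.1 and later, `--path` is deprecated; the equivalent is `bundle config set --local path '.bundle'` followed by `bundle install`.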
Here is the code that actually fetches the data:
crawler.rb
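```ruby
require 'nokogiri'
require 'mechanize'

# Mechanize keeps cookies between requests, so the session from the login survives
agent = Mechanize.new
agent.get("https://www.photo-ac.com")

# Log in as a creator account (replace with your own credentials)
agent.post("https://www.photo-ac.com/auth/login", {
  acc_type: 'cr',
  email: 'mail address',
  password: 'password',
  remember_me: '1'
})

# Creator's own photo list; judging by the parameter names, newest first, 200 per page
page = agent.get("https://www.photo-ac.com/creator/list/?pl_q=&pl_order=-releasedate&pl_pp=200&pl_disp=all&pl_ntagsec=&pl_tags50over=&pl_chkpsd=")
doc = Nokogiri::HTML.parse(page.body, nil, 'utf-8')

# Each .photo-list block is one image: print its ID, then its download count
doc.css(".photo-list").each do |div|
  p div.css(".sectiondata li")[0].text
  p div.css(".sectionimg .preview")[0].text
end
```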
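The loop above only prints raw strings such as `"ID:2875969"` and `"0"`. Before anything goes into a database they need to become numbers. Below is a minimal sketch of that conversion, assuming the ID text always has the `ID:<digits>` shape printed by the loop and the count is a plain integer string; `parse_entry` and `to_counts` are hypothetical helper names, not part of Nokogiri or Mechanize.

```ruby
# Hypothetical helper: converts the two strings scraped for one image,
# e.g. "ID:2875969" and "0", into an [image_id, downloads] pair of Integers.
def parse_entry(id_text, count_text)
  id = id_text.strip[/\AID:(\d+)\z/, 1] or
    raise ArgumentError, "unexpected ID text: #{id_text.inspect}"
  # Base 10 is explicit so a leading zero is never read as octal
  [Integer(id, 10), Integer(count_text.strip, 10)]
end

# Hypothetical helper: builds { image_id => downloads } from [id_text, count_text] pairs.
def to_counts(pairs)
  pairs.to_h { |id_text, count_text| parse_entry(id_text, count_text) }
end

# Example with two entries taken from the crawler's output
counts = to_counts([["ID:2836741", "3"], ["ID:2836737", "1"]])
p counts[2836741] # prints 3
```

Inside the `doc.css(".photo-list")` loop you would collect `[div.css(".sectiondata li")[0].text, div.css(".sectionimg .preview")[0].text]` pairs into an array and pass it to `to_counts`. Raising on an unexpected ID format makes a change in the site's markup fail loudly instead of silently storing garbage.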
When you run this, it prints:

```
"ID:2875969"
"0"
"ID:2875964"
"0"
"ID:2875028"
"0"
"ID:2875022"
"0"
"ID:2874964"
"0"
"ID:2871884"
"0"
"ID:2871883"
"0"
"ID:2871879"
"0"
"ID:2871873"
"0"
"ID:2871870"
"0"
"ID:2837286"
"0"
"ID:2837285"
"0"
"ID:2837282"
"0"
"ID:2837281"
"0"
"ID:2837280"
"0"
"ID:2837277"
"0"
"ID:2837276"
"0"
"ID:2836745"
"0"
"ID:2836741"
"3"
"ID:2836737"
"1"
"ID:2836735"
"2"
"ID:2836730"
"1"
"ID:2836723"
"0"
"ID:2836718"
"1"
"ID:2746521"
"6"
"ID:2746517"
"11"
"ID:2746513"
"1"
"ID:2746505"
"1"
"ID:2746086"
"1"
"ID:2746084"
"4"
"ID:2746070"
"15"
"ID:2746066"
"16"
"ID:2742664"
"10"
"ID:2742530"
"17"
"ID:2742522"
"6"
"ID:2742517"
"3"
"ID:2741719"
"4"
"ID:2741715"
"16"
"ID:2741708"
"2"
"ID:2741705"
"0"
"ID:2741700"
"0"
"ID:2741699"
"0"
"ID:2741675"
"21"
"ID:2741674"
"2"
"ID:2741653"
"0"
"ID:2741629"
"1"
"ID:2741567"
"0"
"ID:2741381"
"22"
"ID:2741336"
"7"
"ID:2733068"
"14"
"ID:2733060"
"0"
"ID:2733050"
"1"
"ID:2690326"
"2"
"ID:2690291"
"7"
"ID:2690259"
"1"
```
As you can see, this gives each image's ID and its current download count. If I save this to a database once a day, graphing it should be straightforward.
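For the "store it once a day" step, here is a minimal sketch that uses a JSON Lines file as a stand-in for a real database (an assumption made to keep it dependency-free; SQLite or similar would follow the same shape). `save_snapshot`, `load_snapshots`, `daily_deltas`, `totals_by_date` and the file name `downloads.jsonl` are all hypothetical. It isn't clear from the page alone whether the scraped number is a running total or already a per-day figure, so `daily_deltas` covers the running-total case: a day's downloads are that day's count minus the previous snapshot's.

```ruby
require 'json'
require 'date'

# Hypothetical storage file: one JSON object per line, one line per daily run
SNAPSHOT_FILE = 'downloads.jsonl'

# Appends one day's { image_id => downloads } hash to the file.
def save_snapshot(counts, date: Date.today, path: SNAPSHOT_FILE)
  File.open(path, 'a') do |f|
    f.puts(JSON.generate({ 'date' => date.iso8601, 'counts' => counts }))
  end
end

# Reads every snapshot back as [Date, { image_id => downloads }], oldest first.
# JSON object keys come back as strings, so image IDs are converted to Integer.
def load_snapshots(path: SNAPSHOT_FILE)
  return [] unless File.exist?(path)

  rows = File.readlines(path, chomp: true).reject(&:empty?).map do |line|
    row = JSON.parse(line)
    [Date.iso8601(row['date']), row['counts'].to_h { |id, n| [Integer(id, 10), n] }]
  end
  rows.sort_by(&:first)
end

# Assuming the scraped figure is a running total: downloads gained between two
# snapshots, with images missing from the older snapshot counted from 0.
def daily_deltas(prev_counts, curr_counts)
  curr_counts.to_h { |id, n| [id, n - prev_counts.fetch(id, 0)] }
end

# One [Date, total downloads] point per snapshot, ready for a graphing tool.
def totals_by_date(snapshots)
  snapshots.map { |date, counts| [date, counts.values.sum] }
end
```

Running crawler.rb once a day (for example from cron) and passing its `{ image_id => downloads }` hash to `save_snapshot` builds up the history. If the figure turns out to be per-day already, the stored snapshots can be graphed directly and `daily_deltas` is unnecessary.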