With the "ScraperWiki" service, you can run web scraping on a regular schedule without having to rent a server yourself.
Script edit screen
DB
#!/usr/bin/env python
import scraperwiki
import lxml.html
import json
url = "http://target.website.hoge/index.html" #Target site to scrape
html = scraperwiki.scrape(url) #html document acquisition
root = lxml.html.fromstring(html) #Get root element object
data = []
id = 0
for el in root.cssselect("#hoge_contents > li > span"): #Extract elements with css selector
    data.append({'id':id, 'text':el.text }) #Save the text of the extracted element
    id = id + 1
print(repr(data)) #Output the saved data to the console (this form works in Python 2 and 3)
# Saving data:
unique_keys = [ 'id' ] #Specify a unique key
scraperwiki.sql.save(unique_keys, data) #Save to DB
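To illustrate what saving with a unique key buys you on a scraper that runs repeatedly, here is a minimal sketch using only Python's standard `sqlite3` module. It assumes that saving with `unique_keys` behaves like an upsert (rows with the same key are overwritten rather than duplicated). The `save` helper and the `scraped_items` table name are hypothetical stand-ins written for this example and are not part of the scraperwiki library.

```python
import sqlite3

def save(conn, unique_keys, rows, table="scraped_items"):
    """Hypothetical stand-in for an upsert-style save keyed on unique_keys."""
    # Collect every column name that appears in the rows.
    columns = sorted({key for row in rows for key in row})
    col_list = ", ".join('"%s"' % c for c in columns)
    key_list = ", ".join('"%s"' % k for k in unique_keys)
    # Create the table on first use, with the unique keys as primary key.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "%s" (%s, PRIMARY KEY (%s))'
        % (table, col_list, key_list)
    )
    placeholders = ", ".join("?" for _ in columns)
    # INSERT OR REPLACE overwrites any existing row with the same key.
    conn.executemany(
        'INSERT OR REPLACE INTO "%s" (%s) VALUES (%s)'
        % (table, col_list, placeholders),
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")

# First scheduled run.
save(conn, ["id"], [{"id": 0, "text": "first"}, {"id": 1, "text": "second"}])
# Second run: same ids, one changed text. Rows are replaced, not appended.
save(conn, ["id"], [{"id": 0, "text": "first (updated)"}, {"id": 1, "text": "second"}])

rows = conn.execute("SELECT id, text FROM scraped_items ORDER BY id").fetchall()
print(rows)
```

Under that assumption, a counter-based `id` like the one in the script above only stays stable if the page keeps its items in the same order; a key derived from the content itself (for example a link URL) would likely be more robust.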
An example of this in actual use: http://shimz.me/blog/d3-js/3353