1_ Get the operation status of JR West 2_ Get a route with a delay 3_ output To do.
[Operating status quote source] http://trafficinfo.westjr.co.jp/list.html
Only the operation status is obtained from html using xpath. I used lxml in the process here. It's not particularly difficult, so is it for scraping practice? May be just right for you. It also looks like an iPhone when accessed by a user agent. Since it is received as an argument, it can be changed freely. Also, if there is no delay, the function will end the process as it is without returning (because it is if body :).
Output sample
[01:26 Current operating status]
Chugoku area:There is a delay etc.
code
GET_JR_INFO.py
#coding: utf-8
import urllib2, lxml.html
from datetime import datetime
def GET_info(user_agent):
headers = {'User-Agent': user_agent}
root = lxml.html.fromstring(urllib2.urlopen(urllib2.Request("http://trafficinfo.westjr.co.jp/list.html", None, headers)).read())
list_data = []
list_data.append("[{}]".format("".join([i.text for i in root.xpath('//*[@id="contents"]/div[1]/h1')]).encode("utf-8")))
for i in range(1, 7):
path_status = '//*[@id="contents"]/div[2]/ul/li[%d]/span[3]' % i
path_name = '//*[@id="contents"]/div[2]/ul/li[%d]/span[1]' % i
if not "There is no delay information." in "".join([i.text for i in root.xpath(path_status)]).encode("utf-8"):
body = "{}: {}".format(
"".join([i.text for i in root.xpath(path_name)]).encode("utf-8"),
"".join([i.text for i in root.xpath(path_status)]).encode("utf-8"))
list_data.append(body)
if body:
return list_data
if __name__ == "__main__":
print "\n".join(GET_info(
user_agent="Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A403 Safari/8536.25"
))
Recommended Posts