In the previous article, as a first step with Scrapy, I created a simple Spider and tried extracting URLs. Scrapy can do more than fetch web pages: it can also call Web APIs, download images, and more. This time, let's create a Spider that calls a Web API and saves the results.
Let's create a Spider that gets station information based on a list of zip codes. To look up the stations, use the Geo API (http://geoapi.heartrails.com/api.html) provided by HeartRails.
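Before writing the Spider, it helps to see what the API returns. You can hit the endpoint directly, here with the first zip code from the list (a quick check, assuming curl is available):

curl "http://geoapi.heartrails.com/api/json?method=getStations&postal=1080072"

The getStations method returns a JSON object whose response.station array holds the matching stations, and the Spider below pulls its fields from the first entry. The Spider looks like this: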
get_station_spider.py
# -*- coding: utf-8 -*-
import json

from scrapy import Spider
from scrapy.http import Request


class GetStationSpider(Spider):
    name = "get_station_spider"
    allowed_domains = ["geoapi.heartrails.com"]
    end_point = "http://geoapi.heartrails.com/api/json?method=getStations&postal=%s"

    custom_settings = {
        "DOWNLOAD_DELAY": 1.5,
    }

    # List of zip codes (originally obtained from a DB, etc.)
    postal_list = [
        1080072,
        1050013,
        1350063,
        1020072,
        9012206,
    ]

    # Called when the Spider starts. Issues one API request per zip code.
    def start_requests(self):
        for postal in self.postal_list:
            url = self.end_point % postal
            yield Request(url, self.parse)

    # Called after each download completes. Extracts the first station
    # from the JSON response and yields it as a dict.
    def parse(self, response):
        data = json.loads(response.body)
        result = data['response']['station'][0]
        yield {
            'postal': result['postal'],
            'name': result['name'],
            'line': result['line'],
            'latitude': result['y'],
            'longitude': result['x'],
            'prev': result['prev'],
            'next': result['next'],
        }
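Note that parse assumes the API always returns at least one station per zip code. If a zip code matches nothing, indexing station[0] raises an exception and that item is lost. A minimal defensive variant of the extraction step (my own sketch, not part of the original tutorial) could guard like this:

    data = json.loads(response.body)
    stations = data.get('response', {}).get('station')
    if not stations:
        # No match (or an error payload from the API); skip this zip code
        self.logger.warning("No station found: %s", response.url)
        return
    result = stations[0]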
Just like last time, use the command bundled with Scrapy to crawl. Use the -o option to write the crawl results to stations.json.
scrapy runspider get_station_spider.py -o stations.json
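Note that -o appends to an existing file, so re-running the command against an existing stations.json produces invalid JSON. On Scrapy 2.1 and later, the -O option overwrites the file instead:

scrapy runspider get_station_spider.py -O stations.json

Either way, stations.json ends up looking like this: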
[
{
"prev": "\u767d\u91d1\u53f0",
"name": "\u767d\u91d1\u9ad8\u8f2a",
"longitude": 139.734286,
"next": "\u4e09\u7530",
"latitude": 35.643147,
"line": "\u90fd\u55b6\u4e09\u7530\u7dda",
"postal": "1080072"
},
{
"prev": null,
"name": "\u30e2\u30ce\u30ec\u30fc\u30eb\u6d5c\u677e\u753a",
"longitude": 139.75667,
"next": "\u5929\u738b\u6d32\u30a2\u30a4\u30eb",
"latitude": 35.655746,
"line": "\u6771\u4eac\u30e2\u30ce\u30ec\u30fc\u30eb\u7fbd\u7530\u7dda",
"postal": "1050013"
},
{
"prev": "\u9752\u6d77",
"name": "\u56fd\u969b\u5c55\u793a\u5834\u6b63\u9580",
"longitude": 139.7913,
"next": "\u6709\u660e",
"latitude": 35.630212,
"line": "\u65b0\u4ea4\u901a\u3086\u308a\u304b\u3082\u3081",
"postal": "1350063"
},
{
"prev": "\u795e\u697d\u5742",
"name": "\u98ef\u7530\u6a4b",
"longitude": 139.746657,
"next": "\u4e5d\u6bb5\u4e0b",
"latitude": 35.701332,
"line": "\u6771\u4eac\u30e1\u30c8\u30ed\u6771\u897f\u7dda",
"postal": "1020072"
},
{
"prev": "\u5e02\u7acb\u75c5\u9662\u524d",
"name": "\u5100\u4fdd",
"longitude": 127.719295,
"next": "\u9996\u91cc",
"latitude": 26.224491,
"line": "\u6c96\u7e04\u3086\u3044\u30ec\u30fc\u30eb",
"postal": "9030821"
}
]
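The Japanese station names appear as \uXXXX escapes because the JSON exporter emits ASCII-safe output by default. If you prefer raw UTF-8 in the file, Scrapy's FEED_EXPORT_ENCODING setting can be added to custom_settings (a small tweak of my own, not in the original Spider):

    custom_settings = {
        "DOWNLOAD_DELAY": 1.5,
        "FEED_EXPORT_ENCODING": "utf-8",  # write raw UTF-8 instead of \uXXXX escapes
    }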
With Scrapy, writing everything from calling a Web API to saving the results is very easy. Developers only need to implement the classes and methods that the framework calls, so they can concentrate on the more essential parts. Next time, I will cover downloading image files. Stay tuned!