Recently, I implemented in Python an automatic upload of Rwanda's water supply vector tile data to openAFRICA, an African open data portal operated by Code for Africa. The vector tile data is maintained jointly with WASAC, Rwanda's public water utility.
openAFRICA is built on CKAN, whose API also seems to be widely used by the open data sites of Japanese local governments. The same approach should therefore work whenever you want to publish your organization's files as open data automatically through the API, so I want to share it here.
This article assumes that:
- you have your own account on an open data platform that uses the CKAN API
- your open data is managed on GitHub
In this article, whenever the open data on GitHub is updated, a GitHub Actions workflow automatically pushes it to the platform through the CKAN API.
The openAFRICA page for the water supply vector tiles of Rwanda's water utility is at the following link: https://open.africa/dataset/rw-water-vectortiles
The GitHub repository of the vector tiles is at the link below; it is updated automatically every week from the water utility's server. https://github.com/WASAC/vt
If pipenv is not installed, please install it first.
git clone https://github.com/watergis/open-africa-uploader
cd open-africa-uploader
pipenv install
pipenv shell
First, here is the full source code of OpenAfricaUploader.py from the repository.
import os
import ckanapi
import requests
class OpanAfricaUploader(object):
    def __init__(self, api_key):
        """Constructor
        Args:
            api_key (string): CKAN api key
        """
        self.data_portal = 'https://africaopendata.org'
        self.APIKEY = api_key
        self.ckan = ckanapi.RemoteCKAN(self.data_portal, apikey=self.APIKEY)
    def create_package(self, url, title):
        """create new package if it does not exist yet.
        Args:
            url (str): the url of package eg. https://open.africa/dataset/{package url}
            title (str): the title of package
        """
        package_name = url
        package_title = title
        try:
            print('Creating "{package_title}" package'.format(**locals()))
            self.package = self.ckan.action.package_create(name=package_name,
                                                           title=package_title,
                                                           owner_org='water-and-sanitation-corporation-ltd-wasac')
        except ckanapi.ValidationError as e:
            # .get() avoids a KeyError when the validation error concerns another field
            if (e.error_dict.get('__type') == 'Validation Error' and
                    e.error_dict.get('name') == ['That URL is already in use.']):
                print('"{package_title}" package already exists'.format(**locals()))
                self.package = self.ckan.action.package_show(id=package_name)
            else:
                raise
    def resource_create(self, data, path, api="/api/action/resource_create"):
        """create new resource, or update existing resource
        Args:
            data (object): data for creating resource. data must contain package_id, name, format, description. If you overwrite existing resource, id also must be included.
            path (str): file path for uploading
            api (str, optional): API url for creating or updating. Defaults to "/api/action/resource_create". If you want to update, please specify url for "/api/action/resource_update"
        """
        self.api_url = self.data_portal + api
        print('Creating "{}"'.format(data['name']))
        # Open the file in a with-block so the handle is closed after the upload
        with open(path, 'rb') as f:
            r = requests.post(self.api_url,
                              data=data,
                              headers={'Authorization': self.APIKEY},
                              files=[('upload', f)])
        if r.status_code != 200:
            print('Error while creating resource: {0}'.format(r.content))
        else:
            print('Uploaded "{}" successfully'.format(data['name']))
    def resource_update(self, data, path):
        """update existing resource
        Args:
            data (object): data for creating resource. data must contain id, package_id, name, format, description.
            path (str): file path for uploading
        """
        self.resource_create(data, path, "/api/action/resource_update")
    def upload_datasets(self, path, description):
        """upload datasets under the package
        Args:
            path (str): file path for uploading
            description (str): description for the dataset
        """
        filename = os.path.basename(path)
        extension = os.path.splitext(filename)[1][1:].lower()
        data = {
            'package_id': self.package['id'],
            'name': filename,
            'format': extension,
            'description': description
        }
        resources = self.package['resources']
        if len(resources) > 0:
            target_resource = None
            for resource in reversed(resources):
                if filename == resource['name']:
                    target_resource = resource
                    break
            if target_resource is None:
                self.resource_create(data, path)
            else:
                print('Resource "{}" already exists, it will be overwritten'.format(target_resource['name']))
                data['id'] = target_resource['id']
                self.resource_update(data, path)
        else:
            self.resource_create(data, path)
The code that imports OpenAfricaUploader.py and uploads a file looks like this:
import os
from OpenAfricaUploader import OpanAfricaUploader
# upload2openafrica.py passes the key as args.key; this snippet reads it from the CKAN_API_KEY environment variable instead
uploader = OpanAfricaUploader(os.environ['CKAN_API_KEY'])
uploader.create_package('rw-water-vectortiles', 'Vector Tiles for rural water supply systems in Rwanda')
uploader.upload_datasets(os.path.abspath('../data/rwss.mbtiles'), 'mbtiles format of Mapbox Vector Tiles which was created by tippecanoe.')
Let me go through it piece by piece.
The constructor sets the base URL of the portal in advance, here the one for uploading to openAFRICA. Replace the URL in self.data_portal = 'https://africaopendata.org' with the URL of the CKAN portal your organization uses.
def __init__(self, api_key):
    """Constructor
    Args:
        api_key (string): CKAN api key
    """
    self.data_portal = 'https://africaopendata.org'
    self.APIKEY = api_key
    self.ckan = ckanapi.RemoteCKAN(self.data_portal, apikey=self.APIKEY)
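Because resource_create later builds its endpoint by plain string concatenation (self.data_portal + api), a portal URL entered with a trailing slash would produce a double slash in the endpoint. A minimal sketch of a safer join using only the standard library; action_url is a hypothetical helper, not part of the repository. Plain string joining is used rather than urllib.parse.urljoin because urljoin drops the last path segment of a base URL without a trailing slash, which matters if CKAN is mounted under a sub-path.

```python
def action_url(portal: str, action: str) -> str:
    """Build a CKAN action API endpoint: <portal>/api/action/<action>.

    Hypothetical helper: strips stray slashes so that 'https://example.org'
    and 'https://example.org/' both produce the same endpoint.
    """
    return "{}/api/action/{}".format(portal.rstrip("/"), action.strip("/"))


print(action_url("https://africaopendata.org/", "resource_create"))
# → https://africaopendata.org/api/action/resource_create
```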
The constructor is called as follows; pass the CKAN API key of your account in args.key:
uploader = OpanAfricaUploader(args.key)
A package is created with the package_create action, which takes the following arguments:
- name: this string becomes the URL of the package
- title: the package title
- owner_org: the ID of the target organization on the CKAN portal
On success, package_create returns the package information. If a package with that name already exists, it raises a validation error instead, so the exception handler fetches the existing package with package_show.
def create_package(self, url, title):
    """create new package if it does not exist yet.
    Args:
        url (str): the url of package eg. https://open.africa/dataset/{package url}
        title (str): the title of package
    """
    package_name = url
    package_title = title
    try:
        print('Creating "{package_title}" package'.format(**locals()))
        self.package = self.ckan.action.package_create(name=package_name,
                                                       title=package_title,
                                                       owner_org='water-and-sanitation-corporation-ltd-wasac')
    except ckanapi.ValidationError as e:
        # .get() avoids a KeyError when the validation error concerns another field
        if (e.error_dict.get('__type') == 'Validation Error' and
                e.error_dict.get('name') == ['That URL is already in use.']):
            print('"{package_title}" package already exists'.format(**locals()))
            self.package = self.ckan.action.package_show(id=package_name)
        else:
            raise
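The "already exists" check in the exception handler can be pulled out into a small predicate that is easy to test on its own. This sketch uses dict.get so that a validation error on another field, such as an unknown owner_org, returns False and is re-raised rather than triggering a KeyError. The name name_already_in_use is hypothetical, and the message text is the one the code above matches; it may differ between CKAN versions or portal languages.

```python
def name_already_in_use(error_dict: dict) -> bool:
    """Return True if a CKAN ValidationError's error_dict says the package name is taken.

    Hypothetical helper: missing keys yield False instead of raising KeyError,
    so the caller can re-raise the original validation error.
    """
    return (error_dict.get("__type") == "Validation Error"
            and "That URL is already in use." in error_dict.get("name", []))
```

In the handler it would be used as `if name_already_in_use(e.error_dict): ... else: raise`.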
This function is called as follows:
uploader.create_package('rw-water-vectortiles','Vector Tiles for rural water supply systems in Rwanda')
Resources are created with the resource_create method, which posts the file's binary data and its metadata to the REST endpoint /api/action/resource_create.
def resource_create(self, data, path, api="/api/action/resource_create"):
    self.api_url = self.data_portal + api
    print('Creating "{}"'.format(data['name']))
    # Open the file in a with-block so the handle is closed after the upload
    with open(path, 'rb') as f:
        r = requests.post(self.api_url,
                          data=data,
                          headers={'Authorization': self.APIKEY},
                          files=[('upload', f)])
    if r.status_code != 200:
        print('Error while creating resource: {0}'.format(r.content))
    else:
        print('Uploaded "{}" successfully'.format(data['name']))
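This method only checks the HTTP status code and prints the body on failure, so the script still exits normally after a failed upload, and a GitHub Actions step running it would be marked as successful. The CKAN action API wraps its responses in a JSON envelope with a success flag, so one option is to check that flag and raise. A sketch, with parse_ckan_response as a hypothetical helper:

```python
import json


def parse_ckan_response(status_code, body):
    """Return the 'result' of a CKAN action API response, or raise RuntimeError.

    CKAN wraps action responses in a JSON envelope such as
    {"success": true, "result": {...}} or {"success": false, "error": {...}}.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError
        raise RuntimeError("HTTP {}: response is not JSON".format(status_code)) from None
    if not isinstance(payload, dict):
        raise RuntimeError("HTTP {}: unexpected response body".format(status_code))
    if status_code != 200 or not payload.get("success"):
        raise RuntimeError("HTTP {}: {}".format(status_code, payload.get("error")))
    return payload["result"]
```

Inside resource_create this could replace the status check, e.g. `result = parse_ckan_response(r.status_code, r.content)`; an uncaught exception makes Python exit with a non-zero status, which fails the workflow step.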
However, resource_create only ever adds resources, so the number of resources would keep growing with every update. To overwrite an existing resource instead, use /api/action/resource_update.
resource_update is used in basically the same way as resource_create; the only difference is that data must also contain the id of the resource to update.
def resource_update(self, data, path):
    self.resource_create(data, path, "/api/action/resource_update")
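Since the two actions differ only in whether id is present, the choice can be expressed as a small pure function that returns the action name and the payload, which also makes it testable without touching the portal. A sketch; build_resource_request is hypothetical, while resource_create and resource_update are the CKAN action names used above.

```python
def build_resource_request(data: dict, existing_id=None):
    """Return (action, payload) for uploading a resource (hypothetical helper).

    resource_update needs the resource 'id' in the payload; without an id the
    same fields go to resource_create, which adds a new resource.
    """
    payload = dict(data)  # copy so the caller's dict is not modified
    if existing_id is None:
        return "resource_create", payload
    payload["id"] = existing_id
    return "resource_update", payload
```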
The upload_datasets method combines resource_create and resource_update: it updates the resource if one with the same name already exists and creates a new one otherwise.
def upload_datasets(self, path, description):
    # Separate the file name from the extension
    filename = os.path.basename(path)
    extension = os.path.splitext(filename)[1][1:].lower()
    # Build the data for creating the resource
    data = {
        'package_id': self.package['id'],  # Package ID
        'name': filename,  # Name of the file to upload
        'format': extension,  # Format (here, the file extension)
        'description': description  # File description
    }
    # If the package already has resources, look for one with the same name as the file to upload
    resources = self.package['resources']
    if len(resources) > 0:
        target_resource = None
        for resource in reversed(resources):
            if filename == resource['name']:
                target_resource = resource
                break
        if target_resource is None:
            # No resource with the same name exists: call resource_create
            self.resource_create(data, path)
        else:
            # A resource with the same name exists: set its ID in data and call resource_update
            print('Resource "{}" already exists, it will be overwritten'.format(target_resource['name']))
            data['id'] = target_resource['id']
            self.resource_update(data, path)
    else:
        # The package has no resources yet: call resource_create
        self.resource_create(data, path)
upload_datasets is called as follows:
uploader.upload_datasets(os.path.abspath('../data/rwss.mbtiles'), 'mbtiles format of Mapbox Vector Tiles which was created by tippecanoe.')
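The lookup inside upload_datasets can also be written with next() over a generator, which covers the empty-package case without a separate len(resources) > 0 branch. A sketch with hypothetical helper names; like the reversed() loop above, it returns the last matching entry in the package's resource list.

```python
import os


def find_resource(resources, filename):
    """Return the last resource in the list whose name equals filename, or None (hypothetical helper)."""
    return next((r for r in reversed(resources) if r.get("name") == filename), None)


def resource_format(path):
    """Derive the 'format' field from the file extension, e.g. 'mbtiles' (hypothetical helper)."""
    return os.path.splitext(path)[1].lstrip(".").lower()
```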
To run it from the command line, use upload2openafrica.py:
import os
import argparse
from OpenAfricaUploader import OpanAfricaUploader
def get_args():
    prog = "upload2openafrica.py"
    usage = "%(prog)s [options]"
    parser = argparse.ArgumentParser(prog=prog, usage=usage)
    parser.add_argument("--key", dest="key", help="Your CKAN api key", required=True)
    parser.add_argument("--pkg", dest="package", help="Target url of your package", required=True)
    parser.add_argument("--title", dest="title", help="Title of your package", required=True)
    parser.add_argument("--file", dest="file", help="Relative path of file which you would like to upload", required=True)
    parser.add_argument("--desc", dest="description", help="any description for your file", required=True)
    args = parser.parse_args()
    return args
if __name__ == "__main__":
    args = get_args()
    uploader = OpanAfricaUploader(args.key)
    uploader.create_package(args.package, args.title)
    uploader.upload_datasets(os.path.abspath(args.file), args.description)
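Passing the key as `--key ${CKAN_API_KEY}` works, but the expanded key then becomes part of the process's command line, which other processes on the same machine may be able to read. A hypothetical variant of get_args() that falls back to the CKAN_API_KEY environment variable (the variable the shell script and workflow already use) could look like this; the argv parameter is an addition that lets the parser be exercised with an explicit list.

```python
import argparse
import os


def get_args(argv=None):
    """Parse CLI options; --key falls back to $CKAN_API_KEY (hypothetical variant)."""
    parser = argparse.ArgumentParser(prog="upload2openafrica.py")
    # The default is read when get_args() runs, not when the module is imported.
    parser.add_argument("--key", dest="key", default=os.environ.get("CKAN_API_KEY"),
                        help="CKAN api key (default: $CKAN_API_KEY)")
    parser.add_argument("--pkg", dest="package", required=True, help="Target url of your package")
    parser.add_argument("--title", dest="title", required=True, help="Title of your package")
    parser.add_argument("--file", dest="file", required=True, help="Relative path of the file to upload")
    parser.add_argument("--desc", dest="description", required=True, help="Description for your file")
    args = parser.parse_args(argv)
    if not args.key:
        # parser.error() prints the usage message and exits with status 2
        parser.error("--key is required when CKAN_API_KEY is not set")
    return args
```

With this variant, the `--key` line can be dropped from the shell script as long as CKAN_API_KEY is exported.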
In practice, I call it from a shell script called upload_mbtiles.sh. Be sure to set the CKAN_API_KEY environment variable beforehand.
#!/bin/bash
pipenv run python upload2openafrica.py \
--key ${CKAN_API_KEY} \
--pkg rw-water-vectortiles \
--title "Vector Tiles for rural water supply systems in Rwanda" \
--file ../data/rwss.mbtiles \
--desc "mbtiles format of Mapbox Vector Tiles which was created by tippecanoe."
You can now upload open data using the CKAN API.
Syncing with CKAN by hand every time is tedious, though, so let's automate it with GitHub Actions. The workflow file looks like this:
name: openAFRICA upload
on:
  push:
    branches: [ master ]
    # Run the workflow only when files under the data folder are updated.
    paths:
      - "data/**"
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.8
        uses: actions/setup-python@v2
        with:
          python-version: 3.8
      - name: Install dependencies
        # First, set up Pipenv.
        run: |
          cd scripts
          pip install pipenv
          pipenv install
      - name: upload to openAFRICA
        # Register a secret named CKAN_API_KEY under Settings > Secrets of the GitHub repository;
        # it can then be exposed as an environment variable as follows.
        env:
          CKAN_API_KEY: ${{secrets.CKAN_API_KEY}}
        # Then call the shell script.
        run: |
          cd scripts
          ./upload_mbtiles.sh
That is all it takes: once a file is pushed to GitHub, it is published to the open data platform automatically. The following image shows the screen when the GitHub Actions workflow of Rwanda's water utility runs.
The CKAN API is used by many open data platforms in Japan and abroad. With Python, data syncing through the CKAN API is relatively easy to implement, and if your open data is managed on GitHub, GitHub Actions makes the automation even simpler.
I hope this module, written for openAFRICA, will also be useful for publishing open data on other CKAN-based portals in Japan and overseas.