Use e-Stat API from Python

e-Stat, the portal site of the Ministry of Internal Affairs and Communications, publishes Japanese government statistics. There are two ways to get the data:

  1. Access the e-Stat website (web scraping)
  2. Use the e-Stat API

This article mainly describes method 2 (using the e-Stat API). You call it with `requests` and `urllib`, the same way you would use any REST API from Python. For detailed specifications of the e-Stat API (version 3.0), see the API Specification (ver 3.0).

Method 1: Access the e-Stat website

On the e-Stat website, data is served from the database through structured URLs, so web scraping is also possible. As an example, take the e-Stat URL for "Household Survey / Detailed Results Table of Household Income and Expenditure for Households with Two or More Persons":

https://www.e-stat.go.jp/stat-search/files?page=1&layout=datalist&toukei=00200561&tstat=000000330001&cycle=1&year=20200&month=12040606&tclass1=000000330001&tclass2=000000330004&tclass3=000000330005&stat_infid=000031969452&result_back=1

Each classification item corresponds to an HTTP GET parameter (`&<param>=<value>`) as shown in the table below; a sketch of assembling such a URL in Python follows the table.

| Item | Value | URL parameter |
|---|---|---|
| Government statistics name | Household survey | `&toukei=00200561` |
| Provided statistics name | Household survey | `&tstat=000000330001` |
| Provided classification 1 | Household income and expenditure | `&tclass1=000000330001` |
| Provided classification 2 | Households with two or more persons (results excluding agriculture, forestry and fishery households) | `&tclass2=000000330002` |
| Provided classification 3 | Detailed results table | `&tclass3=000000330003` |
| Provided cycle | Monthly | `&cycle=1` |
| Survey date | January 2000 | `&year=20000 &month=11010301` |
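
As a minimal sketch (not from the original article), such a URL can be assembled with `urllib.parse.urlencode`; the parameter values below are copied from the table above and are purely illustrative:

from urllib.parse import urlencode

# Illustrative parameter values taken from the table above
params = {
    "page": 1,
    "layout": "datalist",
    "toukei": "00200561",       # government statistics code: Household survey
    "tstat": "000000330001",    # provided statistics name
    "tclass1": "000000330001",  # provided classification 1
    "tclass2": "000000330002",  # provided classification 2
    "tclass3": "000000330003",  # provided classification 3
    "cycle": 1,                 # monthly
}
url = "https://www.e-stat.go.jp/stat-search/files?" + urlencode(params)
print(url)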

**Government Statistics Code**

Data published on e-Stat is identified by a government statistics code and a statistical table ID. For example, the government statistics code of the Census is 00200521, and that of the Household Survey is 00200561.

When acquiring data from e-Stat, you first specify a "government statistics code" and then run a conditional search over the statistical tables belonging to that statistic. There are other codes defined by the Ministry of Internal Affairs and Communications as well, such as prefecture codes and city codes. The codes referred to in a conditional search are listed below; a small example of passing one as a search condition follows the list.

- Government Statistics Code List (example: "Census 2015" = "00200521")
- Statistical table creating organization code (example: "Ministry of Internal Affairs and Communications" = "00200")
- Prefecture code / city code (example: "Hokkaido" = "01", "Sapporo City" = "1002")
- National local government code (example: "Sapporo, Hokkaido" = "011002")
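
For instance, a conditional search via the API (Method 2 below) passes the government statistics code as the `statsCode` parameter. A hypothetical parameter dictionary might look like this:

# Hypothetical search conditions for getStatsList (see Method 2)
params_dict = {
    "appId": "****************",  # your application ID
    "lang": "J",
    "statsCode": "00200521",      # "Census 2015"
    "collectArea": 3,             # aggregate by municipality
}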

Method 2: Use the e-Stat API

**Step 1: Get an application ID for the e-Stat API.**

Access the e-Stat API top page and follow the steps under "User registration / login". After registering your name, email address, and password, an application ID will be sent to you by email. API calls cannot be made without this application ID.

**Step 2: Call HTTP GET and receive the response (data) from the API.**

For details on how to use the API, refer to the API Specification (ver 3.0). Basically, the following two calls cover almost every use case (a minimal sketch follows the list).

  1. Get the list of statistical tables (getStatsList): specify a URL and parameters to get information (IDs, names, etc.) about all statistical tables that match the conditions. URL: use the `getStatsListURL()` method of the `EstatRestAPI_URLParser` class (Specification 2.1). Parameters: see Specification 3.2.
  2. Get statistical data (getStatsData): specify a URL and parameters to get the raw data of a statistical table that matches the conditions. URL: use the `getStatsDataURL()` method of the `EstatRestAPI_URLParser` class (Specification 2.3). Parameters: see Specification 3.4.
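
As a minimal sketch, the two calls can also be made directly with `requests`; the endpoint paths mirror the ones built by the class shown later, and the parameter values are illustrative:

import requests

APP_ID = "****************"  # your application ID

# 1. getStatsList: search statistical tables matching the conditions (JSON response)
resp = requests.get(
    "https://api.e-stat.go.jp/rest/3.0/app/json/getStatsList",
    params={"appId": APP_ID, "lang": "J", "searchWord": "Household survey"},
)
# TABLE_INF is a list of tables (it may be a single dict if there is only one hit)
tables = resp.json()["GET_STATS_LIST"]["DATALIST_INF"]["TABLE_INF"]

# 2. getStatsData: get the raw data of one statistical table by its ID
resp = requests.get(
    "https://api.e-stat.go.jp/rest/3.0/app/json/getStatsData",
    params={"appId": APP_ID, "statsDataId": tables[0]["@id"], "limit": 100},
)
data = resp.json()  # structure follows Specification 3.4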

Python code

Referring to the e-Stat API specification (version 3.0.0), I wrote a module that generates the appropriate URL for a given set of parameters (a short usage sketch follows the class).

import urllib
import requests


class EstatRestAPI_URLParser:
    """
    This is a simple python module class for e-Stat API (ver.3.0).
    See more details at https://www.e-stat.go.jp/api/api-info/e-stat-manual3-0
    """

    def __init__(self, api_version=None, app_id=None):
        # base url
        self.base_url = "https://api.e-stat.go.jp/rest"

        # e-Stat REST API Version
        if api_version is None:
            self.api_version = "3.0"
        else:
            self.api_version = api_version

        # Application ID
        if app_id is None:
            self.app_id = "****************"  # enter your application ID here
        else:
            self.app_id = app_id

    def getStatsListURL(self, params_dict, format="csv"):
        """
        2.1 Get statistical table information (HTTP GET)
        """
        params_str = urllib.parse.urlencode(params_dict)
        if format == "xml":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getStatsList?{params_str}"
            )
        elif format == "json":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/json/getStatsList?{params_str}"
            )
        elif format == "jsonp":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/jsonp/getStatsList?{params_str}"
            )
        elif format == "csv":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getSimpleStatsList?{params_str}"
            )
        return url

    def getMetaInfoURL(self, params_dict, format="csv"):
        """
        2.2 Meta information acquisition (HTTP GET)
        """
        params_str = urllib.parse.urlencode(params_dict)
        if format == "xml":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getMetaInfo?{params_str}"
            )
        elif format == "json":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/json/getMetaInfo?{params_str}"
            )
        elif format == "jsonp":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/jsonp/getMetaInfo?{params_str}"
            )
        elif format == "csv":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getSimpleMetaInfo?{params_str}"
            )
        return url

    def getStatsDataURL(self, params_dict, format="csv"):
        """
        2.3 Statistical data acquisition (HTTP GET)
        """
        params_str = urllib.parse.urlencode(params_dict)
        if format == "xml":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getStatsData?{params_str}"
            )
        elif format == "json":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/json/getStatsData?{params_str}"
            )
        elif format == "jsonp":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/jsonp/getStatsData?{params_str}"
            )
        elif format == "csv":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getSimpleStatsData?{params_str}"
            )
        return url

    def postDatasetURL(self):
        """
        2.4 Dataset registration (HTTP POST)
        """
        url = (
            f"{self.base_url}/{self.api_version}"
            "/app/postDataset"
        )
        return url

    def refDataset(self, params_dict, format="xml"):
        """
        2.5 Dataset reference (HTTP GET)
        """
        params_str = urllib.parse.urlencode(params_dict)
        if format == "xml":
            url = (
                f"{self.base_url}/{self.api_version}"
                + f"/app/refDataset?{params_str}"
            )
        elif format == "json":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/json/refDataset?{params_str}"
            )
        elif format == "jsonp":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/jsonp/refDataset?{params_str}"
            )
        return url

    def getDataCatalogURL(self, params_dict, format="xml"):
        """
        2.6 Data catalog information acquisition (HTTP GET)
        """
        params_str = urllib.parse.urlencode(params_dict)
        if format == "xml":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getDataCatalog?{params_str}"
            )
        elif format == "json":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/json/getDataCatalog?{params_str}"
            )
        elif format == "jsonp":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/jsonp/getDataCatalog?{params_str}"
            )
        return url

    def getStatsDatasURL(self, params_dict, format="xml"):
        """
        2.7 Bulk statistical data acquisition (HTTP GET)
        """
        params_str = urllib.parse.urlencode(params_dict)
        if format == "xml":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getStatsDatas?{params_str}"
            )
        elif format == "json":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/json/getStatsDatas?{params_str}"
            )
        elif format == "csv":
            url = (
                f"{self.base_url}/{self.api_version}"
                f"/app/getSimpleStatsDatas?{params_str}"
            )
        return url
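
As a short usage sketch (illustrative parameters only), the class builds a request URL like this:

estatapi_url_parser = EstatRestAPI_URLParser(app_id="****************")
params_dict = {
    "appId": estatapi_url_parser.app_id,
    "lang": "J",
    "searchWord": "Household survey",
}
print(estatapi_url_parser.getStatsListURL(params_dict, format="json"))
# -> https://api.e-stat.go.jp/rest/3.0/app/json/getStatsList?appId=...&lang=J&searchWord=...

The helper functions below call such URLs with `requests` and save the responses as local JSON or CSV files.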
import csv
import json
import xlrd
import zipfile
import requests
import functools
import pandas as pd
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm


def get_json(url):
    """
    Request a HTTP GET method to the given url (for REST API)
    and return its response as the dict object.

    Args:
    ====
    url: string
        valid url for REST API
    """
    try:
        print("HTTP GET", url)
        r = requests.get(url)
        json_dict = r.json()
        return json_dict
    except requests.exceptions.RequestException as error:    
        print(error)


def download_json(url, filepath):
    """
    Request a HTTP GET method to the given url (for REST API)
    and save its response as the json file.

    Args:
    url: string
        valid url for REST API
    filepath: string
        valid path to the destination file
    """
    try:
        print("HTTP GET", url)
        r = requests.get(url)
        json_dict = r.json()
        json_str = json.dumps(json_dict, indent=2, ensure_ascii=False)
        with open(filepath, "w") as f:
            f.write(json_str)
    except requests.exceptions.RequestException as error:
        print(error)



def download_csv(url, filepath, enc="utf-8", dec="utf-8", logging=False):
    """
    Request a HTTP GET method to the given url (for REST API)
    and save its response as the csv file.

    Args:
    =====
    url: string
        valid url for REST API
    filepath: string
        valid path to the destination file
    enc: string
        encoding type for a content in a given url
    dec: string
        decoding type for a content in a downloaded file
            dec = 'utf-8' for general env
            dec = 'sjis'  for Excel on Win
            dec = 'cp932' for Excel with extended JP str on Win
    logging: bool
        whether to print a progress log
    """
    try:
        if logging:
            print("HTTP GET", url)
        r = requests.get(url, stream=True)
        with open(filepath, 'w', encoding=enc) as f:
            f.write(r.content.decode(dec))
    except requests.exceptions.RequestException as error:
        print(error)


def download_all_csv(
        urls,
        filepathes,
        max_workers=10,
        enc="utf-8",
        dec="utf-8"):
    """
    Request some HTTP GET methods to the given urls (for REST API)
    and save each response as the csv file.
    (!! This method uses multi threading when calling HTTP GET requests
    and downloading files in order to improve the processing speed.)

    Args:
    =====
    urls: list of strings
        valid urls for REST API
    filepathes: list of strings
        valid paths to the destination files
    max_workers: int
        maximum number of worker threads used while executing this method
    enc: string
        encoding type for a content in a given url
    dec: string
        decoding type for a content in a downloaded file
            dec = 'utf-8' for general env
            dec = 'sjis'  for Excel on Win
            dec = 'cp932' for Excel with extended JP str on Win
    """
    func = functools.partial(download_csv, enc=enc, dec=dec)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(
            tqdm(executor.map(func, urls, filepathes), total=len(urls))
        )
        del results
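
A quick sketch of how these helpers combine with the URL parser defined earlier (the statistical table ID below is a dummy placeholder):

parser = EstatRestAPI_URLParser(app_id="****************")  # class defined above
list_url = parser.getStatsListURL(
    {"appId": parser.app_id, "searchWord": "Household survey"}, format="json"
)
data_url = parser.getStatsDataURL(
    {"appId": parser.app_id, "statsDataId": "0000000000"}, format="csv"  # dummy table ID
)

json_dict = get_json(list_url)                   # response as a dict
download_json(list_url, "stats_list.json")       # save the same response as JSON
download_csv(data_url, "table_0.csv")            # save one table as CSV
download_all_csv([data_url], ["table_0.csv"])    # parallel download of many CSVs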

Sample

From the System of Social and Demographic Statistics (Municipalities in View of Statistics), which the Ministry of Internal Affairs and Communications provides every year, each item is retrieved via the e-Stat API with municipalities (cities, wards, towns, and villages) as the area unit and saved as local files.

import os
from pprint import pprint
from estat_api import EstatRestAPI_URLParser  # the URL-parser class above, saved as estat_api.py
# The helper functions above (get_json, download_all_csv, ...) are assumed to be
# defined in this file or imported in the same way.

appId = "****************" #Enter the application ID here
estatapi_url_parser = EstatRestAPI_URLParser()  # URL Parser


def search_tables():
    """
    Params (dictionary) to search e-Stat tables.
    For more details, see also
    https://www.e-stat.go.jp/api/api-info/e-stat-manual3-0#api_3_2

        - appId: application ID (*required)
        - lang: language (J: Japanese, E: English)
        - surveyYears: survey date (YYYY, YYYYMM, or YYYYMM-YYYYMM)
        - openYears: publication date (same format as the survey date)
        - statsField: statistics field (2 digits: major category, 4 digits: subcategory)
        - statsCode: government statistics code (8 digits)
        - searchWord: search keyword
        - searchKind: data type (1: statistics, 2: subregion / regional mesh)
        - collectArea: aggregation area level (1: nationwide, 2: prefectures, 3: municipalities)
        - explanationGetFlg: whether to include commentary information (Y or N)
        - ...
    """
    appId = "65a9e884e72959615c2c7c293ebfaeaebffb6030"  # Application ID    
    params_dict = {
        "appId": appId,
        "lang": "J",
        "statsCode": "00200502",
        "searchWord": "Social / demographic system",  # "Municipalities as seen in statistics",
        "searchKind": 1,
        "collectArea": 3,
        "explanationGetFlg": "N"
    }

    url = estatapi_url_parser.getStatsListURL(params_dict, format="json")   
    json_dict = get_json(url)
    # pprint(json_dict)

    if json_dict['GET_STATS_LIST']['DATALIST_INF']['NUMBER'] != 0:
        tables = json_dict["GET_STATS_LIST"]["DATALIST_INF"]["TABLE_INF"]
    else:
        tables = []
    return tables


def parse_table_id(table):
    return table["@id"]


def parse_table_raw_size(table):
    return table["OVERALL_TOTAL_NUMBER"]


def parse_table_urls(table_id, table_raw_size, csv_raw_size=100000):
    urls = []
    for j in range(0, int(table_raw_size / csv_raw_size) + 1):
        start_pos = j * csv_raw_size + 1
        params_dict = {
            "appId": appId,  # Application ID
            "lang": "J",  #language(J:Japanese, E:English)
            "statsDataId": str(table_id),  #Statistical table ID
            "startPosition": start_pos,  #Start line
            "limit": csv_raw_size,  #Number of data acquisitions
            "explanationGetFlg": "N",  #Existence of commentary information(Y or N)
            "annotationGetFlg": "N",  #Presence or absence of annotation information(Y or N)
            "metaGetFlg": "N",  #Presence or absence of meta information(Y or N)
            "sectionHeaderFlg": "2",  #CSV header flag(1:Get, 2:Get無)
        }
        url = estatapi_url_parser.getStatsDataURL(params_dict, format="csv")
        urls.append(url)
    return urls


if __name__ == '__main__':
    CSV_RAW_SIZE = 100000

    # list of tables
    tables = search_tables()

    # extract all table ids
    if len(tables) == 0:
        print("No tables were found.")
        table_ids = []
    elif len(tables) == 1:
        table_ids = [parse_table_id(tables[0])]
    else:
        table_ids = list(map(parse_table_id, tables))

    # list of urls
    table_urls = []
    table_raw_size = list(map(parse_table_raw_size, tables))
    for i, table_id in enumerate(table_ids):
        table_urls = table_urls + parse_table_urls(table_id, table_raw_size[i])

    # list of filepathes
    filepathes = []
    for i, table_id in enumerate(table_ids):
        table_name = tables[i]["TITLE_SPEC"]["TABLE_NAME"]
        table_dir = f"./downloads/tmp/{table_name}_{table_id}"
        os.makedirs(table_dir, exist_ok=True)
        for j in range(0, int(table_raw_size[i] / CSV_RAW_SIZE) + 1):
            filepath = f"{table_dir}/{table_name}_{table_id}_{j}.csv"
            filepathes.append(filepath)

    download_all_csv(table_urls, filepathes, max_workers=30)
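
As a possible follow-up (not in the original script), the downloaded CSV chunks can be concatenated with pandas, which is already imported above; depending on the `sectionHeaderFlg` you chose, you may need to adjust the header handling:

import glob
import pandas as pd

# Collect every CSV chunk written by the script above
csv_paths = glob.glob("./downloads/tmp/*/*.csv")
# With sectionHeaderFlg="2" the files may have no header row; pass header=None if needed
df = pd.concat((pd.read_csv(p) for p in csv_paths), ignore_index=True)
print(df.shape)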
