Hello, everyone. Do you know the hottest web application in the world right now?
That's right.
e-Stat.
e-Stat is a portal site where government statistics are organized by field and by the organization that created them. You can search the data and view it in various formats such as graphs and tables.
An API is also provided, and the site design is cute and modern.
There is also a feature called statistical GIS that visualizes statistical data on a map.
It lets you select specific statistics, such as the census, vital statistics, or the medical facility survey, and display them on a map.
Wonderful!
Looking at an application like this, it is only natural for a GIS shop to think:
"I want to make this too!!!!"
But e-Stat has many spots where an amateur can easily sink into a deep swamp...
A wide variety of data is registered in e-Stat.
First, there are 17 statistical fields, such as "land / weather" and "population / household". (Search by field)
By organization, there are about 14 groupings, such as the government statistics under the jurisdiction of the "Cabinet Secretariat" or the "Ministry of Economy, Trade and Industry". (Search by organization)
You have to find the data you need among roughly 1.5 million (!) datasets in total.
Furthermore, the data format differs by field and by ministry, so even once you obtain the statistics you want, whether you can actually put them to good use is something you only find out when you try.
Since I specialize in building web applications in the GIS field, I was very glad to find an API that can be used from the web. According to the e-Stat API specification (version 3.0), however, there are seven APIs:
--Acquisition of statistical table information
--Meta information acquisition
--Statistical data acquisition
--Dataset registration
--Dataset reference
--Data catalog information acquisition
--Batch statistical data acquisition
You might say it is just a matter of reading the specification carefully, but even if you only want to grab some statistical data quickly, having to start by working out which API you need is a bit painful. (There is so much data that it can't be helped.)
Besides the "statistical field" introduced earlier, the data is managed with several other codes.
I could not find a page that summarizes them, but the API specification alone mentions at least four kinds of codes and IDs, including the one for the statistical field.
--Small classification code of the statistical field
--Government statistics code
--Statistical table ID
--Standard area code
In conclusion, you can get the desired statistical data by combining the codes above with three of the APIs introduced earlier, but just understanding what these terms mean probably takes hours to days, doesn't it? (It did for me.)
A Development Guide is provided, and it carefully describes the flow for acquiring the target statistical data. But it does not tell you what kind of statistical data can be obtained, the text in the attached images is too small to read, and you end up searching through various pages to collect information.
Painful.
I think most e-Stat users are people who want to run various analyses based on statistical data.
Many of them probably want to do that analysis in GIS, but in reality there was not much data that can be used in GIS.
If you look closely at the page for searching data, the number of data managed as a "database" is 263/911, while the number of other mysterious files managed is 648/911.
Moreover, those files are very likely to be PDFs.
That is not data you can easily use from your own application.
By the way, the page linked from "Provided data" under "Q1: I want to know the data that can be used with the API function" suggests that only about 180,000 of the 1.5 million datasets are available through the API.
--JSON output: …/getStatsData? <Parameter group>
--CSV output: …/getSimpleStatsData? <Parameter group>
(why…)
"RESULT"
"STATUS","0"
"ERROR_MSG","It ended normally."
"DATE","2020-12-18T15:58:49.161+09:00"
"TABLE_INF","0000010101"
"STAT_NAME","00200502","Social / demographic system"
"GOV_ORG","00200","Ministry of Internal Affairs and Communications"
"STATISTICS_NAME","Prefectural data Basic data"
"TITLE","0000010101","A Population / household"
"CYCLE","Yearly"
"SURVEY_DATE","0"
"OPEN_DATE","2020-03-06"
"SMALL_AREA","0"
"COLLECT_AREA","Nationwide"
"MAIN_CATEGORY","99","Other"
"SUB_CATEGORY","99","Other"
"OVERALL_TOTAL_NUMBER","486096"
"UPDATED_DATE","2020-03-06"
"STATISTICS_NAME_SPEC","Prefecture data","basic data","","","",""
"TITLE_SPEC","","A Population / household","","",""
"CLASS_INF"
"CLASS_OBJ_ID","CLASS_OBJ_NAME","CLASS_CODE","CLASS_NAME","CLASS_LEVEL","CLASS_UNIT","CLASS_PARENT_CODE","CLASS_ADD_INF"
(All of this comes out before the CSV you actually wanted appears...)
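Since the CSV response mixes several header sections in one file, a small helper that separates them can make it easier to handle. The following is only a sketch: the marker names are the ones visible in the sample output above, and the function name `split_sections` is hypothetical, not part of any e-Stat library.

```python
import csv
import io

# Section markers seen in the sample output above. A real response may
# contain further sections, which would have to be added here.
SECTION_MARKERS = ("RESULT", "TABLE_INF", "CLASS_INF")


def split_sections(text, markers=SECTION_MARKERS):
    """Group the rows of a getSimpleStatsData-style CSV by section marker."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if row[0] in markers:
            current = row[0]
            # Keep values that share the marker line, e.g. the table ID.
            sections[current] = [row[1:]] if len(row) > 1 else []
        elif current is not None:
            sections[current].append(row)
    return sections


# A shortened copy of the sample output shown above.
sample = '''"RESULT"
"STATUS","0"
"ERROR_MSG","It ended normally."
"TABLE_INF","0000010101"
"STAT_NAME","00200502","Social / demographic system"
"CLASS_INF"
"CLASS_OBJ_ID","CLASS_OBJ_NAME","CLASS_CODE"
'''

sections = split_sections(sample)
for name, rows in sections.items():
    print(name, len(rows))
```

Using the `csv` module rather than splitting on commas keeps quoted fields that contain commas intact.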
Also, many old API versions are still around, which makes the information hard to follow, and so on...
e-stat-api-tools
With that said, having poked around in various data and APIs, I found that e-Stat is very convenient but has quite a few pitfalls.
I think many people feel, as I did, that "the bar is a little high...".
For those who are lost, we at MIERUNE have prepared a list of available data, written wrappers for the main APIs, and created a CLI tool that merges municipal boundary polygons with the target statistical data and outputs the result as GeoJSON!
In conclusion, to get the desired statistical data you need to pass the statistical table ID and the meta information (detailed item information) to the statistical data acquisition API. To get that far, however, the statistical table ID and the meta information each have to be obtained from a different API.
As described in the Development Guide, the e-Stat API returns the desired statistical data when you call three APIs in the following order.
--Pass the government statistics code or the survey date to the statistical table information acquisition API to get the statistical table ID of the target table.
--Pass that statistical table ID to the meta information acquisition API to get the meta information (detailed item information) for the target table.
--Pass the statistical table ID and the meta information (detailed item information) to the statistical data acquisition API to get the desired statistical data.
(You can try the various APIs with the official API Function Test Form (version 3.0); seeing what input produces what response will help you understand them better.)
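The three-step flow can be sketched as three request URLs. This is a minimal illustration under assumptions, not the tool's implementation: the base URL is the one that appears in the tool's log output, while the `json/` prefix, the endpoint names `getStatsList` and `getMetaInfo`, and the `statsCode` parameter should be checked against the API specification. The helper `build_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL as it appears in the tool's log output. The "json/" prefix and the
# getStatsList / getMetaInfo endpoint names are assumptions to verify against
# the API specification.
BASE_URL = "http://api.e-stat.go.jp/rest/3.0/app"


def build_url(endpoint, app_id, **params):
    """Build a request URL; the application ID is always sent as appId."""
    query = {"appId": app_id, **params}
    return f"{BASE_URL}/{endpoint}?{urlencode(query)}"


app_id = "YOUR_APP_ID"  # placeholder, replace with your own application ID

# Step 1: government statistics code -> list of statistical table IDs.
list_url = build_url("json/getStatsList", app_id, statsCode="00200502")

# Step 2: statistical table ID -> meta information (detailed item information).
meta_url = build_url("json/getMetaInfo", app_id, statsDataId="0000020101")

# Step 3: statistical table ID + item codes from the meta information -> data.
data_url = build_url(
    "json/getStatsData",
    app_id,
    statsDataId="0000020101",
    cdArea="01101",
    cdCat01="A1101",
)

for url in (list_url, meta_url, data_url):
    print(url)
```

Each step only needs the output of the previous one, so the URLs can be requested in sequence with any HTTP client.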
However, the government statistics code needed for the first step can be looked up in the Government Statistics Code List, but that list is a PDF.
So, first of all, I converted it to TSV to make it machine-readable.
It is stored in the repository, so feel free to use it. (government_statistics_codes.tsv)
The government statistics code above is used to obtain statistical table IDs. Looking at the e-Stat data and at what people in the industry tend to ask for, I decided (on my own) that the statistics people most want to use in GIS are probably the Social and Population Statistics.
Therefore, I have stored a list of statistical table IDs in the repository so that the Social and Population Statistics can be used smoothly. (default_stats_table_ids.csv)
The standard area code is a code indicating "prefecture and municipal area". Standard area code used for statistics
It relates to the boundary data acquisition described later rather than to statistical data acquisition, but it is also included in the repository.
The data is as of December 2020, so if municipalities (cities, wards, towns, or villages) have been merged since then, please download the latest version from Find Municipality and update it.
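As a small illustration of how the standard area code is structured: the first two digits of the five-digit code identify the prefecture (01 for Hokkaido, as in the area code 01101 used in this article). The helper below is a hypothetical sketch, not part of the tool.

```python
def prefecture_code(area_code):
    """Return the two-digit prefecture part of a five-digit standard area code."""
    code = str(area_code).strip()
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a five-digit area code, got {area_code!r}")
    return code[:2]


print(prefecture_code("01101"))
```

Keeping the codes as strings matters here: if a code such as 01101 is read as an integer, the leading zero is lost and the code no longer has five digits.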
The following main APIs are wrapped in the e-stat package so that they can be called from Python. At this stage, however, the package is not fully polished and is not registered on PyPI, so I will omit the explanation here. (It will be supported eventually.)
--Statistical table information acquisition API
--Meta information acquisition API
--Statistical data acquisition API
Using the above package, I created a tool that provides the following five commands on the command line.
--ids: Get statistical table ID list
--meta: Get statistical table metadata
--stats: Get statistical data
--boundary: Get boundary data
--merge-boundary: Get and merge statistical data and boundary data
Please refer to README.md for details on how to use various commands.
So let's use this tool to merge boundary data and stats!
First, clone e_stat_api_tools and move to the e_stat_api_sample directory.
% cd .../e_stat_api_sample
% pwd
.../e_stat_api_sample
Then run the pipenv install command to install the required packages.
If you don't have pipenv installed, install it with
% brew install pipenv
or
% pip install -U pip
% pip install pipenv
This tool requires an e-Stat application ID, so complete user registration according to the User's Guide, register an application ID, and run the following commands to create a .env file.
Replace <YOUR_APP_ID> with your own application ID.
% touch /e_stat_api_sample/e_stat/.env
% echo "app_id=<YOUR_APP_ID>" >> /e_stat_api_sample/e_stat/.env
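To illustrate what the .env file contains and how a program might read it, here is a minimal sketch in plain Python. How the tool itself loads the file is not shown in this article, so the function `read_env` is hypothetical; the demo writes a temporary file rather than touching your real .env.

```python
import tempfile
from pathlib import Path


def read_env(path):
    """Parse simple KEY=VALUE lines; later lines override earlier ones."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Demo with a temporary file that mimics running the echo command twice.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("app_id=first\napp_id=second\n", encoding="utf-8")
    settings = read_env(env_path)

print(settings["app_id"])
```

Because the echo command appends with `>>`, running it more than once leaves several app_id lines in the file; in this sketch the last one wins.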
After creating the .env file that stores the application ID, run the merge-boundary command with the following options.
-p, --pref_name TEXT  Enter the prefecture name of the shp file to get [required]
-d, --download_dir TEXT  Enter the path string of the directory that stores the shp file to download [required]
-a, --area TEXT  Enter the standard area code of the statistical data to be acquired [required]
-c, --class_code TEXT  Enter the item of statistical data to be acquired [required]
-y, --year TEXT  Enter the year of the statistical data to be acquired [required]
-st, --stats_table_id TEXT  Enter the statistical table ID of the statistical data you want to acquire [required]
-o, --output_dir TEXT  Enter the path string of the directory that stores the downloaded csv [required]
Specifically, the command is as follows.
% pipenv run python -m e_stat merge-boundary \
-p Hokkaido \
-d ./download_file \
-a 01101 \
-c A1101 \
-y 2000 \
-st 0000020101 \
-o ./created
Not only for merge-boundary but for all five commands, sample shell scripts are provided, so you can check the behavior with a command like the following.
% bash merge_boundary.sh
When you run bash merge_boundary.sh, a log like the following is displayed, and merge_boundary.geojson should then be generated in the created directory.
% bash merge_boundary.sh
0.00B [00:00, ?B/s]url='https://www.e-stat.go.jp/gis/statmap-search/data?dlserveyId=A002005212015&code=01&coordSys=1&format=shape&downloadType=5', res.status_code=200
Start downloading A002005212015DDSWC01.zip
15.8MB [00:05, 2.75MB/s]
Import file:.../e_stat_api_tools/download_file/A002005212015DDSWC01.zip
Convert shp file in zip file to gdf.
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
super(GeoDataFrame, self).__setitem__(key, value)
Combine gdf on the specified key (AREA_CODE).
Convert gdf to geojson.
Exported .../e_stat_api_tools/created/boundary.geojson.
Exported .../e_stat_api_tools/created/boundary.csv.
Get a statistical table. URL=http://api.e-stat.go.jp/rest/3.0/app/getSimpleStatsData?appId=9a10491cd87e8877b5410283228bb64b7805ff79&cdArea=01101&cdCat01=A1101&cdTime=2000100000&statsDataId=0000020101&lang=J&metaGetFlg=N&c&explanationGetFlg=N&annotationGetFlg=N&sectionHeaderFlg=2
Convert gdf to geojson.
Exported .../e_stat_api_tools/created/merge_boundary.geojson.
Exported .../e_stat_api_tools/created/merge_boundary.csv.
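The request URL in the log shows how the command-line options end up as query parameters: -a becomes cdArea, -c becomes cdCat01, -st becomes statsDataId, and the year 2000 becomes cdTime=2000100000. The sketch below reproduces that mapping. The function name `stats_query` is hypothetical, and the "100000" suffix is inferred from this single log line, so it may not hold for every statistical table.

```python
from urllib.parse import urlencode


def stats_query(app_id, area, class_code, year, stats_table_id):
    """Map the merge-boundary options to the query parameters seen in the log."""
    return {
        "appId": app_id,
        "cdArea": area,                # -a, standard area code
        "cdCat01": class_code,         # -c, item of statistical data
        "cdTime": f"{year}100000",     # -y, suffix inferred from the log
        "statsDataId": stats_table_id, # -st, statistical table ID
    }


query = stats_query("YOUR_APP_ID", "01101", "A1101", "2000", "0000020101")
print(urlencode(query))
```

The remaining parameters in the logged URL (lang, metaGetFlg, and so on) are left out here because their meaning is not explained in this article.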
Let's open the created merge_boundary.geojson in QGIS!
The area code is 01101; in other words, you can see that the following statistical data has been merged into the boundary data of Chuo-ku, Sapporo.
--A1101 (A population / total household population) --2000 (2000) --0000020101 (Social / Demographic System Municipal Data)
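Conceptually, the merge step joins statistical values onto boundary features by area code. The following sketch shows the idea with plain dictionaries instead of the GeoDataFrame the tool uses. The key name AREA_CODE comes from the log output; the function `merge_stats`, the placeholder geometry, and the placeholder value are all hypothetical and are not real data.

```python
import json


def merge_stats(feature_collection, stats_by_area, key="AREA_CODE"):
    """Attach statistics to features whose area code has a matching entry.

    Features without a match are dropped (an inner join); that is a choice
    made for this sketch, not necessarily what the tool does.
    """
    merged = []
    for feature in feature_collection["features"]:
        props = dict(feature.get("properties") or {})
        stats = stats_by_area.get(props.get(key))
        if stats is None:
            continue
        props.update(stats)
        merged.append({**feature, "properties": props})
    return {"type": "FeatureCollection", "features": merged}


# Placeholder boundary data: the geometries are dummies, not real boundaries.
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"AREA_CODE": "01101"},
            "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        },
        {
            "type": "Feature",
            "properties": {"AREA_CODE": "01102"},
            "geometry": {"type": "Point", "coordinates": [1.0, 1.0]},
        },
    ],
}

# Placeholder statistics keyed by area code; 12345 is not a real figure.
stats = {"01101": {"A1101": 12345, "year": "2000"}}

result = merge_stats(boundaries, stats)
print(json.dumps(result, indent=2))
```

The input feature collection is not modified; each merged feature gets a fresh copy of its properties.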
We have created and introduced a tool that allows you to easily merge e-Stat boundary data and statistical data. How was it?
Since it is an open source (MIT license) tool, we will continue to add and improve functions as needed, but if you have any problems or requests, we would appreciate it if you could contact us.
We are also good at visualizing and analyzing various kinds of data, building tools, and creating WebGIS, and we have developed features on commission to meet the diverse needs of many customers. If you are interested, please feel free to contact us via our website.
We are also developing the map distribution service MapTiler for Japan. You can use high quality map data including vector tiles with overwhelmingly high cost performance compared to other companies. For more information, please visit MapTiler.jp!