The novel coronavirus ... the spread of infection just doesn't stop ...
As a fledgling GIS engineer, I wanted to put some kind of data on a map and visualize it, but most of the publicly available data was PDF, so I shied away from collecting it.
But wait!!
I found a site that provides excellent data that may be freely copied, quoted, and reprinted for non-commercial purposes ([https://gis.jag-japan.com/covid19jp/](https://gis.jag-japan.com/covid19jp/)), so I would like to use this data and express it on a map!!
This time, since it doubles as study, the configuration is slightly redundant (launching an application server and a DB server with docker-compose and distributing GeoJSON via an API, and so on).
Let's take a look at the data right away!
Go to the site above and click the csv link in the upper left.
Then you can download the following csv.
COVID-19.csv
Through,Ministry of Health, Labor and Welfare NO,Asymptomatic pathogen carrier,Domestic,Charter flights,Age,sex,Fixed date,Date of onset,Consultation prefecture,Prefecture of residence,In the residential jurisdiction,Residential city,Key,Presentation,Prefectural case number,status,Remarks,Source,Source2,Source3,Number of people,Cumulative,The day before ratio,Total dead,Number of discharges cumulative,Number of discharges,Number of cases,PCR tests performed,PCR tests day-before ratio,Profession_For correctness confirmation,Work place_For correctness confirmation,Hospital Pref,Residential Pref,Release,Gender,X,Y,Fixed date YYYYMMDD,Consultation prefecture code,Prefecture of residence code,Update date and time,Field2,Field4,Field5,Field6,Field7,Field8,Field9,Field10
1 ,1 ,,A-1,,30 ,male,1/15/2020,1/3/2020,Kanagawa Prefecture,Kanagawa Prefecture,,,Kanagawa Prefecture,Kanagawa Prefecture,1,Discharge,,https://www.mhlw.go.jp/stf/newpage_08906.html,https://www.pref.kanagawa.jp/docs/ga4/bukanshi/occurrence.html,,1 ,1 ,1 ,0 ,1 ,1 ,0 ,,,,,Kanagawa,Kanagawa,Kanagawa Prefecture,Male,139.642347,35.447504,2020/1/15,14,14,3/29/2020 18:50,,,,,,,,
2 ,2 ,,A-2,,40 ,male,1/24/2020,1/14/2020,Tokyo,People's Republic of China,,,People's Republic of China,Tokyo,1,Discharge,,https://www.mhlw.go.jp/stf/newpage_09079.html,https://www.metro.tokyo.lg.jp/tosei/hodohappyo/press/2020/01/24/20.html,,1 ,2 ,1 ,0 ,,,2 ,,,,,Tokyo,China(Mainland),Tokyo Metropolitan Government,Male,116.409685,39.903832,2020/1/24,13,NA,,,,,,,,,
csv is the best.
One record per infected person, and the data provided is quite detailed, but we can't handle all of it, so let's narrow it down to just the data we want to use!
For now, let's create a working directory and a directory for the data processing work! (The directory name, covid_sample this time, is arbitrary.)
You can create the covid_sample directory and the script directory under it at once with the following command.
$mkdir -p covid_sample/script
This time I just want to display the data on a map, so let's extract only ["Through", "Age", "sex", "Fixed date", "Date of onset", "Consultation prefecture", "Prefecture of residence", "X", "Y"]!
I don't want to pollute my main machine's environment, so let's create a virtual environment using pipenv and edit the data quickly with pandas.
Articles on how to use pipenv will be very helpful here!
If you have pipenv installed, move into the script directory with $ cd covid_sample/script and create a Pipfile like the one below.
Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
[packages]
pandas = "==0.24.2"
requests = "*"
[requires]
python_version = "3.8"
The above file specifies how the virtual environment should be built. In this example, the Python version is 3.8, and pandas and requests are installed.
Create the virtual environment with the $ pipenv install command and enter it with $ pipenv shell.
A prefix like (script) should then appear at the left end of your shell prompt; if you see it, you are inside the virtual environment.
If you check the installed libraries with $ pip list in this state, you can see that pandas, requests, and their dependencies have been installed.
(script) hogehoge:script$pip list
Package Version
--------------- ----------
certifi 2019.11.28
chardet 3.0.4
idna 2.9
numpy 1.18.2
pandas 0.24.2
pip 20.0.2
python-dateutil 2.8.1
pytz 2019.3
requests 2.23.0
setuptools 46.1.3
six 1.14.0
urllib3 1.25.8
wheel 0.34.2
This completes the virtual environment creation!
Let's create a Python script like the one below! I named it format.py!
This script downloads and processes the csv in one go!
format.py
import pandas as pd
from datetime import datetime as dt
import requests

# csv url
csv_url = "https://dl.dropboxusercontent.com/s/6mztoeb6xf78g5w/COVID-19.csv"

try:
    # Send an HTTP request to the url with the GET method
    r = requests.get(csv_url)
    # Save as COVID-19.csv
    with open("COVID-19.csv", 'wb') as f:
        f.write(r.content)
except requests.exceptions.RequestException as err:
    print(err)

# Names of the columns to use
column_names = [
    "Through", "Age", "sex", "Fixed date", "Date of onset",
    "Consultation prefecture", "Prefecture of residence", "X", "Y",
]

# Mapping from old column names to new ones
changed_column_name = {
    "Through": "id",
    "Age": "age",
    "sex": "gender",
    "Fixed date": "fixed_data",
    "Date of onset": "onset_data",
    "Consultation prefecture": "consultation",
    "Prefecture of residence": "prefectures",
    "X": "lng",
    "Y": "lat",
}

# Read only the columns listed in usecols
df = pd.read_csv('COVID-19.csv', usecols=column_names)
# Rename the columns using the dictionary above
rename_df = df.rename(columns=changed_column_name)
# Strip trailing spaces from the values in the age column
rename_df["age"] = [string.strip(' ') for string in list(rename_df["age"])]
# Convert the confirmation date to a date type
rename_df["fixed_data"] = [dt.strptime(data_string, "%m/%d/%Y").date() for data_string in list(rename_df["fixed_data"])]
# Replace NaN in onset_data, then convert it to a date type too
rename_df.fillna({'onset_data': '1/1/0001'}, inplace=True)
rename_df["onset_data"] = [dt.strptime(data_string, "%m/%d/%Y").date() for data_string in list(rename_df["onset_data"])]
# Replace the specified values in the specified column
# inplace=True modifies the original df
rename_df.replace(
    {
        "age": {"unknown": "999", "0-10": "10"}
    }
    , inplace=True)
# Write out to csv
rename_df.to_csv("../docker/postgis/custom_COVID-19.csv", index=False)
When this script is run, the csv is downloaded into the same directory as the script, then the necessary data is extracted, processed, and written out to the specified directory.
The details are described in the comments in the code, so take a look there!
The csv output location (covid_sample/docker/postgis/custom_COVID-19.csv) is inside the directory that will be created when building the docker environment described next, and the same directory is referenced in the docker settings, so be careful when you run the script!
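If you want a quick sanity check on the output, a minimal sketch like this (assuming you run it from the script directory, inside the same pipenv shell) reloads the processed csv and shows the columns that the covid_data table defined later expects:
import pandas as pd

# Reload the processed csv and confirm the column names and record count
check_df = pd.read_csv("../docker/postgis/custom_COVID-19.csv")
print(check_df.columns.tolist())  # ['id', 'age', 'gender', 'fixed_data', 'onset_data', 'consultation', 'prefectures', 'lng', 'lat']
print(len(check_df))  # number of records = number of infected people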
Once you have finished processing the csv, press control + D to exit the virtual environment and use cd ../ to return to the root directory (covid_sample).
Let's convert the data to GeoJSON, a geospatial data format that is easy to handle on the web!
Any method would do, but since I eventually want to serve the data as an API, let's register it in a DB and stand up an API server as the backend!
For that, we will use Docker, which is very convenient for spinning up personal servers!
We will proceed on the assumption that it is installed, so if you are on a mac, please install docker-compose by referring to the Docker Compose installation docs. (For other OSes, a quick google will turn it up!)
Once that is done, create the directories for the API as follows!
$mkdir -p docker/fastapi
$mkdir -p docker/postgis/init
$mkdir docker/postgis/sql
The current directory structure should look like this!
covid_sample
├── docker
│ ├── fastapi
│ └── postgis
│ ├── init
│ └── sql
└── script
├── COVID-19.csv
├── Pipfile
├── Pipfile.lock
└── format.py
In addition, quite a few files are needed to start docker; they are all listed below, so please copy and paste them! Lol
--Main configuration file for docker-compose: defines the containers to start, the connections between containers, and the directories mounted from the host machine.
docker/docker-compose.yml
version: '3.7'
services:
  fastapi:
    # Container name
    container_name: fastapi
    # Directory containing the Dockerfile to build
    build: fastapi
    volumes:
      # Directory to mount
      - ./fastapi:/usr/src/app/
    ports:
      # Host-side port : container-side port
      - 8000:8000
    env_file:
      # File whose contents are set as environment variables
      - fastapi/.env
    depends_on:
      # Service to connect to
      - postgis
  postgis:
    container_name: postgis
    build: postgis
    volumes:
      - covid_postgis_data:/var/lib/postgresql/data
      # Scripts in this directory run on first startup (including when no volume exists, e.g. after `down -v`)
      - ./postgis/init:/docker-entrypoint-initdb.d
      # Directory to mount
      - ./postgis:/home
    env_file: postgis/.env_db
    ports:
      # Port 5432 on the host may clash with a local psql, so 5433 is safer
      - 5433:5432
volumes:
  covid_postgis_data:
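By the way, if you are worried about indentation mistakes in the yml, running $ docker-compose config in the docker directory prints the merged configuration (or an error if something is off), so it doubles as a syntax check!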
--Environment variable configuration file for the fastapi container
docker/fastapi/.env
DATABASE_HOST=localhost
DATABASE_PORT=5433
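These values are injected into the fastapi container via the env_file entry above (note that app.py later connects with host="postgis", port=5432, the address inside the docker network, while localhost:5433 is how you reach the DB from the host machine). If you wanted app.py to read the settings from the environment instead of hardcoding them, a minimal sketch (my own variation, not one of the files below) would be:
import os

# A sketch of reading DB settings from environment variables
DATABASE_HOST = os.environ.get("DATABASE_HOST", "postgis")
DATABASE_PORT = int(os.environ.get("DATABASE_PORT", "5432"))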
--Dockerfile for starting the fastapi container (docker/fastapi/Dockerfile)
docker/fastapi/Dockerfile
FROM python:3.8
RUN apt-get update -y --fix-missing \
&& apt-get install -y -q --no-install-recommends
# install
RUN pip install pipenv
ADD Pipfile Pipfile.lock /
RUN pipenv install --system
# add to application
ADD app.py /
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
--Python modules to be installed in the fastapi container (docker/fastapi/Pipfile)
docker/fastapi/Pipfile
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
[packages]
fastapi = "*"
uvicorn = "*"
psycopg2-binary = "*"
[requires]
python_version = "3.8.0"
--Application file for fastapi
docker/fastapi/app.py
from fastapi import FastAPI
app = FastAPI()
# Start with `uvicorn app:app --reload`
@app.get("/")
async def hello():
    return {"text": "hello"}
--Environment variable configuration file for the postgis container
Specify any password you like in docker/postgis/.env_db.
docker/postgis/.env_db
# In the postgres container, a DB is created automatically from the variables written in this env file
POSTGRES_USER=covid_user
POSTGRES_PASSWORD=hogehoge
POSTGRES_DB=covid_db
--Dockerfile for starting the postgis container (docker/postgis/Dockerfile)
docker/postgis/Dockerfile
FROM mdillon/postgis:9.6
# locale settings.
RUN localedef -i ja_JP -c -f UTF-8 -A /usr/share/locale/locale.alias ja_JP.UTF-8
ENV LANG ja_JP.UTF-8
--Shell script that runs when the postgis container starts for the first time
docker/postgis/init/init.sh
#!/bin/sh
psql -U covid_user -d covid_db -f /home/sql/init.sql
psql -U covid_user -d covid_db -c "COPY covid_data FROM '/home/custom_COVID-19.csv' WITH CSV HEADER DELIMITER ',';"
psql -U covid_user -d covid_db -f /home/sql/create_geom.sql
--Sql file executed by the above shell script
docker/postgis/sql/init.sql
CREATE EXTENSION postgis;
create table covid_data
(
    id smallint not null,
    age int,
    gender text,
    fixed_data date,
    onset_data date,
    consultation text,
    prefectures text,
    lng float,
    lat float
);
--Sql file executed by the above shell script
docker/postgis/sql/create_geom.sql
alter table covid_data
add geom geometry(point, 4326);
UPDATE covid_data SET geom = ST_GeogFromText('SRID=4326;POINT(' || lng || ' ' || lat || ')')::GEOMETRY;
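By the way, the same geom column could also be populated with the more common ST_SetSRID(ST_MakePoint(lng, lat), 4326); the ST_GeogFromText-and-cast route above works as well, it is just a bit roundabout.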
Once the csv processed earlier and the files above are in place, build and start everything with $ docker-compose up -d --build!
This time we are launching containers for FastAPI, a micro web framework for Python, and PostGIS, a geospatial extension for PostgreSQL.
When the containers start, init.sh registers the csv data in the DB, automatically generates a geometry-type geom column from that data, and fills it in.
At this stage you should be able to see the data by connecting to the DB in the postgis container by some means (pgadmin, or $ psql -U covid_user -d covid_db -h localhost -p 5433, etc.).
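If you would rather check from Python than psql, a minimal sketch like this (run on the host machine, using the credentials from .env_db and the mapped port 5433) also works:
import psycopg2

# Connect from the host machine, so use localhost:5433 (the port mapped in docker-compose.yml)
connection = psycopg2.connect(
    host="localhost",
    port=5433,
    user="covid_user",
    password="hogehoge",
    dbname="covid_db"
)
cursor = connection.cursor()
cursor.execute("SELECT COUNT(*) FROM covid_data;")
print(cursor.fetchone()[0])  # should equal the number of csv records
cursor.close()
connection.close()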
Also, if you access localhost:8000 from your browser and {"text": "hello"} is returned, the docker-compose environment construction is complete!
Once you have the environment, let's create the API!
Specifically, edit docker/fastapi/app.py as follows!
docker/fastapi/app.py
from fastapi import FastAPI
import psycopg2
from starlette.middleware.cors import CORSMiddleware

app = FastAPI()

# Added for the CORS settings
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"]
)

# Start with `uvicorn app:app --reload`
@app.get("/")
async def hello():
    # Connect to PostgreSQL
    connection = psycopg2.connect(
        # host="localhost",
        host="postgis",
        user="covid_user",
        password="hogehoge",
        dbname="covid_db",
        port=5432
    )
    # Set the client-side encoding (converted automatically from the DB character code)
    connection.set_client_encoding('utf-8')
    # Get a cursor
    cursor = connection.cursor()
    # Generate geojson
    sql = """
    SELECT jsonb_build_object(
        'type', 'FeatureCollection',
        'features', jsonb_agg(features.feature)
    )
    FROM (
        SELECT jsonb_build_object(
            'type', 'Feature',
            'id', id,
            'geometry', ST_AsGeoJSON(geom)::jsonb,
            'properties', to_jsonb(inputs) - 'gid' - 'geom'
        ) AS feature
        FROM (SELECT * FROM covid_data) inputs) features;
    """
    # Execute the SQL
    cursor.execute(sql)
    # Fetch the result
    results = cursor.fetchall()[0][0]
    # Close the cursor
    cursor.close()
    # Disconnect
    connection.close()
    return results
In this application, we will eventually access the API with Ajax from React to fetch and display the data, so CORS settings are required (this time, as a trial, all hosts and methods are allowed, but that is dangerous, so please avoid it wherever possible).
If you do not set it, you will not be able to access the API from the JavaScript side (the map display application), so be sure to configure it! (For details, try googling CORS!)
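For reference, a slightly safer variant of the settings above (a sketch; the origin below is a hypothetical React dev-server address, so adjust it to wherever the map app is actually served from) restricts the allowed origins and methods:
# A safer variant of the CORS settings (the origin is a hypothetical example)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["GET"],
    allow_headers=["*"]
)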
The @app.get("/") part means that the processing written under it runs when / is accessed with the GET method. That is where we write this application's processing.
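Every route in FastAPI is declared this way; for example, a hypothetical extra endpoint (not part of this app) would look like:
# A hypothetical extra route, just to illustrate the decorator pattern
@app.get("/status")
async def status():
    return {"status": "ok"}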
After that, we access the DB using psycopg2, a module for connecting to PostgreSQL from Python.
In the SQL, the data is forcibly converted into GeoJSON using PostgreSQL's JSONB functions (jsonb_build_object and jsonb_agg) together with PostGIS's ST_AsGeoJSON.
After rewriting, stop the containers with $ docker-compose down and rebuild and restart them with $ docker-compose up -d --build!
After booting is complete, try connecting to localhost:8000 again.
If all goes well, you should get GeoJSON like this!
{
  "type": "FeatureCollection",
  "features": [
    {
      "id": 1,
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          139.642347,
          35.447504
        ]
      },
      "properties": {
        "id": 1,
        "age": 30,
        "lat": 35.447504,
        "lng": 139.642347,
        "gender": "male",
        "fixed_data": "2020-01-15",
        "onset_data": "2020-01-03",
        "prefectures": "Kanagawa Prefecture",
        "consultation": "Kanagawa Prefecture"
      }
    },
    {
      "id": 2,
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          116.409685,
          39.903832
        ]
      },
      "properties": {
        "id": 2,
        "age": 40,
        "lat": 39.903832,
        "lng": 116.409685,
        "gender": "male",
        "fixed_data": "2020-01-24",
        "onset_data": "2020-01-14",
        "prefectures": "People's Republic of China",
        "consultation": "Tokyo"
      }
    },
    ...
  ]
}
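You can also confirm the same response from Python; a minimal sketch (assuming the containers are up, run from the pipenv environment created earlier, which already has requests):
import requests

# Fetch the GeoJSON from the API we just built
response = requests.get("http://localhost:8000/")
geojson = response.json()
print(geojson["type"])           # FeatureCollection
print(len(geojson["features"]))  # number of features = number of records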
Now the data is ready!!
Next time, let's display this data on the map!
→ Click here for the sequel: Create a new corona infection map [FastAPI/PostGIS/deck.gl (React)] (data display)