Next article: https://qiita.com/Naoya_Study/items/851f4032fb6e2a5cd5ed
As the coronavirus spreads, various organizations have released polished dashboards that visualize the state of the infection.
Example 1: WHO, Novel Coronavirus (COVID-19) Situation
Example 2: Ministry of Health, Labour and Welfare, Domestic Cases of New Coronavirus Infection
Example 3: Toyo Keizai ONLINE, New Coronavirus Domestic Infection Status
These look great, and I want to be able to build something like them myself. The ultimate goal is to create a dashboard like the examples above with Dash, Python's framework for visualization apps. As preparation, this article draws the charts with the visualization library Plotly. Please forgive the messy code.
We will use the infection data for Japan published by Toyo Keizai Online. https://github.com/kaz-ogiwara/covid19/
import requests
import io
import pandas as pd
import re
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime as dt
url = 'https://raw.githubusercontent.com/kaz-ogiwara/covid19/master/data/individuals.csv'
res = requests.get(url).content
df = pd.read_csv(io.StringIO(res.decode('utf-8')), header=0, index_col=0)
The data is in this format.
New No. | Old No. | Confirmed year | Confirmed month | Fixed date | Age | sex | Place of residence 1 | Place of residence 2 |
---|---|---|---|---|---|---|---|---|
1 | 1 | 2020 | 1 | 15 | 30s | Male | Kanagawa Prefecture | |
2 | 2 | 2020 | 1 | 24 | 40s | Male | China (Wuhan City) | |
3 | 3 | 2020 | 1 | 25 | 30s | Female | China (Wuhan City) | |
4 | 4 | 2020 | 1 | 26 | 40s | Male | China (Wuhan City) | |
5 | 5 | 2020 | 1 | 28 | 40s | Male | China (Wuhan City) | |
6 | 6 | 2020 | 1 | 28 | 60s | Male | Nara Prefecture | |
As you can see, the data also includes people living in China. This article covers only Japan, so those rows are excluded.
def Get_Df():
    url = 'https://raw.githubusercontent.com/kaz-ogiwara/covid19/master/data/individuals.csv'
    res = requests.get(url).content
    df = pd.read_csv(io.StringIO(res.decode('utf-8')), header=0, index_col=0)
    # Flag rows whose place of residence starts with "China", e.g. "China (Wuhan City)".
    # na=False treats a missing place of residence as "not China".
    is_china = df['Place of residence 1'].str.startswith('China', na=False)
    df['China'] = np.where(is_china, "T", "F")
    # Keep only residents of Japan and renumber the rows from 0.
    df = df[df["China"] != "T"].reset_index()
    return df
Index | New No. | Old No. | Confirmed year | Confirmed month | Fixed date | Age | sex | Place of residence 1 | Place of residence 2 | China |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 2020 | 1 | 15 | 30s | Male | Kanagawa Prefecture | NaN | F |
1 | 6 | 6 | 2020 | 1 | 28 | 60s | Male | Nara Prefecture | NaN | F |
2 | 8 | 8 | 2020 | 1 | 29 | 40s | Female | Osaka | NaN | F |
3 | 9 | 10 | 2020 | 1 | 30 | 50s | Male | Mie Prefecture | NaN | F |
4 | 11 | 12 | 2020 | 1 | 30 | 20s | Female | Kyoto | NaN | F |
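The filtering step can be tried without downloading anything. The following is a minimal sketch on a few made-up rows (the `sample` frame is hypothetical, not the real CSV); it flags residences that start with "China" using a vectorized string test and then drops them.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the 'Place of residence 1' column (not the real data).
sample = pd.DataFrame({
    "Place of residence 1": [
        "Kanagawa Prefecture",
        "China (Wuhan City)",
        "Nara Prefecture",
        None,  # a missing value should not crash the filter
    ]
})

# Vectorized prefix test; na=False treats missing values as "not China".
is_china = sample["Place of residence 1"].str.startswith("China", na=False)
sample["China"] = np.where(is_china, "T", "F")

# Keep only the rows that are not flagged, and renumber them from 0.
japan_only = sample[sample["China"] != "T"].reset_index(drop=True)
print(japan_only)
```

Assigning the whole column at once also avoids writing through `df['China'][i]`, a chained assignment that pandas warns about and that may not modify the original frame.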
def Graph_Pref():
    df = Get_Df()
    df_count_by_place = df.groupby('Place of residence 1').count().sort_values('China')
    fig = px.bar(
        df_count_by_place,
        x="China",
        y=df_count_by_place.index,
        # Setting orientation to 'h' makes this a horizontal bar chart.
        orientation='h',
        width=800,
        height=1000,
    )
    fig.update_layout(
        title="Prefectures where infection has been reported",
        xaxis_title="Number of infected people",
        yaxis_title="",
        # Specifying this template alone gives the chart a dark theme.
        template="plotly_dark",
    )
    fig.show()
Plotly produces interactive, stylish charts with very little effort.
Next, I would like to plot the number of infected people per prefecture on a map of Japan as a scatter plot. To do so, first obtain the latitude and longitude of each prefectural capital and combine them with the CSV data from Toyo Keizai Online. The latitude and longitude of the prefectural office locations come from the site Everyone's Knowledge: A Little Convenience Book (benricho.org). Extract only the required latitude and longitude columns and combine the two tables with pandas merge.
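The per-prefecture count behind the bar chart can be checked on a small scale. This is a sketch on made-up rows (the `sample` frame and the `cases` name are assumptions for illustration). `groupby(...).count()` counts the non-null values of each column, which is why the article reads the result from the always-filled `China` column; `size()` counts rows directly and does not depend on any particular column.

```python
import pandas as pd

# Made-up rows: one row per reported case (not the real data).
sample = pd.DataFrame({
    "Place of residence 1": [
        "Tokyo", "Osaka", "Tokyo", "Aichi Prefecture", "Tokyo", "Osaka",
    ],
})

# Count rows per prefecture and sort ascending, so the largest bar
# ends up at the top of a horizontal bar chart.
counts = (
    sample.groupby("Place of residence 1")
    .size()
    .sort_values()
    .rename("cases")
)
print(counts)
```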
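def Df_Merge():
    df = Get_Df()
    # Number of cases per prefecture (every column holds the same count).
    df_count_by_place = df.groupby('Place of residence 1').count().sort_values('China')
    # Latitude / longitude of the prefectural office locations.
    df_latlon = pd.read_excel("https://www.benricho.org/chimei/latlng_data.xls", header=4)
    # Drop the columns that are not needed and rename the prefecture column to match.
    df_latlon = df_latlon.drop(df_latlon.columns[[0,2,3,4,7]], axis=1).rename(columns={'Unnamed: 1': 'Place of residence 1'})
    # Only the first 47 rows (one per prefecture) are needed.
    df_latlon = df_latlon.head(47)
    df_merge = pd.merge(df_count_by_place, df_latlon, on='Place of residence 1')
    return df_merge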
index | Place of residence 1 | New No. | Old No. | Confirmed year | Confirmed month | Fixed date | Age | sex | Place of residence 2 | China | latitude | longitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Gifu Prefecture | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 35.39111 | 136.72222 |
1 | Ehime Prefecture | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 33.84167 | 132.76611 |
2 | Hiroshima Prefecture | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 34.39639 | 132.45944 |
3 | Saga Prefecture | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 33.24944 | 130.29889 |
4 | Akita | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 39.71861 | 140.10250 |
5 | Yamaguchi Prefecture | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 34.18583 | 131.47139 |
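One thing to watch with this merge: `pd.merge` performs an inner join by default, so a prefecture whose name is spelled differently in the two tables silently disappears from the result. A minimal sketch of how to check for that, using a few rows from the table above plus one deliberately mismatched name (the mismatch is a made-up example, not something observed in the real files):

```python
import pandas as pd

# Case counts per prefecture (a few rows taken from the table above).
counts = pd.DataFrame({
    "Place of residence 1": ["Gifu Prefecture", "Akita", "Ehime Prefecture"],
    "China": [1, 1, 1],
})

# Coordinates; "Akita Prefecture" is a hypothetical spelling mismatch.
latlon = pd.DataFrame({
    "Place of residence 1": ["Gifu Prefecture", "Akita Prefecture", "Ehime Prefecture"],
    "latitude": [35.39111, 39.71861, 33.84167],
    "longitude": [136.72222, 140.10250, 132.76611],
})

# A left join keeps every prefecture from counts; indicator=True adds a
# '_merge' column that says whether a matching row was found.
merged = counts.merge(latlon, on="Place of residence 1", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Place of residence 1"].tolist()
print(unmatched)
```

If `unmatched` is not empty, the names need to be normalized before plotting, otherwise those prefectures are missing from the map.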
Plot the merged data frame returned by Df_Merge on the map.
def Graph_JapMap():
    df_merge = Df_Merge()
    # Hover text such as "Gifu Prefecture : 1 people".
    df_merge['text'] = df_merge['Place of residence 1'] + ' : ' + df_merge['China'].astype(str) + ' people'
    fig = go.Figure(data=go.Scattergeo(
        lat = df_merge["latitude"],
        lon = df_merge["longitude"],
        mode = 'markers',
        marker = dict(
            color = 'red',
            # Marker size grows with the number of cases; +6 keeps small counts visible.
            size = df_merge['China']/5+6,
            opacity = 0.8,
            reversescale = True,
            autocolorscale = False
        ),
        hovertext = df_merge['text'],
        hoverinfo="text",
    ))
    fig.update_layout(
        width=700,
        height=500,
        template="plotly_dark",
        title={
            'text': "Infected person distribution",
            'font':{
                'size':25
            },
            'y':0.9,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'},
        margin = {
            'b':3,
            'l':3,
            'r':3,
            't':3
        },
        geo = dict(
            resolution = 50,
            landcolor = 'rgb(204, 204, 204)',
            coastlinewidth = 1,
            # Restrict the visible area to Japan.
            lataxis = dict(
                range = [28, 47],
            ),
            lonaxis = dict(
                range = [125, 150],
            ),
        )
    )
    fig.show()
The figure shown here is a static image, but when you run the code yourself, hovering over a marker shows the number of infected people in that prefecture. Please try it.
Next is a bar chart of how the number of infected people changes over time. As before, first transform the data with pandas.
def Df_Count_by_Date():
    df = Get_Df()
    # Build a datetime column from the month and day columns (all data is from 2020).
    df['date'] = pd.to_datetime(pd.DataFrame({
        'year': 2020,
        'month': df['Confirmed month'],
        'day': df['Fixed date'],
    }))
    # After count(), the 'China' column holds the number of new cases per day.
    df_count_by_date = df.groupby("date").count()
    # Running total of cases up to and including each day.
    df_count_by_date["total"] = df_count_by_date["China"].cumsum().astype('int')
    # Cumulative number up to the previous day.
    df_count_by_date['gap'] = (df_count_by_date['total'] - df_count_by_date['China']).astype('int')
    return df_count_by_date
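The running total and the previous-day total computed in Df_Count_by_Date can be verified on a tiny table. This is a sketch on made-up rows (the `cases` frame and the column name `new` are assumptions for illustration); it builds the dates from month and day columns, counts cases per day, and derives both totals with `cumsum`.

```python
import pandas as pd

# Made-up cases: one row per person, with month and day of confirmation.
cases = pd.DataFrame({
    "Confirmed month": [1, 1, 1, 2],
    "Fixed date": [15, 28, 28, 1],
})

# pd.to_datetime can assemble dates from year / month / day columns.
cases["date"] = pd.to_datetime(pd.DataFrame({
    "year": 2020,
    "month": cases["Confirmed month"],
    "day": cases["Fixed date"],
}))

# New cases per day, then the running total and the total up to the previous day.
summary = cases.groupby("date").size().to_frame("new")
summary["total"] = summary["new"].cumsum()
summary["gap"] = summary["total"] - summary["new"]
print(summary)
```

Stacking `gap` and `new` in a bar chart then gives bars whose full height is the cumulative total, with the day's new cases highlighted on top.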
def Graph_total():
    df_count_by_date = Df_Count_by_Date()
    fig = go.Figure(data=[
        go.Bar(
            name='Cumulative number up to the previous day',
            x=df_count_by_date.index,
            y=df_count_by_date['gap'],
        ),
        go.Bar(
            name='New number',
            x=df_count_by_date.index,
            y=df_count_by_date['China']
        )
    ])
    # Change the bar mode
    fig.update_layout(
        barmode='stack',
        template="plotly_dark",
        title={
            'text': "Changes in the number of patients",
            'font':{
                'size':25
            },
            'y':0.9,
            'x':0.5,
            'xanchor': 'center',
            'yanchor': 'top'
        },
        xaxis_title="Date",
        yaxis_title="Number of infected people",
    )
    fig.show()
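Next is a world map. Plotly's scatter_geo identifies countries by their three-letter ISO codes (ISO 3166-1 alpha-3), so take a list of country codes from the web and merge it into the data with pandas. The merged data frame, df_globe_merge, looks like this.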
INDEX | COUNTRY | Confirmed | Deaths | ISO CODES | code | size |
---|---|---|---|---|---|---|
0 | China | 81049 | 3230 | CN / CHN | CHN | 82049.0 |
1 | Italy | 27980 | 2158 | IT / ITA | ITA | 28980.0 |
2 | Iran | 14991 | 853 | IR / IRN | IRN | 15991.0 |
3 | South Korea | 8236 | 75 | KR / KOR | KOR | 9236.0 |
4 | Spain | 7948 | 342 | ES / ESP | ESP | 8948.0 |
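The article does not show how the `code` and `size` columns were produced. Judging from the table, `code` is the three-letter part of `ISO CODES` and `size` is `Confirmed` plus 1000, so one possible way to derive them is sketched below. Both rules are inferred from the values shown, not taken from the original code, and the `world` frame is a stand-in for df_globe_merge.

```python
import pandas as pd

# A few rows copied from the table above.
world = pd.DataFrame({
    "COUNTRY": ["China", "Italy", "Iran"],
    "Confirmed": [81049, 27980, 14991],
    "ISO CODES": ["CN / CHN", "IT / ITA", "IR / IRN"],
})

# Take the part after the slash: "CN / CHN" -> "CHN".
world["code"] = world["ISO CODES"].str.split("/").str[-1].str.strip()

# Assumed rule: add a constant offset of 1000 to the confirmed count,
# which matches the 'size' values shown in the table.
world["size"] = world["Confirmed"] + 1000.0
print(world[["COUNTRY", "code", "size"]])
```

A constant offset like this would keep the markers of countries with few cases from becoming too small to see, although the article itself does not state the reason.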
fig = px.scatter_geo(
    df_globe_merge,
    locations="code",
    color='Deaths',
    hover_name="COUNTRY",
    size="size",
    projection="natural earth"
)
fig.update_layout(
    width=700,
    height=500,
    template="plotly_dark",
    title={
        'text': "Infected person distribution",
        'font':{
            'size':25
        },
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    geo = dict(
        resolution = 50,
        landcolor = 'rgb(204, 204, 204)',
        coastlinewidth = 1,
    ),
    margin = {
        'b':3,
        'l':3,
        'r':3,
        't':3
    })
fig.show()
You can also draw a choropleth map, which fills each country with a color.
fig = px.choropleth(
    df_globe_merge,
    locations="code",
    color='Confirmed',
    hover_name="COUNTRY",
    color_continuous_scale=px.colors.sequential.GnBu
)
fig.update_layout(
    width=700,
    height=500,
    template="plotly_dark",
    title={
        'text': "Infected person distribution",
        'font':{
            'size':25
        },
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    geo = dict(
        resolution = 50,
        landcolor = 'rgb(204, 204, 204)',
        coastlinewidth = 0.1,
    ),
    margin = {
        'b':3,
        'l':3,
        'r':3,
        't':3
    }
)
fig.show()
The color scale is set by color_continuous_scale=px.colors.sequential.GnBu; replace GnBu with another name to change it. List of color scales: https://plot.ly/python/builtin-colorscales/
While rewriting this for Dash, the visualization with plotly.express did not work, so I also drew the same map with plotly.graph_objects.
fig = go.Figure(
    data=go.Choropleth(
        locations = df_globe_merge['code'],
        z = df_globe_merge['Confirmed'],
        text = df_globe_merge['COUNTRY'],
        colorscale = 'Plasma',
        marker_line_color='darkgray',
        marker_line_width=0.5,
        colorbar_title = 'Number of infected people',
    )
)
fig.update_layout(
    template="plotly_dark",
    width=700,
    height=500,
    title={
        'text': "Infected person distribution",
        'font':{
            'size':25
        },
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    geo=dict(
        projection_type='equirectangular'
    )
)
fig.show()
The result looks almost the same, except that the color scale has changed from GnBu to Plasma.
Now that the data transformation and visualization are ready, I would like to bring them into Dash (next time).