I made a simple weekly time-series map of coronavirus infections with plotly. I copied the data by hand, one value at a time, from the PDFs posted on the WHO site (it was tedious ...): Coronavirus disease (COVID-2019) situation reports
I'm worried about how much the numbers in Asia will grow. Also, although there are fewer than 100 cases in Europe, the United States, and Canada, I'm concerned that cases keep popping up there.
- As of February 21, 2020, there are about 140,000 infected people in China. On a shared scale the data from other countries would not be visible, so China's count is divided by 100, which brings it down to the order of 1,000.
- Even so, countries with single-digit or double-digit counts are too small to see, so their counts are multiplied by 10.
- The data for Japan includes the passengers of the Diamond Princess, so the number is about 600. If they are not included, there are 21 people as of February 21, 2020.
Environment: Google Colab
Language: Python
The visualization uses plotly. By default, plotly identifies countries by their three-letter ISO code (JPN for Japan, etc.) rather than by name, so country_converter is imported to convert the country names to country codes. Data processing is done with pandas.
# If not already installed
!pip install plotly
!pip install country_converter
!pip install pandas
import country_converter as coco
import plotly.express as px
import pandas as pd
All of the figures below were copied by hand from the PDFs on the WHO site, so I apologize if there are any mistakes. I wonder if there is a better data source.
All of this data is then combined into a single DataFrame.
dict_01_22 = {"2020/01/22":
{"China": 310,
"Japan": 1,
"Republic of Korea": 1,
"Thailand": 2}}
dict_01_30 = {"2020/01/30":
{"China": 7737,
"Japan": 11,
"Republic of Korea": 4,
"Vietnam": 2,
"Singapore": 10,
"Australia": 7,
"Malaysia": 7,
"Cambodia": 1,
"Philippines": 1,
"Nepal": 1,
"Sri Lanka": 1,
"India": 1,
"United States of America": 5,
"Canada": 3,
"France": 5,
"Finland": 1,
"Germany": 4,
"United Arab Emirates": 4,
"Thailand": 14}}
dict_02_07 = {"2020/02/07":
{"China": 31211,
"Japan": 91,
"Republic of Korea": 24,
"Vietnam": 12,
"Singapore": 30,
"Australia": 15,
"Malaysia": 14,
"Cambodia": 1,
"Philippines": 3,
"Nepal": 1,
"Sri Lanka": 1,
"India": 3,
"United States of America": 12,
"Canada": 7,
"France": 6,
"Belgium": 1,
"Italy": 3,
"Finland": 1,
"Spain": 1,
"Sweden": 1,
"Germany": 13,
"The United Kingdom": 3,
"United Arab Emirates": 5,
"Russia": 2,
"Thailand": 25}}
dict_02_14 = {"2020/02/14":
{"China": 142823,
"Japan": 251,
"Republic of Korea": 28,
"Vietnam": 16,
"Singapore": 58,
"Australia": 15,
"Malaysia": 14,
"Cambodia": 1,
"Philippines": 3,
"Nepal": 1,
"Sri Lanka": 1,
"India": 3,
"United States of America": 15,
"Canada": 7,
"France": 11,
"Belgium": 1,
"Italy": 3,
"Finland": 1,
"Spain": 2,
"Sweden": 1,
"Germany": 16,
"The United Kingdom": 9,
"United Arab Emirates": 8,
"Russia": 2,
"Thailand": 33}}
dict_02_21 = {"2020/02/21":
{"China": 142823,
"Japan": 727,
"Republic of Korea": 204,
"Vietnam": 16,
"Singapore": 85,
"Australia": 17,
"Malaysia": 22,
"Cambodia": 1,
"Philippines": 3,
"Nepal": 1,
"Sri Lanka": 1,
"India": 3,
"United States of America": 15,
"Canada": 8,
"France": 12,
"Belgium": 1,
"Italy": 3,
"Finland": 1,
"Spain": 2,
"Sweden": 1,
"Germany": 16,
"The United Kingdom": 9,
"United Arab Emirates": 9,
"Iran": 5,
"Egypt": 1,
"Russia": 2,
"Thailand": 35}}
concated = pd.concat([
    pd.DataFrame(dict_01_22),
    pd.DataFrame(dict_01_30),
    pd.DataFrame(dict_02_07),
    pd.DataFrame(dict_02_14),
    pd.DataFrame(dict_02_21),
], axis=1, sort=True).fillna(0)
The first five rows of concated look like this:
| | 2020/01/22 | 2020/01/30 | 2020/02/07 | 2020/02/14 | 2020/02/21 |
|---|---|---|---|---|---|
| Australia | 0.0 | 7.0 | 15.0 | 15.0 | 17 |
| Belgium | 0.0 | 0.0 | 1.0 | 1.0 | 1 |
| Cambodia | 0.0 | 1.0 | 1.0 | 1.0 | 1 |
| Canada | 0.0 | 3.0 | 7.0 | 7.0 | 8 |
| China | 310.0 | 7737.0 | 31211.0 | 142823.0 | 142823 |
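The 0.0 entries in the table above come from the way pd.concat lines up rows that are missing in some snapshots. A minimal sketch of that behavior, using a three-country subset of the 01/22 and 01/30 figures (the names snap_a, snap_b, combined, and filled are illustrative only):

```python
import pandas as pd

# Subset of two snapshots, shaped like the dictionaries above:
# {date: {country: count}}.
snap_a = {"2020/01/22": {"China": 310, "Japan": 1}}
snap_b = {"2020/01/30": {"China": 7737, "Japan": 11, "France": 5}}

# pd.DataFrame turns the outer key into a column and the inner keys into the index.
# axis=1 places the snapshots side by side, aligning rows on the country name.
combined = pd.concat([pd.DataFrame(snap_a), pd.DataFrame(snap_b)], axis=1, sort=True)

# France has no 2020/01/22 entry, so that cell is NaN until fillna(0) replaces it.
filled = combined.fillna(0)
print(filled)
```

A column that had to hold NaN at some point becomes floating point, which is presumably why only the last column, where every country has a value, is shown as integers.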
plotly expects tidy (long-format) data, so the country names are converted to country codes and the table is then reshaped with pd.melt.
time_periods = [column for column in concated.columns]
df = concated.reset_index().rename(columns={"index": "country"})
df["ISO"] = df["country"].apply(lambda x: coco.convert(x))
data = pd.melt(df, id_vars=["ISO"], value_vars=time_periods)
Here is what data looks like after conversion to tidy form:
| | ISO | variable | value |
|---|---|---|---|
| 0 | AUS | 2020/01/22 | 0.0 |
| 1 | BEL | 2020/01/22 | 0.0 |
| 2 | KHM | 2020/01/22 | 0.0 |
| 3 | CAN | 2020/01/22 | 0.0 |
| 4 | CHN | 2020/01/22 | 310.0 |
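To make the reshaping step concrete, here is a small sketch of pd.melt on a two-country excerpt of the wide table. The frame names wide and tidy are illustrative; the values are taken from the data above.

```python
import pandas as pd

# Two-country excerpt of the wide table: one row per country, one column per date.
wide = pd.DataFrame(
    {"ISO": ["CHN", "JPN"], "2020/01/22": [310, 1], "2020/01/30": [7737, 11]}
)

# id_vars is kept as an identifier column; every column listed in value_vars
# is unpivoted into (variable, value) pairs, one row per country and date.
tidy = pd.melt(wide, id_vars=["ISO"], value_vars=["2020/01/22", "2020/01/30"])
print(tidy)
```

The default column names variable and value are the ones the plotting code refers to. pd.melt also accepts var_name and value_name if more descriptive names are preferred.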
I want to visualize this data, but there are two problems:
- China's numbers are far too large
- some countries' numbers are too small
So the scale is adjusted as follows:
- China's values are divided by 100
- values for every country other than Japan, China, and South Korea are multiplied by 10
This makes the map easier to read (I don't really know if it's ethical ...).
data_for_map = data.copy()  # copy so that the scaling does not overwrite data
# Multiply by 10 for every country except China, Japan, and South Korea
for ind in data[(data["ISO"] != "CHN") & (data["ISO"] != "JPN") & (data["ISO"] != "KOR")].index:
    data_for_map.at[ind, "value"] = data_for_map.at[ind, "value"] * 10
# Divide China's values by 100 (integer division)
for ind in data[data["ISO"] == "CHN"].index:
    data_for_map.at[ind, "value"] = data_for_map.at[ind, "value"] // 100
fig = px.scatter_geo(data_for_map, locations="ISO", size="value",
                     animation_frame="variable",
                     projection="natural earth")
fig.show()
This should give you a map.
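The loops above update one cell at a time. As a sketch of an alternative, boolean masks with .loc can apply the same rule in two vectorized assignments. The small frame below is an excerpt built from the 2020/02/21 figures; the names tidy, scaled, is_china, and is_small are illustrative and not part of the original code.

```python
import pandas as pd

# Excerpt of the tidy table for one date (values from the data above).
tidy = pd.DataFrame(
    {
        "ISO": ["CHN", "JPN", "KOR", "FRA"],
        "variable": ["2020/02/21"] * 4,
        "value": [142823.0, 727.0, 204.0, 12.0],
    }
)

scaled = tidy.copy()  # work on a copy so the raw counts stay available

is_china = scaled["ISO"] == "CHN"
is_small = ~scaled["ISO"].isin(["CHN", "JPN", "KOR"])

# Same rule as the loops: China is floor-divided by 100, and every country
# other than China, Japan, and South Korea is multiplied by 10.
scaled.loc[is_china, "value"] = scaled.loc[is_china, "value"] // 100
scaled.loc[is_small, "value"] = scaled.loc[is_small, "value"] * 10
print(scaled)
```

Keeping the unscaled frame around makes it easier to show the real counts elsewhere, since the bubble sizes on the map no longer correspond to actual case numbers.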
# ======================= Same as before =======================
time_periods = [column for column in concated.columns]
df = concated.reset_index().rename(columns={"index": "country"})
df["ISO"] = df["country"].apply(lambda x: coco.convert(x))
data = pd.melt(df, id_vars=["ISO"], value_vars=time_periods)
# ==============================================================
# Drop China, Japan, and South Korea instead of rescaling them
data_for_map = data[(data["ISO"] != "CHN") & (data["ISO"] != "JPN") & (data["ISO"] != "KOR")]
fig = px.scatter_geo(data_for_map, locations="ISO", size="value",
                     animation_frame="variable",
                     projection="natural earth")
fig.show()
With this version, Japan, China, and South Korea, which have many infected people, are excluded and only the remaining countries are visualized. In that case, in the data for 2020/02/21, Southeast Asia, Europe, and North America are especially noticeable.
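To inspect a single date as a table rather than on the map, the same exclusion can be combined with a date filter and a sort. This is only a sketch on a six-country excerpt of the 2020/02/21 figures; the names tidy, excluded, rest, and latest are illustrative.

```python
import pandas as pd

# Excerpt of the tidy table for 2020/02/21 (values from the data above).
tidy = pd.DataFrame(
    {
        "ISO": ["CHN", "JPN", "KOR", "SGP", "THA", "FRA"],
        "variable": ["2020/02/21"] * 6,
        "value": [142823.0, 727.0, 204.0, 85.0, 35.0, 12.0],
    }
)

# isin expresses the three != comparisons as one membership test.
excluded = ["CHN", "JPN", "KOR"]
rest = tidy[~tidy["ISO"].isin(excluded)]

# Keep one date and list the remaining countries from largest to smallest.
latest = rest[rest["variable"] == "2020/02/21"].sort_values("value", ascending=False)
print(latest)
```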
I hope the outbreak subsides as soon as possible.