I have compiled data on people infected with the new coronavirus in Ichikawa City, Chiba Prefecture, where I live.
The Ichikawa City homepage does not publish this information in a format suitable for secondary use as open data. The dataset is small and has few columns, so it is not enough for serious analysis, but it seems usable for small things, so I put it into an easy-to-use form. I have also posted sample code (Python).
I update it from time to time, but updates may be delayed for personal reasons.
[2020/05/08] Added the date of death
URL https://github.com/mine820/COVID-19
The data is in CSV format, encoded in UTF-8.
The meanings of the columns are as follows.
- Classification: patient (symptoms have developed) or asymptomatic pathogen carrier (no symptoms yet)
- City: the order in which the infection was confirmed among residents of Ichikawa City
- Prefecture: the order in which the infection was confirmed among residents of Chiba Prefecture
Below is sample code for analyzing the data. The file is a Jupyter Notebook.
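As a quick check of the layout described above, here is a minimal sketch that parses a few rows and counts them by classification. The column names `City`, `Prefecture`, and `Classification` follow the list above and are an assumption about the actual header in the CSV. The rows are invented for illustration and are not real data from the repository. `load_cases` is a hypothetical helper name.

```python
import io

import pandas as pd

# Invented rows for illustration only; not real data from the repository.
SAMPLE_CSV = """City,Prefecture,Classification
1,25,Patient
2,31,Asymptomatic pathogen carrier
3,40,Patient
"""


def load_cases(source):
    """Read a case list in CSV form (UTF-8) and return a DataFrame."""
    return pd.read_csv(source, encoding="utf-8")


cases = load_cases(io.StringIO(SAMPLE_CSV))

# Count how many rows fall into each classification.
by_class = cases["Classification"].value_counts()
print(by_class)
```

With the real file, passing its path (for example `load_cases("corona.csv")`) in place of the in-memory sample should work the same way, provided the header matches.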
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('corona.csv')
# Convert the date columns to datetime.
# errors="coerce" turns non-date entries such as "unknown" or "investigating" into NaT.
for col in ["Date of onset", "Inspection confirmation date", "Date of death"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d", errors="coerce")
# Summary statistics
df.describe().loc[:, "Year"]
# Histogram (age)
plt.title("Age")
plt.yticks([0, 5, 10, 15, 20])
plt.hist(df["Year"], range=(0, 100));
# Inspection confirmation date + moving average (7 days)
dates = df["Inspection confirmation date"].dropna()  # skip rows without a date
offsets = (dates - dates.min()).dt.days  # days since the first confirmed case
days = offsets.max() + 1  # one bin per calendar day, including the last day
counts = np.bincount(offsets, minlength=days)  # cases confirmed on each day
left = np.arange(days)
num = 7
b = np.ones(num) / num  # uniform 7-day kernel
y2 = np.convolve(counts, b, mode='same')
plt.title("Inspection confirmation date")
plt.bar(left, counts, color='green')
plt.plot(y2, color='red');
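The 7-day moving average in the sample code is a convolution with a uniform kernel. The small sketch below, using invented daily counts, shows one thing to watch for: `mode='same'` pads the series with zeros, so the first and last three values of the average are pulled down. As one possible alternative (not what the notebook uses), pandas `rolling` with `center=True` and `min_periods=1` averages only over the days that exist.

```python
import numpy as np
import pandas as pd

# Invented daily counts for illustration: 7 cases on each of 9 days.
counts = np.full(9, 7.0)

window = 7
kernel = np.ones(window) / window

# Convolution with zero padding at both ends (same length as the input).
conv = np.convolve(counts, kernel, mode="same")

# Centered rolling mean that shrinks the window at the edges instead of padding.
roll = pd.Series(counts).rolling(window, center=True, min_periods=1).mean().to_numpy()

print(conv)  # edges are lower: approximately [4 5 6 7 7 7 6 5 4]
print(roll)  # stays at 7 everywhere
```

For the most recent days this matters in practice: a dip at the right edge of the red line may come from the zero padding rather than from a real decrease in cases.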