Regarding the novel coronavirus infection (COVID-19), I previously analyzed the effective reproduction number [^ 1] by prefecture and published a [ranking](https://qiita.com/oki_mebarun/items/a21dd3ebf03c64066d29). This time, I look at worldwide data and consider whether the time of convergence can be predicted from the trend in the effective reproduction number. In particular, with the Tokyo 2020 Olympics scheduled for July, I hope the situation settles as soon as possible, not only in Japan but around the world.
To state the conclusion briefly first:
The basic formula is the same as in the previous article, and I have not changed the parameters either. For the data, I used the New Coronavirus Dataset; I am grateful to the people who make such public data available. Because a positive test is only found after the incubation period and the infectious period have passed, results for roughly the most recent two weeks cannot be obtained.
This code is available on GitHub. It is saved in Jupyter Notebook format. (File name: 03_R0_estimation-WLD-02b.ipynb)
There are few changes, since the code follows the previous article. The main one is that the cumulative counts are differenced to obtain the daily number of confirmed cases.
import pandas as pd

def readCsvOfWorldArea(area=None):
    # Download the CSV from the URL below
    # https://hackmd.io/@covid19-kenmo/dataset/https%3A%2F%2Fhackmd.io%2F%40covid19-kenmo%2Fdataset
    fcsv = u'World-COVID-19.csv'
    df = pd.read_csv(fcsv, header=0, encoding='sjis', parse_dates=[u'date'])
    # Extract the date column and the target country (or the worldwide total)
    if area is not None:
        df1 = df.loc[:,[u'date',area]]
    else:
        df1 = df.loc[:,[u'date',u'Infected people throughout the world']]
    df1.columns = ['date','Psum']
    ## Cumulative => daily conversion
    df2 = df1.copy()
    df2.columns = ['date','P']
    df2.iloc[0,1] = 0
    ## String => number (strips thousands separators)
    getFloat = lambda e: float('{}'.format(e).replace(',',''))
    ## Difference between consecutive days
    for i in range(1,len(df1)):
        df2.iloc[i, 1] = getFloat(df1.iloc[i, 1]) - getFloat(df1.iloc[i-1, 1] )
    ##
    return df2
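As a small illustration of the cumulative-to-daily step, the same conversion can be sketched with pandas' built-in `diff`. This is not the notebook's code; the counts below are made-up values chosen only to show the string cleanup and the differencing.

```python
import pandas as pd

# Hypothetical cumulative counts, stored as strings with thousands separators
# (the kind of value the getFloat helper is written to handle).
cumulative = pd.Series(["1,000", "1,250", "1,600", "1,600", "2,100"])

# String -> number, then first difference. The first day has no predecessor,
# so it is set to 0, mirroring df2.iloc[0, 1] = 0 in the function above.
values = cumulative.str.replace(",", "", regex=False).astype(float)
daily = values.diff().fillna(0.0)

print(daily.tolist())  # [0.0, 250.0, 350.0, 0.0, 500.0]
```

A day on which the cumulative count does not change yields 0 new cases, which is why the R calculation has to guard against a zero denominator.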
A moving average has been added to the calculation of R. Each daily value is averaged over a centered 3-day window (the day itself plus one day before and one day after).
import numpy as np

def calcR0(df, keys):
    lp = keys['lp']
    ip = keys['ip']
    nrow = len(df)
    # Daily cases on day s (NaN beyond the end of the data)
    getP = lambda s: df.loc[s, 'P'] if s < nrow else np.nan
    # 3-day centered moving average around day s
    getP2 = lambda s: np.average([ getP(s + r) for r in range(-1,2)])
    for t in range(1, nrow):
        df.loc[t, 'Ppre'] = sum([ getP2(s) for s in range(t+1, t + ip + 1)])
        df.loc[t, 'Pat' ] = getP2(t + lp + ip)
        if df.loc[t, 'Ppre'] > 0:
            df.loc[t, 'R0'  ] = ip * df.loc[t, 'Pat'] / df.loc[t, 'Ppre']
        else:
            # Undefined when the denominator is zero or NaN
            df.loc[t, 'R0'  ] = np.nan
    return df
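To make the formula easier to check, here is a minimal standalone sketch of the same ratio, R(t) = ip · P(t + lp + ip) / Σ P(s) for s = t+1 … t+ip, applied to synthetic data. The function name `estimate_r`, the values `lp = 5` and `ip = 8`, and the input series are assumptions for illustration only, and the 3-day moving average is left out to keep it short.

```python
import numpy as np

def estimate_r(p, lp, ip):
    """R(t) = ip * P(t + lp + ip) / sum_{s=t+1}^{t+ip} P(s); NaN where undefined."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    r = np.full(n, np.nan)
    for t in range(n):
        if t + lp + ip >= n:
            break  # the numerator would need data that does not exist yet
        ppre = p[t + 1 : t + ip + 1].sum()
        if ppre > 0:
            r[t] = ip * p[t + lp + ip] / ppre
    return r

lp, ip = 5, 8  # assumed values, for illustration only

# A constant number of daily cases should give R = 1.
flat = estimate_r([100.0] * 30, lp, ip)

# Daily cases growing by 10% per day should give R > 1.
growing = estimate_r(1.1 ** np.arange(30), lp, ip)

print(flat[:3], growing[:3])
```

The last `lp + ip` entries stay NaN because the numerator looks that many days ahead, which is the reason the most recent two weeks or so have no result.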
Also, to make the plot easier to read, the vertical axis uses a (symmetric) logarithmic scale.
import matplotlib.pyplot as plt

def showResult3(dflist, title):
    # Reference line at R0 = 1
    dfs = dflist[0][0]
    ptgt = pd.DataFrame([[dfs.iloc[0,0],1],[dfs.iloc[len(dfs)-1,0],1]])
    ptgt.columns = ['date','target']
    ax = ptgt.plot(title='COVID-19 R0', x='date',y='target',style='r--', figsize=(10,8))
    # Matplotlib 3.3 and later name this keyword linthresh (formerly linthreshy)
    ax.set_yscale("symlog", linthresh=1)
    # Plot each (DataFrame, label) pair; showResult2 is defined elsewhere in the notebook
    for df, label in dflist:
        showResult2(ax, df, label)
    #
    ax.grid(True)
    ax.set_ylim(0,)
    plt.show()
    fig = ax.get_figure()
    fig.savefig("R0_{}.png".format(title))
I was able to reuse the original code with few changes, which was helpful.
Now let's take a look at the calculation results. If $R_0 > 1$, the infection is spreading; if $R_0 < 1$, it is subsiding.
Here are the results for mainland China, Italy, the United States, Spain, Iran and South Korea.
Here is the result of collecting countries with many infected people in Europe including Italy.
Here are the results for Taiwan, Japan, Hong Kong and Singapore.
Finally, here are countries (not an exhaustive list) where, judging from the graphs, $R$ remains at a high level and shows no tendency to converge.
Looking at how the effective reproduction number changes over time, we can see that after a sharp increase it tends to decrease exponentially. The results for Europe in particular show a similar convergence trend regardless of country. I therefore fitted it with the following approximation formula.
$$
R(t) = R(t_0) \cdot 2^{-\frac{t-t_0}{T}}
$$
In other words, $T$ is the half-life of $R(t)$. Setting $T = 7.5$ days and overlaying the formula on the graph for the European region gives the following (the dotted lines in the figure are the estimate).
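As a worked illustration of the half-life formula, the sketch below evaluates $R(t)$ and solves $R(t_0) \cdot 2^{-d/T} = 1$ for the number of days $d$, which gives $d = T \log_2 R(t_0)$. The function names and the starting value of 4.0 are hypothetical and not taken from the data; only $T = 7.5$ comes from the text.

```python
import math

def r_at(days_since_t0, r_start, half_life):
    """Evaluate R(t) = R(t0) * 2^(-(t - t0) / T)."""
    return r_start * 2.0 ** (-days_since_t0 / half_life)

def days_until_below(r_start, half_life, threshold=1.0):
    """Solve r_start * 2^(-d / T) = threshold for d."""
    return half_life * math.log2(r_start / threshold)

T = 7.5        # half-life in days, the value used in the text for Europe
r_start = 4.0  # hypothetical starting value, not taken from the data

print(r_at(7.5, r_start, T))         # 2.0: one half-life halves R
print(days_until_below(r_start, T))  # 15.0: two half-lives bring 4 down to 1
```

Because the decay is exponential, doubling the starting value only adds one more half-life to the time needed to reach $R = 1$.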
From here, specific dates can be substituted into $R(t)$.
Of course, this is an approximation, so things may not turn out this way. However, if $R < 1$ was reached on March 21st, then around April 4th, 13 days later, we should see new infections stop increasing. If so, the number of hospitalized patients will decline steadily and convergence will come into view.
Also, here is the result of applying the above approximation formula to other regions.
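In this article the curve is matched to the graphs with a chosen $T$. As an alternative, and purely as an assumption on my part, $T$ could be estimated per region with a least-squares line through $\log_2 R(t)$, since the approximation is linear in that scale. The sketch below uses synthetic values generated from the formula itself, so it only demonstrates that the fit recovers the half-life; `fit_half_life` is a hypothetical helper.

```python
import numpy as np

def fit_half_life(days, r_values):
    """Fit log2 R = a + b * t by least squares; return (T, R at t = 0)."""
    slope, intercept = np.polyfit(days, np.log2(r_values), 1)
    return -1.0 / slope, 2.0 ** intercept

# Synthetic series that follows the approximation exactly (not real data).
days = np.arange(0, 21)
r_synth = 5.0 * 2.0 ** (-days / 7.5)

T_fit, r0_fit = fit_half_life(days, r_synth)
print(round(T_fit, 3), round(r0_fit, 3))
```

With real data the fit should be restricted to the decaying part of the curve and to days where $R$ is positive and defined, since the logarithm is undefined otherwise.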
I referred to the following page.
[^ 1]: In this article, the effective reproduction number is defined as the number of secondary infections caused by one infected person (at a given time t, under the measures in place at that time).