A state of emergency was declared for the convergence of the new coronavirus infection (COVID-19), and although it was not as much as lockdown in other countries, various industries were closed, events were canceled, remote work, various self-restraints, etc. , Is affecting the lives of many people. The condition for an infectious disease such as corona to converge is that the effective reproduction number (the total number of secondary infections produced by one infected person) is persistently less than 1. In Previous article, the incubation period and infection period are one of the methods to calculate the effective reproduction number from the data of the number of infected persons in chronological order. I showed a simple method to calculate separately. In this article, what happens when the infectivity, that is, the strength of infecting others after infection, is incorporated into the effect that changes with time, ** especially how many days after infection the peak of infectivity * I would like to estimate about *. In particular, may it be infected even during the incubation period in the media? Since the suspicion has been reported, I think that it will also be a verification of its authenticity.
I would like to introduce three studies that are the premise of the examination of this article. The first is Survey by the Cluster Countermeasures Group of the Ministry of Health, Labor and Welfare. According to the cluster countermeasures group, the basic reproduction number (when it is basic, it does not consider environmental factors, and when it is effective, it seems that policies are also taken into consideration) is sampled as shown in the following graph. Thankfully, ** the average value is specified as 1.49 ** (important here). In other words, while most people do not infect (0 people), they can infect many people at once, such as 11 or 12. Therefore, R0 should be considered as a wide-tailed distribution. The second is [Research Report No.13](https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe] from Imperial College London, England. -npi-impact /). The university has published a report on COVID-19 early on and seems to have a strong influence on British policy. What should be noted in this report is that the infectivity is expressed using the gamma distribution as shown in the figure below. The average value is 6.5 [days], but the peak seems to be around the 4th to 5th days. The third is Report on the incubation period of COVID-19 of Johns Hopkins University in the United States. estimates-5-days-for-incubation-period.html). According to this, the median incubation period is 5.1 days **. This, combined with the graph from the second study, suggests that ** there may be a peak of infectivity before the onset date **. Based on the above three studies, we will examine the peak infectivity, especially using data on the number of infected people in Japan.
To calculate the basic reproduction number considering infectivity, consider a model based on the SEIR model, which is a slightly improved version of Previous article. I would like. See the following figure. In the previous article, it was stated that a secondary infection would occur from an infected person during the infection period (I) with the same infection intensity, but this time we will adopt the following assumptions.
In addition, the function of infection intensity of 3 adopts the binomial distribution. This is because it is easier to handle because it requires only one parameter compared to the gamma distribution. Now, the formula for the basic reproduction number based on the binomial distribution is determined as follows.
R_0(y) = r_0 \cdot {}_N C_y \cdot \theta^y (1-\theta)^{N-y}
$ r_0 $ is a constant, $ \ theta $ is a binary distribution parameter, $ y $ is the number of days since the date of infection, and $ N $ is the sum of the incubation period and the infection period (= 13). Secondary infection from an infected person to an infected person is defined by convolution as shown in the following equation. $ P (t) $ is the number of confirmed positives at time $ t $.
P(t+N) = \sum_{y=0}^N R_0(N-y) P(t+y)
Transforming these gives you the formula to calculate $ r_0 $.
r_0 = \frac{P(t+N)}{\sum_{y=0}^N {}_N C_{N-y} \cdot \theta^{N-y} (1-\theta)^{y} P(t+y)}
It is a little unnatural that both the denominator and the numerator have $ P (t + N) $, but the calculation includes the effect of causing a secondary infection immediately on the day of infection. Now, according to this $ R_0 (y) $, I would like to determine the infectivity parameter $ \ theta $, that is, the time when the infectivity is maximized $ N \ theta $, but I will go with the following strategy.
In step 4, if the distribution of the cluster group is numerical data, a method such as calculating and optimizing the distance between the probability distributions could be considered, but since there is only data in the image ** I gave up…. I want you to publish it on csv. (Chilla | д ゚)
Map of the number of new coronavirus infections by prefecture (provided by Jag Japan Co., Ltd.), we have used the publicly available csv data. It was. I think it is wonderful that private companies are volunteering to collect and publish such data.
This time, I would like to see the results at the same time while calculating.
Set the library and font.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import locale
#
from scipy.special import comb
font = {'family' : 'IPAexGothic'}
plt.rc('font', **font)
It is a function that counts the number of infected people by prefecture from the data of Jag Japan.
def readCsvOfJapanPref(pref : None):
#Download from the URL below
# https://jag-japan.com/covid19map-readme/
fcsv = u'COVID-19.csv'
df = pd.read_csv(fcsv, header=0, encoding='utf8', parse_dates=[u'Fixed date'])
#Fixed date,Extract only the prefectures where you receive a medical examination
df1 = df.loc[:,[u'Fixed date',u'Consultation prefecture']]
df1.columns = ['date','pref']
#Extracted by consultation prefecture
if pref is None or pref == u'In Japan':
df1 = df1
else:
df1 = df1[df1.pref == pref]
df1 = df1.loc[:,'date']
#Count on fixed date
df2 = pd.DataFrame( df1.value_counts() ).reset_index()
df2.columns = ['date','P']
df2 = df2.sort_values('date').reset_index(drop = True)
#
return df2
This function calculates the basic reproduction number in consideration of the infection intensity. We have put in zero percent measures.
def binomi(y, N, th):
c = comb(N, y)
p = c * pow(th, y) * pow(1. - th, N - y)
return p
def calcR0(df, keys):
#Binomial distribution
N = keys['N']
th = keys['th']
bcoef = [binomi(y, N, th) for y in range(N+1) ]
#
nrow = len(df)
getP = lambda s: df.loc[s, 'P' ] if (s >= 0 and s < nrow) else np.NaN
for t in range(1, nrow):
df.loc[t, 'Ppre' ] = sum([ getP(t + y) * bcoef[N - y] for y in range(0, N + 1)])
df.loc[t, 'Pat' ] = getP(t + N )
if df.loc[t, 'Ppre'] > 0.1: #Zero percent measures
df.loc[t, 'R0' ] = df.loc[t, 'Pat'] / df.loc[t, 'Ppre']
else:
df.loc[t, 'R0' ] = np.NaN
return df
This function calculates the basic reproduction number for each prefecture.
def makeCalcFrame(days):
t_1 = pd.Timestamp(2020,1,24) #Calculation start date
td = pd.Timedelta('1 days')
#
npd = [[t_1 + td * i, 0, 0, 0 ] for i in range(0,days)]
df1 = pd.DataFrame(npd)
df1.columns = ['date', 'Ppre','Pat', 'R0']
#
return df1
def mergeCalcFrame(df1, df2):
return pd.merge(df1, df2, on='date', how='left').fillna(0)
def R0inJapanPref(pref, keys):
#
df1 = makeCalcFrame(100)
df2 = readCsvOfJapanPref(pref)
df = mergeCalcFrame(df1, df2)
#
df = calcR0(df, keys)
#
return df
Now, let's calculate the basic reproduction number for each parameter $ \ theta $ included in the infection intensity $ R_0 (y) $ by prefecture and see its distribution. A function for calculation. The candidates for the parameter $ \ theta $ will be calculated on the grid.
def calcDfMean(dflist):
mlist = []
for df, pref in dflist:
R0list = df[(df.R0 >= 0)]['R0']
m = np.mean(R0list)
mlist.append(m)
return np.mean(mlist)
def calcThToDfMean():
preflist = [u'In Japan', u'Tokyo', u'Osaka', u'Aichi prefecture', u'Fukuoka Prefecture', u'Hokkaido']
keys = {'N':13}
dc = {}
for y in np.linspace(0, 13, 14):
keys['th'] = y / keys['N']
dflist = [[R0inJapanPref(pref, keys), pref] for pref in preflist]
dm = calcDfMean(dflist)
dc[y] = dm
#
return dc
This is the part to calculate and display. It will take a few minutes.
dc = calcThToDfMean()
fig = plt.figure(figsize=(5,5))
ax1 = fig.add_subplot(1,1,1)
ax1.plot(list(dc.keys()), list(dc.values()))
ax1.set_xlabel('Nθ=E[y]')
ax1.set_ylabel('E[R0(y)]')
ax1.set_xticks(np.linspace(0,13,14))
ax1.set_xlim(0,)
ax1.set_ylim(0,)
ax1.grid()
ax1.set_title('The peak day vs the expected R0')
plt.show()
Let's see the calculation result. The horizontal axis represents $ N \ theta $, that is, the mean value of the binomial distribution, and the vertical axis represents the mean value of the basic reproduction number $ r_0 $. In the graph of the ** cluster countermeasure group earlier, the average number of basic reproductions was 1.49. ** ** From this graph, if you look at the intersection position where the vertical axis is 1.49, you can see that $ N \ theta = 5 $. Therefore, let's set the parameter $ \ theta $ to an estimated value of $ 5 / N = 0.385 $. ** **
Next, let's look at the distribution of basic reproduction numbers when the estimated parameter $ \ theta = 0.385 $ is used. A function for calculating the distribution.
def makeR0frame(dflist):
dc = {}
for df, pref in dflist:
R0list = df[(df.R0 >= 0)]['R0']
dc[pref] = R0list
return pd.DataFrame(dc)
def showR0bar(ax, df):
color = dict(boxes="Gray",whiskers="DarkOrange",medians="Black",caps="Gray")
df.plot.box(color=color, ax = ax)
def showBCoef(ax, keys):
N = keys['N']
th = keys['th']
ylist = [y for y in range(N+1) ]
bcoef = [binomi(y, N, th) for y in ylist ]
ax.plot(ylist, bcoef)
ax.set_xticks(np.linspace(0,N,N+1))
ax.grid()
ax.set_ylim(0,)
ax.set_xlim(0,)
This is the part that calculates and displays the distribution.
preflist = [u'In Japan', u'Tokyo', u'Osaka', u'Aichi prefecture', u'Fukuoka Prefecture', u'Hokkaido']
keys = {'N':13, 'th':5./13. }
dflist = [[R0inJapanPref(pref, keys), pref] for pref in preflist]
fig = plt.figure(figsize=(10,5))
plt.rcParams["font.size"] = 10
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
#
df1 = makeR0frame(dflist)
showR0bar(ax1, df1)
showBCoef(ax2, keys)
ax1.legend()
ax1.set_title("R0")
ax2.set_xlabel('Nθ=E[y]')
ax2.set_ylabel('P(y;θ)')
ax2.set_title('binomi(y;θ)')
plt.show()
Let's see the calculation result. The left is a boxplot of the distribution of basic reproduction numbers by prefecture, and the right is the binomial distribution of infection intensity. Looking at the figure on the left, you can see that the distribution has a fairly long tail. In addition, the maximum outlier is around 17.5 in Hokkaido. As you can see by changing the parameters, it seems that there are few differences between prefectures with this parameter. Looking at the figure on the right, I feel that it is similar to the gamma distribution in the report of Imperial College London. The peak position is on the 5th day, so this is a little behind.
Finally, make a histogram of the basic reproduction number and match the answer with the histogram of the cluster countermeasure team. The code to calculate.
def showR0hist(ax, df, pref):
R0list = df[df.R0 >= 0]['R0']
R0list = R0list / R0list.sum() * 100
R0list.hist( bins=30, label=pref, ax=ax)
fig = plt.figure(figsize=(5,5))
ax1 = fig.add_subplot(1,1,1)
for df, pref in dflist:
showR0hist(ax1, df, pref)
ax1.legend()
#
plt.show()
Let's see the calculation result. The left is R0 estimated from the data of the number of infected people, and the right is R0 aggregated by the survey of the cluster countermeasure team. The shape is pretty good. It is a shape that can be approximated by an exponential distribution. The average values are the same, so Yoshi! Let's say.
From the above, the following trends can be derived from the calculation results regarding the estimation of the peak infectivity based on the data of infected persons in each prefecture in Japan.
I referred to the following page.