Introduction

A state of emergency was declared for the convergence of the new coronavirus infection (COVID-19), and although it was not as much as lockdown in other countries, various industries were closed, events were canceled, remote work, various self-restraints, etc. , Is affecting the lives of many people. The condition for an infectious disease such as corona to converge is that the effective reproduction number (the total number of secondary infections produced by one infected person) is persistently less than 1. In Previous article, the incubation period and infection period are one of the methods to calculate the effective reproduction number from the data of the number of infected persons in chronological order. I showed a simple method to calculate separately. In this article, what happens when the infectivity, that is, the strength of infecting others after infection, is incorporated into the effect that changes with time, ** especially how many days after infection the peak of infectivity * I would like to estimate about *. In particular, may it be infected even during the incubation period in the media? Since the suspicion has been reported, I think that it will also be a verification of its authenticity.

Prerequisite knowledge

I would like to introduce three studies that are the premise of the examination of this article. The first is Survey by the Cluster Countermeasures Group of the Ministry of Health, Labor and Welfare. According to the cluster countermeasures group, the basic reproduction number (when it is basic, it does not consider environmental factors, and when it is effective, it seems that policies are also taken into consideration) is sampled as shown in the following graph. クラスター対策班のR0.jpg Thankfully, ** the average value is specified as 1.49 ** (important here). In other words, while most people do not infect (0 people), they can infect many people at once, such as 11 or 12. Therefore, R0 should be considered as a wide-tailed distribution. The second is [Research Report No.13](https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe] from Imperial College London, England. -npi-impact /). The university has published a report on COVID-19 early on and seems to have a strong influence on British policy. What should be noted in this report is that the infectivity is expressed using the gamma distribution as shown in the figure below. The average value is 6.5 [days], but the peak seems to be around the 4th to 5th days. The third is Report on the incubation period of COVID-19 of Johns Hopkins University in the United States. estimates-5-days-for-incubation-period.html). According to this, the median incubation period is 5.1 days **. This, combined with the graph from the second study, suggests that ** there may be a peak of infectivity before the onset date **. Based on the above three studies, we will examine the peak infectivity, especially using data on the number of infected people in Japan.

Computational model

To calculate the basic reproduction number considering infectivity, consider a model based on the SEIR model, which is a slightly improved version of Previous article. I would like. See the following figure. In the previous article, it was stated that a secondary infection would occur from an infected person during the infection period (I) with the same infection intensity, but this time we will adopt the following assumptions.

Infection can occur even during the incubation period (E).
Secondary infection results from the number of infected persons, which is the product of the number of infected persons and the intensity of infection.
Infection intensity is given by a function that depends on the number of days since the date of infection.
The sum of the incubation period (lp) and the infection period (ip) is 13.

In addition, the function of infection intensity of 3 adopts the binomial distribution. This is because it is easier to handle because it requires only one parameter compared to the gamma distribution. Now, the formula for the basic reproduction number based on the binomial distribution is determined as follows.

R_0(y) = r_0 \cdot {}_N C_y \cdot \theta^y (1-\theta)^{N-y}

$ r_0 $ is a constant, $ \ theta $ is a binary distribution parameter, $ y $ is the number of days since the date of infection, and $ N $ is the sum of the incubation period and the infection period (= 13). Secondary infection from an infected person to an infected person is defined by convolution as shown in the following equation. $ P (t) $ is the number of confirmed positives at time $ t $.

P(t+N) = \sum_{y=0}^N R_0(N-y) P(t+y)

Transforming these gives you the formula to calculate $ r_0 $.

r_0 = \frac{P(t+N)}{\sum_{y=0}^N {}_N C_{N-y} \cdot \theta^{N-y} (1-\theta)^{y} P(t+y)}

It is a little unnatural that both the denominator and the numerator have $ P (t + N) $, but the calculation includes the effect of causing a secondary infection immediately on the day of infection. Now, according to this $ R_0 (y) $, I would like to determine the infectivity parameter $ \ theta $, that is, the time when the infectivity is maximized $ N \ theta $, but I will go with the following strategy.

Determine $ \ theta $ appropriately.
Find $ r_0 $ from the formula of $ R_0 (y) $ and the actual data.
Find the mean of the distribution of $ r_0 $.
Repeat steps 1 to 3 and estimate $ \ theta $, where $ E [r_0; \ theta] $ is closest to 1.49 days, as the true value.

In step 4, if the distribution of the cluster group is numerical data, a method such as calculating and optimizing the distance between the probability distributions could be considered, but since there is only data in the image ** I gave up…. I want you to publish it on csv. (Chilla | д ﾟ)

Data to be analyzed

Map of the number of new coronavirus infections by prefecture (provided by Jag Japan Co., Ltd.), we have used the publicly available csv data. It was. I think it is wonderful that private companies are volunteering to collect and publish such data.

Try to calculate with Python & result

This time, I would like to see the results at the same time while calculating.

1. Preparation

Set the library and font.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import locale
#
from scipy.special import comb

font = {'family' : 'IPAexGothic'}
plt.rc('font', **font)

It is a function that counts the number of infected people by prefecture from the data of Jag Japan.

def readCsvOfJapanPref(pref : None):
    #Download from the URL below
    # https://jag-japan.com/covid19map-readme/
    fcsv = u'COVID-19.csv'
    df = pd.read_csv(fcsv, header=0, encoding='utf8', parse_dates=[u'Fixed date'])
    #Fixed date,Extract only the prefectures where you receive a medical examination
    df1 = df.loc[:,[u'Fixed date',u'Consultation prefecture']]
    df1.columns = ['date','pref']
    #Extracted by consultation prefecture
    if pref is None or pref == u'In Japan':
        df1 = df1
    else:
        df1 = df1[df1.pref == pref]
    df1 = df1.loc[:,'date']
    #Count on fixed date
    df2 = pd.DataFrame( df1.value_counts() ).reset_index()
    df2.columns = ['date','P']
    df2 = df2.sort_values('date').reset_index(drop = True)
    #
    return df2

This function calculates the basic reproduction number in consideration of the infection intensity. We have put in zero percent measures.

def binomi(y, N, th):
    c = comb(N, y)
    p = c * pow(th, y) * pow(1. - th, N - y)
    return p

def calcR0(df, keys):
    #Binomial distribution
    N  = keys['N']
    th = keys['th']
    bcoef = [binomi(y, N, th) for y in range(N+1) ]
    #
    nrow = len(df)
    getP  = lambda s: df.loc[s, 'P' ] if (s >= 0 and s < nrow) else np.NaN
    for t in range(1, nrow):
        df.loc[t, 'Ppre' ] = sum([ getP(t + y) * bcoef[N - y] for y in range(0, N + 1)])
        df.loc[t, 'Pat'  ] = getP(t + N )
        if df.loc[t, 'Ppre'] > 0.1: #Zero percent measures
            df.loc[t, 'R0'  ] = df.loc[t, 'Pat'] / df.loc[t, 'Ppre']
        else:
            df.loc[t, 'R0'  ] = np.NaN
    return df

This function calculates the basic reproduction number for each prefecture.

def makeCalcFrame(days):
    t_1 = pd.Timestamp(2020,1,24) #Calculation start date
    td = pd.Timedelta('1 days')
    #
    npd = [[t_1 + td * i, 0, 0, 0 ] for i in range(0,days)]
    df1 = pd.DataFrame(npd)
    df1.columns = ['date', 'Ppre','Pat', 'R0']
    #
    return df1

def mergeCalcFrame(df1, df2):
    return pd.merge(df1, df2, on='date', how='left').fillna(0)

def R0inJapanPref(pref, keys):
    #
    df1 = makeCalcFrame(100)
    df2 = readCsvOfJapanPref(pref)
    df = mergeCalcFrame(df1, df2)
    #
    df = calcR0(df, keys)
    #
    return df

2. Relationship between the parameter θ of the binomial distribution and the average value of the basic reproduction number R0

Now, let's calculate the basic reproduction number for each parameter $ \ theta $ included in the infection intensity $ R_0 (y) $ by prefecture and see its distribution. A function for calculation. The candidates for the parameter $ \ theta $ will be calculated on the grid.

def calcDfMean(dflist):
    mlist = []
    for df, pref in dflist:
        R0list = df[(df.R0 >= 0)]['R0']
        m = np.mean(R0list)
        mlist.append(m)
    return np.mean(mlist)

def calcThToDfMean():
    preflist = [u'In Japan', u'Tokyo', u'Osaka', u'Aichi prefecture', u'Fukuoka Prefecture', u'Hokkaido']

    keys = {'N':13}
    dc = {}
    for y in np.linspace(0, 13, 14):
        keys['th'] = y / keys['N']
        dflist = [[R0inJapanPref(pref, keys), pref] for pref in preflist]
        dm = calcDfMean(dflist)
        dc[y] = dm
    #
    return dc

This is the part to calculate and display. It will take a few minutes.

dc = calcThToDfMean()

fig = plt.figure(figsize=(5,5))
ax1 = fig.add_subplot(1,1,1)
ax1.plot(list(dc.keys()), list(dc.values()))
ax1.set_xlabel('Nθ=E[y]')
ax1.set_ylabel('E[R0(y)]')
ax1.set_xticks(np.linspace(0,13,14))
ax1.set_xlim(0,)
ax1.set_ylim(0,)
ax1.grid()
ax1.set_title('The peak day vs the expected R0')
plt.show()

Let's see the calculation result. The horizontal axis represents $ N \ theta $, that is, the mean value of the binomial distribution, and the vertical axis represents the mean value of the basic reproduction number $ r_0 $. In the graph of the ** cluster countermeasure group earlier, the average number of basic reproductions was 1.49. ** ** From this graph, if you look at the intersection position where the vertical axis is 1.49, you can see that $ N \ theta = 5 $. Therefore, let's set the parameter $ \ theta $ to an estimated value of $ 5 / N = 0.385 $. ** **

3. Infection intensity and basic reproduction number when using the estimated parameter θ

Next, let's look at the distribution of basic reproduction numbers when the estimated parameter $ \ theta = 0.385 $ is used. A function for calculating the distribution.

def makeR0frame(dflist):
    dc = {}
    for df, pref in dflist:
        R0list = df[(df.R0 >= 0)]['R0']
        dc[pref] = R0list
    return pd.DataFrame(dc)

def showR0bar(ax, df):
    color = dict(boxes="Gray",whiskers="DarkOrange",medians="Black",caps="Gray")
    df.plot.box(color=color, ax = ax)

def showBCoef(ax, keys):
    N = keys['N']
    th = keys['th']
    ylist = [y for y in range(N+1) ]
    bcoef = [binomi(y, N, th) for y in ylist ]
    ax.plot(ylist, bcoef)
    ax.set_xticks(np.linspace(0,N,N+1))
    ax.grid()
    ax.set_ylim(0,)
    ax.set_xlim(0,)

This is the part that calculates and displays the distribution.

preflist = [u'In Japan', u'Tokyo', u'Osaka', u'Aichi prefecture', u'Fukuoka Prefecture', u'Hokkaido']

keys = {'N':13, 'th':5./13. } 
dflist = [[R0inJapanPref(pref, keys), pref] for pref in preflist]

fig = plt.figure(figsize=(10,5))
plt.rcParams["font.size"] = 10
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
#
df1 = makeR0frame(dflist)
showR0bar(ax1, df1)
showBCoef(ax2, keys)
ax1.legend()
ax1.set_title("R0")
ax2.set_xlabel('Nθ=E[y]')
ax2.set_ylabel('P(y;θ)')
ax2.set_title('binomi(y;θ)')
    
plt.show()

Let's see the calculation result. The left is a boxplot of the distribution of basic reproduction numbers by prefecture, and the right is the binomial distribution of infection intensity. Looking at the figure on the left, you can see that the distribution has a fairly long tail. In addition, the maximum outlier is around 17.5 in Hokkaido. As you can see by changing the parameters, it seems that there are few differences between prefectures with this parameter. Looking at the figure on the right, I feel that it is similar to the gamma distribution in the report of Imperial College London. The peak position is on the 5th day, so this is a little behind.

4. Match answers with a histogram of basic reproduction numbers

Finally, make a histogram of the basic reproduction number and match the answer with the histogram of the cluster countermeasure team. The code to calculate.

def showR0hist(ax, df, pref):
    R0list = df[df.R0 >= 0]['R0']
    R0list = R0list / R0list.sum() * 100
    R0list.hist( bins=30, label=pref, ax=ax)

fig = plt.figure(figsize=(5,5))
ax1 = fig.add_subplot(1,1,1)
for df, pref in dflist:
    showR0hist(ax1, df, pref)
ax1.legend()
#
plt.show()

Let's see the calculation result. R0_COVID-19_BinEst_R0allhist_比較用.png The left is R0 estimated from the data of the number of infected people, and the right is R0 aggregated by the survey of the cluster countermeasure team. The shape is pretty good. It is a shape that can be approximated by an exponential distribution. The average values are the same, so Yoshi! Let's say.

Consideration

From the above, the following trends can be derived from the calculation results regarding the estimation of the peak infectivity based on the data of infected persons in each prefecture in Japan.

** The peak infectivity is on the 5th day, that is, on the day of onset (with an incubation period of 5.1 days). ** ** From the shape of the + binomial distribution, it seems that the infectivity is strong ** from 2 days before the onset to 2 days after the onset.
The histogram of the basic reproduction number calculated from the data by prefecture with the peak of infectivity as the 5th day is the same as the histogram of the epidemiological survey of the cluster countermeasure group, and the shape is almost similar.

Furthermore ...

The result of strong infectivity on the day of onset is consistent with the report from Imperial College London. ** The fact that the data in Japan and the data in Europe have the same tendency may be reasonably reproducible **.
** I feel that it will be more efficient to track clusters from 2 days before the onset to 2 days after the onset.
It will be necessary to consider whether the binomial distribution is really good. For example, you may want to investigate the viral load of the affected person and compare it with data such as how the daily viral load changes.
In the calculation results, the larger the parameter $ \ theta $, the larger the mean and variance of the basic reproduction number $ r_0 $ tended to be. In other words, $ \ theta $ and $ r_0 $ are correlated, so it is important to identify them properly. In that respect, it is important that the aggregated distribution of the cluster countermeasure group could be used as the correct answer.

Reference link

I referred to the following page.

Estimate the peak infectivity of the new coronavirus