Introduction

The new coronavirus infection (COVID-19) is rampant all over the world, and urban blockades, so-called lockdowns, are being carried out in various regions such as China, Europe, the United States, and Asia. In Japan as well, as of April 4, there are 3360 infected people in Japan and 891 people in Tokyo, and the governor is requesting refraining from going out on weekends and avoiding specific food service businesses. In addition, it is spreading not only in central Tokyo and Osaka, but throughout the country, and it can be said that the situation is as close as possible to an emergency situation where an overshoot occurs. In order to avoid such an emergency, in Government Expert Meeting etc., 3 dense spaces (closed and dense) ・ We are actively calling to avoid (close). Then, how much should we avoid such three dense spaces? In this article, I would like to quantify ** the degree of avoidance of three dense spaces ** necessary to prevent the spread of infection ** as a numerical value.

3 dense space and basic reproduction number

The basic reproduction number is

Expected number of secondary infections that a typical infected person in a population will reproduce during the entire period of infection

Is defined as. In other words, it is the strength with which an infected person can infect other people with the new coronavirus. Also, if its strength is strong, it can infect many people at once, resulting in clusters. According to the National Cluster Map published by the Ministry of Health, Labor and Welfare, the following are listed as the elements that make up the cluster.

Location: Hospitals, welfare facilities, houseboats, live houses, gyms, pubs, exhibitions
Acts: Chorus, social gathering, sports school

What they have in common is probably the three dense spaces. In addition, the cluster countermeasure group estimates show the analysis results of basic reproduction numbers by environment.

This figure suggests that the frequency distribution of basic reproduction numbers differs depending on whether the source of infection is in a poorly ventilated environment or in other environments. In other words, ** 3 the probability distribution of the basic reproduction number changes greatly depending on whether or not it is a dense space **. Therefore, I would like to define the parameters that represent the differences in the environment as follows.

Environment parameter er: Probability that the source of infection is in a non-three dense space (good environment) (0-1)

Then, the code to approximate the basic reproduction number by the above parameter er is shown in Previous article. The above figure is approximated by combining conditional branching and uniform distribution.

"""
 er: good environment ratio (0.-1.)
"""
def COVID19R0(er):
    if np.random.rand() < er:
        # good environment
        if np.random.rand() < 0.8:
            R0 = 0
        else:
            R0 = np.random.randint(1,4)*1.0
    else:
        # bad environment
        R0 = np.random.randint(0,12)*1.0
    return R0

Then, what value of the environmental parameter er can reduce the basic reproduction number to less than 1 (in the direction of converging infection)? Below, I would like to consider using statistical tools.

Central limit theorem

What is the Central Limit Theorem?

Observations $ X_1, X_2, \ cdots, X_n $ are random variables that are independent of each other and all follow the same distribution.
For all i, $ E [X_i] = \ mu, V [X_i] = \ sigma $.

When, for the sample mean $ \ bar {X} _n $ of the observed values,

\frac{\bar{X}_n - \mu}{\left( \frac{\sigma}{\sqrt{n}} \right)} \sim N(0,1)\  ({\rm as}\ n \rightarrow \infty) \\
\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i

Was to hold. What is important here is that there are no assumptions about the probability distribution of the population other than the existence of true values for the mean and variance. Therefore, this theorem can be applied to the probability distribution of the basic reproduction number.

Confidence interval estimation of population mean

So how do you estimate what the population mean of basic reproduction numbers will be for each environmental parameter er? According to statistics textbooks, an estimate of the population mean (confidence interval of $ 1- \ alpha $) when the population variance is unknown is given below.

\left[ \bar{X}_n - \frac{U}{\sqrt{n}} t_{\alpha / 2} (n-1),  \bar{X}_n + \frac{U}{\sqrt{n}} t_{\alpha / 2} (n-1) \right]\\
U^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2

Here, $ U $ is the unbiased sample variance, and $ t_ {\ alpha} (n-1) $ is the $ \ alpha / 2 $ point of the t distribution with $ n-1 $ degrees of freedom (the tail area of the probability distribution is $). The value of the random variable that is \ alpha / 2 $). Reading the textbook, this confidence interval seems to be derived from the combination of estimates based on the central limit theorem, with the population mean following a normal distribution and the population variance following a $ χ ^ 2 $ distribution.

Try to calculate with Python

Now, let's use Python to calculate the population mean and confidence interval of the basic reproduction number R0 depending on the environment parameter er.

First, import the library.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

This is the function for generating the basic reproduction number.

"""
 er: good environment ratio (0.-1.)
"""
def COVID19R0(er):
    if np.random.rand() < er:
        # good environment
        if np.random.rand() < 0.8:
            R0 = 0
        else:
            R0 = np.random.randint(1,4)*1.0
    else:
        # bad environment
        R0 = np.random.randint(0,12)*1.0
    return R0

A function that calculates the sample mean and unbiased variance by sampling N pieces according to the given probability distribution $ f $.

def EstFromNsample(f, N):
    #N samplings according to probability distribution f
    sample = [f() for i in range(N)]
    #Specimen average
    Eave = np.average(sample)
    #Unbiased sample variance
    Esig2 = 1./(np.float(N)-1.) * np.sum([pow(s - Eave, 2) for s in sample]) 
    return Eave, Esig2

This function calculates the estimated values of population mean and population variance by engraving the environment parameter er from 0 to 1 with a mesh and sampling each of them.

def makeERsamples(keys):
    ediv = keys['ediv']
    N = keys['N']
    #
    l = []
    #
    for e in np.linspace(0, 1, ediv):
        f = lambda : COVID19R0(e) #Sampling function in a specific environment parameter
        Eave, Esig2 = EstFromNsample(f, N)
        l.append([e, Eave, Esig2])
    return np.array(l)

A function that calculates the confidence interval. Note that alpha has the opposite meaning of $ \ alpha $ above. According to scipy.stats, "Endpoints of the range that contains alpha percent of the distribution" , T_dist.interval is a function that returns the endpoints of the interval containing the alpha percent of the t distribution.

def calcBand01(ave, sigma, N, alpha):
    t_dist = stats.t(loc=0,scale=1,df=N-1)
    b, u = t_dist.interval(alpha=alpha)
    b = ave + b * sigma / np.sqrt(N)
    u = ave + u * sigma / np.sqrt(N)
    return b, u

A function that displays the population mean estimate and its confidence interval.

def showBandEstimation(sample, keys):
    #
    alpha = keys['alpha']
    N = keys['N']
    #
    x = sample[:,0]
    y = sample[:,1]
    s2 = sample[:,2]
    s = np.sqrt(s2)
    #
    yb, yu = calcBand01(y, s, N, alpha)
    #
    fig, ax = plt.subplots(1,1,figsize=(5,4), dpi=200)
    ax.plot([0,1],[1,1], 'r--')
    ax.plot(x, y, label='ave')
    ax.plot(x, yu, '.', label='upper')
    ax.plot(x, yb, '.', label='lower')
    ax.fill_between(x, yu, yb, facecolor='r',alpha=0.2)
    ax.set_xlabel('er')
    ax.set_ylabel('R0')
    ax.set_xticks(np.linspace(0,1,11))
    ax.grid(b=True, which='both', axis='x')
    ax.legend()
    plt.show()

Finally, the part that performs sampling and displays the estimated population mean and confidence intervals.

keys = {'N':100, 'ediv':100, 'alpha':0.95 }

sample = makeERsamples(keys)
fig = showBandEstimation(sample, keys)

Calculation result

Now let's take a look at the calculation results.

For sampling number (N = 100)

For sampling number (N = 1000)

When the number of samplings (N = 10000)

There is considerable variability at $ N = 100 $, but at $ N = 10000 $ it looks like it's almost converged. Therefore, it seems that a fairly reliable estimate can be made with about 10,000 samples. The important point is that this estimation is 1. 2. The sample mean converges to a normal distribution by the central limit theorem. It means that the confidence interval is estimated by the t distribution, which is a two-step approach. So it's important to determine how much $ N $ is enough to make this estimate.

Consideration

From the above, the following tendency can be derived from the simulation regarding the estimation of the population mean of the basic reproduction number R0 for the environmental parameter er (how to avoid 3 dense spaces).

$ Er> 0.89 $ (95%) is required to reduce the population average of basic reproduction numbers to less than 1. In other words, if 89% or more of the people are not in 3 dense spaces ** or ** 1 person avoids 89% or more of 3 dense spaces in a day **, the infection will be resolved. I won't go.
In the worst condition (everyone does not avoid 3 dense spaces), the basic reproduction number is about 5.5.
By the way, as shown below, I will give you a ranking of effective reproduction numbers by prefecture, which may exceed 12. This is probably because the basic reproduction number has a large variance.

Furthermore ...

The need for more than 89% of the population to avoid three dense spaces may be roughly equivalent to lockdown. However, if you make full use of IT, such as working from home or online education, there will be nothing that cannot be done by region.
Since it is only three dense spaces that should be avoided, it may be possible to establish a Japanese model that does not stop economic activities after thoroughly taking measures against such spaces and isolating the source of infection.
Given that the fight against the new Corona is a long-term battle until the spread of vaccines and silver bullets, continuing lockdown is equivalent to holding your breath and eventually going bankrupt. We need a respirator-like policy to save people without killing the economy.

Reference link

I referred to the following page.

Opinions of the Expert Meeting on Countermeasures against New Coronavirus Infections
National Cluster Map
Statistics basics with python: 02 confidence interval / t distribution
Python scipy2 various distributions
[[In-house statistics study session] Draw chi-square distribution and t distribution in Python](https://techblog.gmo-ap.jp/2018/08/21/%E7%A4%BE%E5%86 % 85% E7% B5% B1% E8% A8% 88% E5% AD% A6% E5% 8B% 89% E5% BC% B7% E4% BC% 9A% E3% 82% AB% E3% 82% A4 % E4% BA% 8C% E4% B9% 97% E5% 88% 86% E5% B8% 83% E3% 81% A8t% E5% 88% 86% E5% B8% 83% E3% 82% 92python% E3 % 81% A7% E6% 8F% 8F% E7% 94% BB /)
scipy.stats
Statistics time
Verify the effect of leave as a countermeasure against the new coronavirus with the SEIR model
Ranking the number of effective reproductions of the new coronavirus by prefecture

Quantify the degree of self-restraint required to contain the new coronavirus