I tried to predict the number of domestically infected people of the new corona with a mathematical model

Overview

I found a paper by The Lancet [^ 1] that predicts the number of infected people with a mathematical model, so I implemented a mathematical model to predict the number of infected people in Japan and compared it with other infectious diseases.

Summary

--The number of infected people was predicted by the SEIR model. --The basic reproduction number of the new corona is about 2.9, which is a little higher than influenza. ――If infection control is not effective, the peak number of infected people in Japan is around GW. ――The number of domestic deaths in the next year will be over 10,000 (when the case fatality rate is 0.01%). --The immigration restrictions for foreigners are meaningless.

[Purpose of this time](# 1-Purpose of this time)
[Mathematical model of infectious diseases](# 2-Mathematical model of infectious diseases)
[Parameter estimation](# 3-Parameter estimation)
[Implemented and illustrated in python](# 4-Implemented and illustrated in python)
[Supplement](# 5-Supplement)

1. Purpose of this time

Predict the number of people infected with the new coronavirus (COVID-19, hereinafter referred to as the new corona) in Japan using a mathematical model.
Compare the basic reproduction number of the new corona with other infectious diseases.
Implementation of scientific calculations using Scipy.

2. Mathematical model of infectious diseases

"Formulas describe the process of how an infectious disease spreads, how long an infected person develops and becomes severe." [^ 2]

In the mathematical model of infectious diseases, the population is divided according to the stage of infection, and the increase or decrease of each group is expressed by a differential equation. I will explain the simple SEIR model and the extended version used this time.

2.1. SEIR model

-$ S : Number of people without immunity (Susceptible) - E : Number of people in the post-exposure period (Exposed) - I : Number of people during the infectious period (Infectious) - R : Number of people who recovered from infection and gained immunity (Recovered) - \ beta $: Infection rate, $ \ gamma $: Recovery (quarantine) rate, $ \ sigma $: Probability of getting infectious, $ N $: Total population

Then, the following simultaneous differential equations are obtained.

\begin{align}
\frac{dS(t)}{dt} &= -\beta \frac{I(t)}{N}S(t) \\

\frac{dE(t)}{dt} &= \beta \frac{I(t)}{N}S(t) - \sigma E(t) \\

\frac{dI(t)}{dt} &= \sigma E(t) - \gamma I(t) \\

\frac{dR(t)}{dt} &= \gamma I(t) \\

N &= S(t) + E(t) + I(t) + R(t)
\end{align}

However, it ignores the birth rate and mortality rate, and says that it will never be infected again because it will recover and acquire immunity after the onset.

If you give an appropriate initial value and solve it, you can predict the number of infected people in the future. (I will put the figure [^ 3] for reference)

For detailed explanation and implementation, refer to qiita's article [^ 4].

Here, $ R_0 = \ beta / \ gamma $ is called the basic reproduction number, and "(all individuals are initially susceptible). It means "the number of secondary infections produced per infected person (in the state of having)". [^ 5] ** The point is that the larger $ R_0 $, the greater the infectivity. ** **

You can also use $ R_0 $ to calculate how widespread vaccination should be to prevent an infectious disease epidemic.

In addition to SEIR, various models can be created by subdividing the entire population by place of residence and age, and by reflecting the characteristics of infectious diseases in differential equations. For example, there are SIR that does not consider E, SEIS that does not consider immunity acquisition, and MSEIR that considers passive immunity [^ 6].

2.2. Extended SEIRD model

Since the above SEIR is too simple, this time we will extend it to a model that considers the inflow and outflow of the population and the mortality rate with foreign countries by referring to the paper [^ 1].

-$ R_0 : Basic reproduction number ( R_0 = \ beta / \ gamma ) - T_E : Average post-exposure period (days) ( \ gamma ^ {-1} = T_I ) - T_I : Average infectious period (days) ( \ sigma ^ {-1} = T_E ) - N_ {J, I} : Number of travelers from Japan to overseas (people / day) - N_ {I, J} : Number of travelers from overseas to Japan (people / day) - f : Case fatality rate - D (t) $: Number of deaths (people)

Then, the following simultaneous differential equations are also obtained.

\begin{align}
\frac{dS(t)}{dt} &= - \frac{R_0}{D_I} \frac{I(t)}{N(t)} S(t) - N_{J,I} + 
 \left(1 - \frac{E(t) + I(t)}{N(t)} \right)N_{I,J} \\

\frac{dE(t)}{dt} &= \frac{R_0}{D_I} \frac{I(t)}{N(t)} S(t) - \frac{E(t)}{T_E} + \frac{E(t) + I(t)}{N(t)} N_{I,J}\\

\frac{dI(t)}{dt} &= \frac{E(t)}{T_E} - \frac{I(t)}{T_I}\\

\frac{dR(t)}{dt} &= (1-f) \frac{I(t)}{T_I}\\

\frac{dD(t)}{dt} &= f \frac{I(t)}{T_I} \\

N(t) &= S(t) + E(t) + I(t) + R(t)
\end{align}

However, regarding $ N_ {J, I} $ and $ N_ {I, J} $, it is assumed that all those who go abroad from Japan belong to $ S $, and those who come to Japan from overseas belong to $ S, E $. I will.

As in the paper [^ 1], parameters other than $ R_0 $ are substituted with appropriate values. $ T_I and T_E $ are set to $ T_E = 6.0 and T_I = 2.4 $ with reference to SARS Corona. $ N_ {J, I}, N_ {I, J} $ is set to $ N_ {J, I} = 40000, N_ {I, J} = 70000 $ with reference to the number of travelers in February 2019.

3. Parameter estimation

The unknown parameter of the model is $ R_0 $. There are various estimation methods such as least squares method, maximum likelihood estimation, MAP estimation, and Bayesian estimation [^ 7], but this time, the likelihood is calculated as a non-stationary Poisson process [^ 8], and $ R_0 $ is calculated by maximum likelihood estimation. ..

3.1. Maximum likelihood estimation

Maximum likelihood estimation that everyone loves. First, express the probability (likelihood) of the observed data as a function of $ R_0 $, and find $ R_0 $ that maximizes it. When the number of infected people is modeled as a non-stationary Poisson process with reference to the paper [^ 1] and the likelihood is calculated,

L(R_0) = \prod_{d=D_s}^{D_e} \frac{e^{-\lambda_d}\lambda_d^{x_d}}{x_d!}

However, $ D_s $: the first day of observation, $ D_s $: the last day of observation, $ x_d $: the number of infected people at $ d $,

\lambda (t, R_0) = E(t) + I(t) \\
\lambda _d (R_0) = \int_{d-1}^{d} \lambda (u, R_0) du

will do. $ \ lambda $ represents the average number of infected people when they follow a Poisson distribution.

Excluding the terms irrelevant to $ R_0 $ from the log-likelihood, $ R_0 $ can be estimated as follows.

\newcommand{\argmin}{\mathop{\rm arg~min}\limits}

\argmin_{R_0 > 0} - \log L(R_0) = \argmin_{R_0 > 0} \sum_{d=D_s}^{D_e} (\lambda_d(R_0) - x_d \log \lambda_d(R_0))

4. Implemented and illustrated in python

Implementation of SEIR class

import scipy
from scipy.integrate import odeint
from scipy.optimize import minimize
from sklearn.metrics import mean_absolute_error, mean_squared_error


class SIR(object):

    def __init__(self, r_0, t_i, dt, init_S, init_I, init_R):
        self.r_0 = r_0
        self.t_i = t_i
        self.dt = dt
        self.init_state_list = [init_S, init_I, init_R]

    def _get_param(self, params, key, default, t=None):
        """Corresponds to time-varying parameters
        """
        if isinstance(params, dict):
            if key in list(params.keys()):
                param = params[key]
                if isinstance(param, list):
                    return param[np.clip(int(t / self.dt), 0, len(param)-1)]
                elif isinstance(param, np.ndarray):
                    return param
                else:
                    return default
            else:
                return default
        else:
            return default

    def _ode(self, state_list, t=None, params=None):
        """Define simultaneous differential equations
        """
        r_0 = self._get_param(params, 'r_0', self.r_0, t=t)
        t_i = self._get_param(params, 't_i', self.t_i, t=t)
        S, I, R = state_list
        N = S + I + R
        dstate_dt = list()
        dstate_dt[0] = - (r_0 / t_i) * (I / N) * S
        dstate_dt[1] = (r_0 / t_i) * (I / N) * S - I / t_i
        dstate_dt[2] = I / t_i
        return dstate_dt

    def solve_ode(self, len_days=365, params=None):
        """Solve differential equations
        """
        t = np.linspace(0, len_days, int(len_days / self.dt), endpoint=False)
        args = (params,) if params else ()
        return odeint(self._ode, self.init_state_list, t, args=args)


class CustomizedSEIRD(SIR):

    def __init__(self, r_0=None, t_e=6.0, t_i=2.4, n_i_j=70000, n_j_i=40000, f=0.0001, dt=1,
            init_S=126800000, init_E=0, init_I=1, init_R=0, init_D=0):
        self.r_0 = r_0
        self.t_e = t_e
        self.t_i = t_i
        self.n_i_j = n_i_j
        self.n_j_i = n_j_i
        self.f = f
        self.dt = dt
        self.init_state_list = [init_S, init_E, init_I, init_R, init_D]

    def _ode(self, state_list, t=None, params=None):
        """Define simultaneous differential equations
        """
        r_0 = self._get_param(params, 'r_0', self.r_0, t=t)
        t_e = self._get_param(params, 't_e', self.t_e, t=t)
        t_i = self._get_param(params, 't_i', self.t_i, t=t)
        n_i_j = self._get_param(params, 'n_i_j', self.n_i_j, t=t)
        n_j_i = self._get_param(params, 'n_j_i', self.n_j_i, t=t)
        f = self._get_param(params, 'f', self.f)
        S, E, I, R, D = state_list
        N = S + E + I + R
        dstate_dt = list()
        dstate_dt.append(- (r_0 / t_i) * (I / N) * S - n_j_i + (1 - ((E + I) / N)) * n_i_j)
        dstate_dt.append((r_0 / t_i) * (I / N) * S - E / t_e + ((E + I) / N) * n_i_j)
        dstate_dt.append(E / t_e - I / t_i)
        dstate_dt.append((1 - f) * I / t_i)
        dstate_dt.append(f * I / t_i)
        return dstate_dt

    def _calc_neg_log_likelihood_r0(self, r_0, X):
        """Log likelihood(R_Only the part related to 0)To calculate
        """
        solution = self.solve_ode(len_days=len(X), params=dict(r_0=r_0))
        lambda_arr = solution[int(1/self.dt)-1::int(1/self.dt), 2] # I
        return - np.sum(- lambda_arr + X * np.log(lambda_arr))
    
    def _calc_error(self, r_0, X, metric=mean_absolute_error):
        """Calculate mean absolute error and mean square error
        """
        solution = self.solve_ode(len_days=len(X), params=dict(r_0=r_0))
        e_arr = solution[int(1/self.dt)-1::int(1/self.dt), 2]  # I
        return metric(X, e_arr)

    def exec_point_estimation(self, init_r_0, X, project='mle'):
        """Point estimate parameters
        """
        if project == 'mle':
            result = minimize(self._calc_neg_log_likelihood_r0, init_r_0, args=(X,), method='Nelder-Mead')
        elif project == 'lad':
            result = minimize(self._calc_error, init_r_0, args=(X, mean_absolute_error), method='Nelder-Mead')
        elif project == 'ls':
            result = minimize(self._calc_error, init_r_0, args=(X, mean_squared_error), method='Nelder-Mead')
        else:
            print(f'Invalid project: {project}')
            return None
        if self.r_0 is None:
            self.r_0 = result.x[0]
        return result

    def exec_map(self):
        """MAP estimation
        """
        pass

    def exec_mcmc(self):
        """MCMC method
        """
        pass

if __name__ == '__main__':
    # 1/16 to 3/Number of infected people up to 8
    X = np.array([
        1, 1, 1, 1, 1, 1, 1, 1, 2, 3,  # 1/25
        4, 4, 7, 8, 14, 17, 20, 20, 20, 23,  # 2/4
        35, 45, 86, 90, 96, 161, 163, 203, 251, 259,  # 2/14
        338, 414, 520, 616, 705, 728, 743, 743, 838, 851, # 2/24
        862, 891, 919, 938, 947, 961, 980, 999, 1035, 1056, # 3/5
        1113, 1157, 1190 # 3/8
    ])
    model = CustomizedSEIRD()
    result = model.exec_point_estimation(init_r_0=1, X=X, project='mle')
    print(result.x[0])

2.8806152343750036

The "number of infected people" is difficult to handle, and only those who visit the hospital are actually counted, and no one knows the true $ x_d $ considering the diagnostic criteria and the sensitivity specificity of the test. Hmm. Here, we estimated based on the number of infected people [^ 9] including cruise ships.

Prediction of the number of infected people in Japan by the extended SEIR model (I is the number of infected people)

It is a graph when $ R_0 $ estimated by the blue line does not change in the future. If this goes on, the number of infected people will peak just after the GW is over. It is a graph when the red line and the blue line change $ R_0 $ after 3/9. Since $ R_0 $ is "the number of secondary infections produced per infected person", it will be smaller due to infection control measures and movement restrictions. If $ R_0 $ can be made small enough, it can be contained like a green line.

The figure below is a figure released by the Ministry of Health and Welfare [^ 9], but the shape of the graph of the number of infected people matches exactly.

厚生省コロナ感染者数.png

Also, for example, if you change $ N_ {i, j} $, you can calculate whether the number of infected people will change due to immigration restrictions of foreigners. If you calculate with the same model, even if $ N_ {i, j} = 0 $ after 3/9, if $ R_0 $ is the same, the number of infected people will not change. This means that immigration restrictions are meaningless (if this model is correct).

Next, let's compare the basic reproduction number and mortality rate with other infectious diseases [^ 10]. (The case fatality rate of the new corona is approximately 0.1% to 5% of the case fatality rate of each country.)

It seems that the basic reproduction number and case fatality rate of the new corona are not significantly higher than those of other infectious diseases. (For the risk of infectious diseases, it is necessary to consider the severity of symptoms and the presence or absence of vaccines, and I think that low basic reproduction number and case fatality rate do not mean that it is safe, but the details are Ask someone who is resistant to infectious diseases)

5. Supplement

Ministry of Health, Labor and Welfare

Japan Society for Infectious Diseases

WHO

CDC

Cool site where you can check the number of infected people (Johns Hopkins CSSE)

** I am not an infectious disease specialist, a public health major, or a statistician. The actual infectious disease is not as simple as this model, so this prediction is probably wrong. We are not responsible for any damages caused by the contents of this article. ** **

[^ 2]: Stop infectious diseases with a mathematical model

[^ 4]: Infectious Disease Mathematical Model Beginning: Overview of SEIR Model by Python and Introduction to Parameter Estimation

[^ 5]: Prediction of infectious disease epidemics: Quantitative issues in infectious disease mathematical models

[^ 7]: Maximum likelihood estimation, MAP estimation, Bayesian estimation learned in Fujii 4th Dan

[^ 8]: SIR model and transient Poisson process

[^ 9]: About new coronavirus infection