I found a paper by The Lancet [^ 1] that predicts the number of infected people with a mathematical model, so I implemented a mathematical model to predict the number of infected people in Japan and compared it with other infectious diseases.
--The number of infected people was predicted by the SEIR model. --The basic reproduction number of the new corona is about 2.9, which is a little higher than influenza. ――If infection control is not effective, the peak number of infected people in Japan is around GW. ――The number of domestic deaths in the next year will be over 10,000 (when the case fatality rate is 0.01%). --The immigration restrictions for foreigners are meaningless.
Predict the number of people infected with the new coronavirus (COVID-19, hereinafter referred to as the new corona) in Japan using a mathematical model.
Compare the basic reproduction number of the new corona with other infectious diseases.
Implementation of scientific calculations using Scipy.
"Formulas describe the process of how an infectious disease spreads, how long an infected person develops and becomes severe." [^ 2]
In the mathematical model of infectious diseases, the population is divided according to the stage of infection, and the increase or decrease of each group is expressed by a differential equation. I will explain the simple SEIR model and the extended version used this time.
-$ S
Then, the following simultaneous differential equations are obtained.
\begin{align}
\frac{dS(t)}{dt} &= -\beta \frac{I(t)}{N}S(t) \\
\frac{dE(t)}{dt} &= \beta \frac{I(t)}{N}S(t) - \sigma E(t) \\
\frac{dI(t)}{dt} &= \sigma E(t) - \gamma I(t) \\
\frac{dR(t)}{dt} &= \gamma I(t) \\
N &= S(t) + E(t) + I(t) + R(t)
\end{align}
However, it ignores the birth rate and mortality rate, and says that it will never be infected again because it will recover and acquire immunity after the onset.
If you give an appropriate initial value and solve it, you can predict the number of infected people in the future. (I will put the figure [^ 3] for reference)
For detailed explanation and implementation, refer to qiita's article [^ 4].
Here, $ R_0 = \ beta / \ gamma $ is called the basic reproduction number, and "(all individuals are initially susceptible). It means "the number of secondary infections produced per infected person (in the state of having)". [^ 5] ** The point is that the larger $ R_0 $, the greater the infectivity. ** **
You can also use $ R_0 $ to calculate how widespread vaccination should be to prevent an infectious disease epidemic.
In addition to SEIR, various models can be created by subdividing the entire population by place of residence and age, and by reflecting the characteristics of infectious diseases in differential equations. For example, there are SIR that does not consider E, SEIS that does not consider immunity acquisition, and MSEIR that considers passive immunity [^ 6].
Since the above SEIR is too simple, this time we will extend it to a model that considers the inflow and outflow of the population and the mortality rate with foreign countries by referring to the paper [^ 1].
-$ R_0
Then, the following simultaneous differential equations are also obtained.
\begin{align}
\frac{dS(t)}{dt} &= - \frac{R_0}{D_I} \frac{I(t)}{N(t)} S(t) - N_{J,I} +
\left(1 - \frac{E(t) + I(t)}{N(t)} \right)N_{I,J} \\
\frac{dE(t)}{dt} &= \frac{R_0}{D_I} \frac{I(t)}{N(t)} S(t) - \frac{E(t)}{T_E} + \frac{E(t) + I(t)}{N(t)} N_{I,J}\\
\frac{dI(t)}{dt} &= \frac{E(t)}{T_E} - \frac{I(t)}{T_I}\\
\frac{dR(t)}{dt} &= (1-f) \frac{I(t)}{T_I}\\
\frac{dD(t)}{dt} &= f \frac{I(t)}{T_I} \\
N(t) &= S(t) + E(t) + I(t) + R(t)
\end{align}
However, regarding $ N_ {J, I} $ and $ N_ {I, J} $, it is assumed that all those who go abroad from Japan belong to $ S $, and those who come to Japan from overseas belong to $ S, E $. I will.
As in the paper [^ 1], parameters other than $ R_0 $ are substituted with appropriate values. $ T_I and T_E $ are set to $ T_E = 6.0 and T_I = 2.4 $ with reference to SARS Corona. $ N_ {J, I}, N_ {I, J} $ is set to $ N_ {J, I} = 40000, N_ {I, J} = 70000 $ with reference to the number of travelers in February 2019.
The unknown parameter of the model is $ R_0 $. There are various estimation methods such as least squares method, maximum likelihood estimation, MAP estimation, and Bayesian estimation [^ 7], but this time, the likelihood is calculated as a non-stationary Poisson process [^ 8], and $ R_0 $ is calculated by maximum likelihood estimation. ..
Maximum likelihood estimation that everyone loves. First, express the probability (likelihood) of the observed data as a function of $ R_0 $, and find $ R_0 $ that maximizes it. When the number of infected people is modeled as a non-stationary Poisson process with reference to the paper [^ 1] and the likelihood is calculated,
L(R_0) = \prod_{d=D_s}^{D_e} \frac{e^{-\lambda_d}\lambda_d^{x_d}}{x_d!}
However, $ D_s $: the first day of observation, $ D_s $: the last day of observation, $ x_d $: the number of infected people at $ d $,
\lambda (t, R_0) = E(t) + I(t) \\
\lambda _d (R_0) = \int_{d-1}^{d} \lambda (u, R_0) du
will do. $ \ lambda $ represents the average number of infected people when they follow a Poisson distribution.
Excluding the terms irrelevant to $ R_0 $ from the log-likelihood, $ R_0 $ can be estimated as follows.
\newcommand{\argmin}{\mathop{\rm arg~min}\limits}
\argmin_{R_0 > 0} - \log L(R_0) = \argmin_{R_0 > 0} \sum_{d=D_s}^{D_e} (\lambda_d(R_0) - x_d \log \lambda_d(R_0))
import scipy
from scipy.integrate import odeint
from scipy.optimize import minimize
from sklearn.metrics import mean_absolute_error, mean_squared_error
class SIR(object):
def __init__(self, r_0, t_i, dt, init_S, init_I, init_R):
self.r_0 = r_0
self.t_i = t_i
self.dt = dt
self.init_state_list = [init_S, init_I, init_R]
def _get_param(self, params, key, default, t=None):
"""Corresponds to time-varying parameters
"""
if isinstance(params, dict):
if key in list(params.keys()):
param = params[key]
if isinstance(param, list):
return param[np.clip(int(t / self.dt), 0, len(param)-1)]
elif isinstance(param, np.ndarray):
return param
else:
return default
else:
return default
else:
return default
def _ode(self, state_list, t=None, params=None):
"""Define simultaneous differential equations
"""
r_0 = self._get_param(params, 'r_0', self.r_0, t=t)
t_i = self._get_param(params, 't_i', self.t_i, t=t)
S, I, R = state_list
N = S + I + R
dstate_dt = list()
dstate_dt[0] = - (r_0 / t_i) * (I / N) * S
dstate_dt[1] = (r_0 / t_i) * (I / N) * S - I / t_i
dstate_dt[2] = I / t_i
return dstate_dt
def solve_ode(self, len_days=365, params=None):
"""Solve differential equations
"""
t = np.linspace(0, len_days, int(len_days / self.dt), endpoint=False)
args = (params,) if params else ()
return odeint(self._ode, self.init_state_list, t, args=args)
class CustomizedSEIRD(SIR):
def __init__(self, r_0=None, t_e=6.0, t_i=2.4, n_i_j=70000, n_j_i=40000, f=0.0001, dt=1,
init_S=126800000, init_E=0, init_I=1, init_R=0, init_D=0):
self.r_0 = r_0
self.t_e = t_e
self.t_i = t_i
self.n_i_j = n_i_j
self.n_j_i = n_j_i
self.f = f
self.dt = dt
self.init_state_list = [init_S, init_E, init_I, init_R, init_D]
def _ode(self, state_list, t=None, params=None):
"""Define simultaneous differential equations
"""
r_0 = self._get_param(params, 'r_0', self.r_0, t=t)
t_e = self._get_param(params, 't_e', self.t_e, t=t)
t_i = self._get_param(params, 't_i', self.t_i, t=t)
n_i_j = self._get_param(params, 'n_i_j', self.n_i_j, t=t)
n_j_i = self._get_param(params, 'n_j_i', self.n_j_i, t=t)
f = self._get_param(params, 'f', self.f)
S, E, I, R, D = state_list
N = S + E + I + R
dstate_dt = list()
dstate_dt.append(- (r_0 / t_i) * (I / N) * S - n_j_i + (1 - ((E + I) / N)) * n_i_j)
dstate_dt.append((r_0 / t_i) * (I / N) * S - E / t_e + ((E + I) / N) * n_i_j)
dstate_dt.append(E / t_e - I / t_i)
dstate_dt.append((1 - f) * I / t_i)
dstate_dt.append(f * I / t_i)
return dstate_dt
def _calc_neg_log_likelihood_r0(self, r_0, X):
"""Log likelihood(R_Only the part related to 0)To calculate
"""
solution = self.solve_ode(len_days=len(X), params=dict(r_0=r_0))
lambda_arr = solution[int(1/self.dt)-1::int(1/self.dt), 2] # I
return - np.sum(- lambda_arr + X * np.log(lambda_arr))
def _calc_error(self, r_0, X, metric=mean_absolute_error):
"""Calculate mean absolute error and mean square error
"""
solution = self.solve_ode(len_days=len(X), params=dict(r_0=r_0))
e_arr = solution[int(1/self.dt)-1::int(1/self.dt), 2] # I
return metric(X, e_arr)
def exec_point_estimation(self, init_r_0, X, project='mle'):
"""Point estimate parameters
"""
if project == 'mle':
result = minimize(self._calc_neg_log_likelihood_r0, init_r_0, args=(X,), method='Nelder-Mead')
elif project == 'lad':
result = minimize(self._calc_error, init_r_0, args=(X, mean_absolute_error), method='Nelder-Mead')
elif project == 'ls':
result = minimize(self._calc_error, init_r_0, args=(X, mean_squared_error), method='Nelder-Mead')
else:
print(f'Invalid project: {project}')
return None
if self.r_0 is None:
self.r_0 = result.x[0]
return result
def exec_map(self):
"""MAP estimation
"""
pass
def exec_mcmc(self):
"""MCMC method
"""
pass
if __name__ == '__main__':
# 1/16 to 3/Number of infected people up to 8
X = np.array([
1, 1, 1, 1, 1, 1, 1, 1, 2, 3, # 1/25
4, 4, 7, 8, 14, 17, 20, 20, 20, 23, # 2/4
35, 45, 86, 90, 96, 161, 163, 203, 251, 259, # 2/14
338, 414, 520, 616, 705, 728, 743, 743, 838, 851, # 2/24
862, 891, 919, 938, 947, 961, 980, 999, 1035, 1056, # 3/5
1113, 1157, 1190 # 3/8
])
model = CustomizedSEIRD()
result = model.exec_point_estimation(init_r_0=1, X=X, project='mle')
print(result.x[0])
2.8806152343750036
The "number of infected people" is difficult to handle, and only those who visit the hospital are actually counted, and no one knows the true $ x_d $ considering the diagnostic criteria and the sensitivity specificity of the test. Hmm. Here, we estimated based on the number of infected people [^ 9] including cruise ships.
Prediction of the number of infected people in Japan by the extended SEIR model (I is the number of infected people)
It is a graph when $ R_0 $ estimated by the blue line does not change in the future. If this goes on, the number of infected people will peak just after the GW is over. It is a graph when the red line and the blue line change $ R_0 $ after 3/9. Since $ R_0 $ is "the number of secondary infections produced per infected person", it will be smaller due to infection control measures and movement restrictions. If $ R_0 $ can be made small enough, it can be contained like a green line.
The figure below is a figure released by the Ministry of Health and Welfare [^ 9], but the shape of the graph of the number of infected people matches exactly.
Also, for example, if you change $ N_ {i, j} $, you can calculate whether the number of infected people will change due to immigration restrictions of foreigners. If you calculate with the same model, even if $ N_ {i, j} = 0 $ after 3/9, if $ R_0 $ is the same, the number of infected people will not change. This means that immigration restrictions are meaningless (if this model is correct).
Next, let's compare the basic reproduction number and mortality rate with other infectious diseases [^ 10]. (The case fatality rate of the new corona is approximately 0.1% to 5% of the case fatality rate of each country.)
It seems that the basic reproduction number and case fatality rate of the new corona are not significantly higher than those of other infectious diseases. (For the risk of infectious diseases, it is necessary to consider the severity of symptoms and the presence or absence of vaccines, and I think that low basic reproduction number and case fatality rate do not mean that it is safe, but the details are Ask someone who is resistant to infectious diseases)
Ministry of Health, Labor and Welfare
Japan Society for Infectious Diseases
Cool site where you can check the number of infected people (Johns Hopkins CSSE)
** I am not an infectious disease specialist, a public health major, or a statistician. The actual infectious disease is not as simple as this model, so this prediction is probably wrong. We are not responsible for any damages caused by the contents of this article. ** **
[^ 2]: Stop infectious diseases with a mathematical model
[^ 7]: Maximum likelihood estimation, MAP estimation, Bayesian estimation learned in Fujii 4th Dan
[^ 8]: SIR model and transient Poisson process
[^ 9]: About new coronavirus infection
Recommended Posts