I tried to predict the spread of the new pneumonia using the SIR model: ☓ Wuhan edition ○ Hubei edition

Recently, a new type of pneumonia has been spreading. I previously did research using a mathematical model of infectious diseases called the SIR model, so I applied it to the new pneumonia. A mathematical model like this can be used to predict how an infectious disease will develop.

This time, I focus on the center of the outbreak, ~~Wuhan~~ ***Hubei Province*** (the province that contains Wuhan), model the infection there, and predict how it will develop.

Keywords: Epidemiology, SIR model, nCoV-2019, new pneumonia, new coronavirus

SIR model

The SIR model expresses the change in the number of infected people over time as a system of differential equations (it is also explained in an easy-to-understand way on [Wikipedia](https://ja.wikipedia.org/wiki/SIR model)). In the SIR model, each person is in one of three states with respect to the infectious disease.

Susceptible people, who can still become infected: S
Infected people: I
Removed people, who have recovered from the infection and gained immunity, or who have died: R

Let S(t), I(t), and R(t) denote the number of people at time t who are susceptible, infected, and removed (recovered or dead), respectively. The SIR model is then written as

$$
\begin{aligned}
\dot{S}(t) &= -\beta S(t) I(t), \\
\dot{I}(t) &= \beta S(t) I(t) - \gamma I(t), \\
\dot{R}(t) &= \gamma I(t).
\end{aligned}
$$

Here, β is the infection rate and γ is the recovery rate (plus the mortality rate). The number of new infections per unit time is proportional to the infection rate β, the number of susceptible people S(t), and the number of infected people I(t).

Note that people who have died no longer infect anyone, so they are treated the same as people who have recovered from the infection.

Here, $S(t) + I(t) + R(t) = N$ is constant and equals the population of the area. This time, I use the population of ~~Wuhan~~ Hubei Province.
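To make the equations concrete, here is a minimal sketch of integrating them numerically in plain Python. It is not the code from the repository linked at the end of this article. The function names (`sir_rhs`, `rk4_step`, `simulate`), the hand-written Runge-Kutta integrator, and the values of N, β, and γ are all assumptions chosen for illustration.

```python
def sir_rhs(s, i, r, beta, gamma):
    """Right-hand side of the SIR equations: returns (dS/dt, dI/dt, dR/dt)."""
    new_infections = beta * s * i
    removals = gamma * i
    return -new_infections, new_infections - removals, removals


def rk4_step(state, dt, beta, gamma):
    """Advance the (S, I, R) tuple by one classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        # State moved by h along the slope k, used for the intermediate stages.
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = sir_rhs(*state, beta, gamma)
    k2 = sir_rhs(*shifted(k1, dt / 2), beta, gamma)
    k3 = sir_rhs(*shifted(k2, dt / 2), beta, gamma)
    k4 = sir_rhs(*shifted(k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(s0, i0, r0, beta, gamma, days, steps_per_day=10):
    """Return a list of daily (S, I, R) tuples, starting with day 0."""
    state = (float(s0), float(i0), float(r0))
    dt = 1.0 / steps_per_day
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta, gamma)
        daily.append(state)
    return daily


# Illustrative numbers only, NOT fitted to any real data.
N = 1000.0
trajectory = simulate(s0=N - 1, i0=1, r0=0, beta=0.0005, gamma=0.1, days=60)
```

Because the three right-hand sides sum to zero, S + I + R stays equal to N along the whole trajectory, which is a handy sanity check for any implementation.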

Using the infection data for the new pneumonia in ~~Wuhan~~ Hubei Province, I learn the infection rate β and the recovery rate γ and predict how the infection will develop there.

Data used

The infection data is taken from here, published on Kaggle. The population of ~~Wuhan~~ Hubei Province is the 2017 demographic figure given here.

Learning parameters using SIR model

The course of the infection in ~~Wuhan~~ Hubei Province from January 22 to February 4, 2020 is as follows.

The blue line is the number of infected people, the orange line is the number of deaths, and the green line is the number of people who have recovered. I find it a little strange that the number of recoveries and the number of deaths are about the same.

I fitted the SIR model so that it reproduces this data.

"Recovered people" here is the sum of the number of people who have recovered and the number of deaths. The blue dots are the observed numbers of infected people, and the blue line is the approximation given by the SIR model. The orange dots and the orange line are the measured and predicted numbers of recovered people.

The model seems to approximate the data well enough.
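The fitting step can be sketched as a least-squares search for the pair (β, γ) whose simulated I and R curves are closest to the observed ones. The sketch below is not the code from the repository. To stay dependency-free it swaps in a crude grid search with forward-Euler integration, and it runs on synthetic data generated from known parameters rather than on the Kaggle counts. All function names, grid ranges, and numbers are assumptions.

```python
from itertools import product


def simulate_sir(beta, gamma, n_total, i0, r0, days, steps_per_day=10):
    """Daily (I, R) pairs from forward-Euler integration, starting with day 0."""
    s, i, r = float(n_total - i0 - r0), float(i0), float(r0)
    dt = 1.0 / steps_per_day
    daily = [(i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i * dt
            new_rem = gamma * i * dt
            s, i, r = s - new_inf, i + new_inf - new_rem, r + new_rem
        daily.append((i, r))
    return daily


def merge_removed(recovered, deaths):
    """R in the model counts recoveries and deaths together."""
    return [rec + dead for rec, dead in zip(recovered, deaths)]


def sum_squared_error(beta, gamma, n_total, infected, removed):
    """Squared distance between the model curves and the observed I and R series."""
    model = simulate_sir(beta, gamma, n_total, infected[0], removed[0], len(infected) - 1)
    return sum(
        (mi - oi) ** 2 + (mr - orr) ** 2
        for (mi, mr), oi, orr in zip(model, infected, removed)
    )


def grid_fit(n_total, infected, removed, beta_grid, gamma_grid):
    """Return the (beta, gamma) pair on the grid with the smallest squared error."""
    return min(
        product(beta_grid, gamma_grid),
        key=lambda p: sum_squared_error(p[0], p[1], n_total, infected, removed),
    )


# Synthetic demonstration data generated from known parameters (not real case counts).
N = 100_000
truth = simulate_sir(beta=4.2e-6, gamma=0.07, n_total=N, i0=20, r0=0, days=14)
infected = [i for i, _ in truth]
removed = [r for _, r in truth]

beta_grid = [k * 5e-7 for k in range(1, 21)]   # 5e-7 ... 1e-5
gamma_grid = [k * 0.01 for k in range(1, 21)]  # 0.01 ... 0.20
beta_fit, gamma_fit = grid_fit(N, infected, removed, beta_grid, gamma_grid)
```

The answer can only be as fine as the grid spacing, so in practice a proper optimizer would refine it. With real data, `merge_removed` would be applied to the recovered and death series first, so that R matches the definition used in the model.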

Predicting the future of the infection in ~~Wuhan~~ Hubei Province

Since the fit looks good enough, I used the learned parameters to predict the future of the infection in Hubei Province.

The following figure shows the forecast for 10 days from February 4th.

The points are the measured values and the lines are the predicted values. According to the SIR model, the number of infected people keeps increasing.
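A forecast of this kind amounts to starting from the latest observed counts and stepping the SIR equations forward with the learned β and γ. The following is a minimal sketch under that assumption, not the repository code. The helper names `forecast` and `still_growing` are hypothetical, and the inputs are placeholders, not the values fitted in this article.

```python
def forecast(n_total, infected_now, removed_now, beta, gamma, horizon_days, steps_per_day=24):
    """Project daily (infected, removed) counts forward from the latest observation."""
    # Everyone who is neither infected nor removed is assumed to be still susceptible.
    s = float(n_total - infected_now - removed_now)
    i, r = float(infected_now), float(removed_now)
    dt = 1.0 / steps_per_day
    projection = []
    for _ in range(horizon_days):
        for _ in range(steps_per_day):  # forward-Euler sub-steps within one day
            new_inf = beta * s * i * dt
            new_rem = gamma * i * dt
            s, i, r = s - new_inf, i + new_inf - new_rem, r + new_rem
        projection.append((i, r))
    return projection


def still_growing(n_total, infected, removed, beta, gamma):
    """True while dI/dt > 0, i.e. while beta * S exceeds gamma."""
    susceptible = n_total - infected - removed
    return beta * susceptible > gamma


# Placeholder inputs for illustration; they are NOT the values fitted in this article.
N = 1_000_000
ten_days = forecast(N, infected_now=500, removed_now=50,
                    beta=3e-7, gamma=0.05, horizon_days=10)
rising = still_growing(N, 500, 50, beta=3e-7, gamma=0.05)
```

A longer horizon is the same call with a larger `horizon_days`. The `still_growing` check makes the reason for a rising forecast explicit: the infected count increases as long as βS(t) is larger than γ.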

Next is the forecast for one year from February 4th.

The infection doesn't seem to stop at all.

Discussion

- Using the SIR model, I tried to predict the spread of the new pneumonia in Hubei Province, but the result was that the spread does not stop.
- The likely cause is that the number of people who have recovered in ~~Wuhan~~ Hubei Province is not measured accurately.
- If the recovery rate γ can be measured accurately from other data, better predictions should be possible.

Future tasks

Next, I would like to predict the spread of infection throughout China based on traffic volume.

Postscript: Basic reproduction number R0

The infectivity of a disease is evaluated by the basic reproduction number R0. R0 is the ratio of the dimensionless infection rate $\hat{\beta}$ to the dimensionless recovery rate $\hat{\gamma}$. The basic reproduction number in Hubei Province is therefore

$$
R_0 = \frac{\hat{\beta}}{\hat{\gamma}} = \frac{\beta N^2}{\gamma N} \approx 17.54
$$

This value is about the same as that of airborne diseases such as measles. The infectivity probably appears this strong because the recovery rate is underestimated, as mentioned in the discussion.
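Since the ratio βN²/(γN) simplifies to βN/γ, the calculation is a one-liner. The sketch below also shows how strongly the result depends on γ: doubling the recovery rate halves R0. The function name and the numbers are illustrative assumptions, not the fitted values behind the 17.54 above.

```python
def basic_reproduction_number(beta, gamma, n_total):
    """R0 = beta * N / gamma when S, I and R are absolute head counts."""
    return beta * n_total / gamma


# Placeholder values for illustration; NOT the parameters fitted in this article.
N = 1_000_000
beta, gamma = 4e-7, 0.1

r0 = basic_reproduction_number(beta, gamma, N)
r0_if_gamma_doubled = basic_reproduction_number(beta, 2 * gamma, N)

# At the start of an outbreak S is close to N, so dI/dt = (beta*S - gamma)*I
# is positive exactly when R0 > 1.
initially_growing = r0 > 1
```

This is why an underestimated γ inflates R0: any undercount of recoveries translates directly into a proportionally larger reproduction number.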

Code

https://github.com/yuji0001/2020nCoV_analysis