[Introduction to SIR model] Predict the end time of each country with COVID-19 data fitting ♬

In the previous SEIR model, among the obtained parameters, values such as $ lp \ fallingdotseq 1e ^ {-23} $ were obtained. In such a case, there is a possibility that the digit loss or the differential equation is not working in the first place, so be careful. So, this time I will change this part and try to fit with the simplest model.

What i did

・ A little theory ・ Code explanation ・ The end of Japan ・ End of each country ・ A little discussion

・ A little theory

The SIR model is the following simple time evolution equation.

{\begin{align}
\frac{dS}{dt} &= -\beta \frac{SI}{N} \\
\frac{dI}{dt} &=  \beta \frac{SI}{N} -\gamma I \\
\frac{dR}{dt} &=  \gamma I \\
\end{align} 
}

The first equation is that the decrease in the number of uninfected persons is proportional to the number of infected persons and the number of uninfected persons, and is inversely proportional to the total number of target persons. Decrease. As β / N, it can be considered as the infection rate per capita. The second equation is the equation for increasing the number of infected people. The increase is the difference between the influx from uninfected individuals and the healers. The third equation is an equation that a certain percentage of infected persons are cured, and this coefficient γ is a cure rate called a removal rate. 【reference】・ [Is this infection spreading or converging: the physical meaning and determination of the reproduction number R](https://rad-it21.com/%E3%82%B5%E3%82%A4%E3%82%A8] % E3% 83% B3% E3% 82% B9 / kado-shinichiro_20200327 /)

Here, the second equation can be transformed as follows.

\frac{dI}{dt} =  \gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)I\\

That is, if $ S / N $ is considered to be constant, the following solution can be formally obtained.

You can think of it as time evolution within a sufficiently small amount of time.

I =  I_0\exp(\gamma (\frac{\beta}{\gamma} \frac{S}{N} - 1)t)\\

At the onset of infection, it is $ S_0 \ fallingdotseq N $, in which case the following solution can be obtained.

{\begin{align}
I &=  I_0\exp(\gamma (R_0 - 1)t)\\
here\\ 
R_0 &= \frac{\beta}{\gamma}
\end{align}
}

Here, in the formula of $ I $, the number of infected persons decreases exponentially when $ R_0 <1 $, and conversely increases exponentially when $ R_0> 1 $. Therefore, this ** $ R_0 $ is called the basic reproduction number ** and serves as a guide for determining whether or not infection transmission occurs. In addition, in the above reference,

R = R_0\frac{S}{N}

Is defined as ** effective reproduction number **. This $ R $ becomes smaller as $ \ frac {S} {N} $ becomes smaller as the infection progresses. That is, as can be seen from the above formal solution, when $ R_0> 1 $, $ R> 1 $ is still present even at the initial stage of expansion, and the infection spreads, but eventually $ \ frac {S} {N} $ is small. Therefore, it is an index showing that the end starts when $ R <1 $ is realized. On the other hand, if the left side of $ \ frac {dI} {dt} = 0 $ = 0, the infection peaks, and the number of uninfected persons at this time is $ S_p $,

{\begin{align}
R_0 &= \frac{\beta}{\gamma}\\
& = \frac{N}{S_p}　\\
\end{align}
}

Is established, at this time

{\begin{align}
R &= R_0\frac{S_p}{N}\\
&= 1
\end{align}
}

It turns out that holds true. That is, ** the effective reproduction number R = 1 at the peak of infection, and the transmission of infection will end thereafter. ** ** This time, I will look at the end prediction by plotting this ** effective reproduction number ** as well.

・ Code explanation

The entire code is placed below. ・ Collive_particles / fitting_SIR_COVID2.py The Lib used is as follows same as last time.

import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import pandas as pd

Data in advance from COVID-19 site Is downloading.

#Read CSV data with pandas.
data = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
data_r = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
data_d = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
    
confirmed = [0] * (len(data.columns) - 4)
confirmed_r = [0] * (len(data_r.columns) - 4)
confirmed_d = [0] * (len(data_d.columns) - 4)
days_from_22_Jan_20 = np.arange(0, len(data.columns) - 4, 1)

The record order of the data was correctly arranged last time, but this time the order of the countries and regions of the three data is not the same, so obtain each as follows, and calculate the number of existing infections for the infection number data In order to do so, I try to acquire it after acquiring the healing number data. The if statement is used interchangeably between the country and the region. In addition, the total number of cures and total number of deaths required for mortality and cure rate are not calculated this time, so they are commented out.

#City
city = "Japan"
#Process the data
t_cases = 0
for i in range(0, len(data_r), 1):
    if (data_r.iloc[i][1] == city): #for country/region
    #if (data_r.iloc[i][0] == city):  #for province:/state  
        print(str(data_r.iloc[i][0]) + " of " + data_r.iloc[i][1])
        for day in range(4, len(data.columns), 1):            
            confirmed_r[day - 4] += data_r.iloc[i][day]
            #t_recover += data_r.iloc[i][day]
for i in range(0, len(data), 1):
    if (data.iloc[i][1] == city): #for country/region
    #if (data.iloc[i][0] == city):  #for province:/state  
        print(str(data.iloc[i][0]) + " of " + data.iloc[i][1])
        for day in range(4, len(data.columns), 1):
            confirmed[day - 4] += data.iloc[i][day] -  confirmed_r[day - 4]            
for i in range(0, len(data_d), 1):
    if (data_d.iloc[i][1] == city): #for country/region
    #if (data_d.iloc[i][0] == city):  #for province:/state  
        print(str(data_d.iloc[i][0]) + " of " + data_d.iloc[i][1])
        for day in range(4, len(data.columns), 1):
            confirmed_d[day - 4] += data_d.iloc[i][day] #fro drawings
            #t_deaths += data_d.iloc[i][day]

This time we will use the SIR model. The evaluation function also returns the number of uninfected S, the number of infections I, and the number of cures R.

#define differencial equation of sir model
def sir_eq(v,t,beta,gamma):
    a = -beta*v[0]*v[1]
    b = beta*v[0]*v[1] - gamma*v[1]
    c = gamma*v[1]
    return [a,b,c]
def estimate_i(ini_state,beta,gamma):
    v=odeint(sir_eq,ini_state,t,args=(beta,gamma))
    est=v[0:int(t_max/dt):int(1/dt)] 
    return est[:,0],est[:,1],est[:,2] #v0-S,v1-I,v2-R

The objective function is the normal variance. Here, the number of infections I and the number of cures R are linearly summed so that they can be fitted at the same time.

#define logscale likelihood function
def y(params):
    est_i_0,est_i_1,est_i_2=estimate_i(ini_state,params[0],params[1])
    return np.sum(f1*(est_i_1-obs_i_1)*(est_i_1-obs_i_1)+f2*(est_i_2-obs_i_2)*(est_i_2-obs_i_2))

Enter the total number of infections in the country (or region) selected above in N0. In addition, f1 and f2 determine which curve the fitting should be performed on. N, S0, I0, R0 are the initial values of each, but these are also constants that should be adjusted according to the fitting condition. The initial values of beta and gamma are similar in most countries, but it seems better to re-enter the values once obtained and calculate recursively.

You should also do this initial value fitting, but this time we will not do it

#solve seir model
N0 = 1517
f1,f2 = 1,1 #data & data_r fitting factor
N,S0,I0,R0=int(N0*1.5),int(N0*1.5),int(10),int(0)
ini_state=[S0,I0,R0]
beta,gamma = 1.6e-6, 1e-3 #The initial value does not converge unless it is close to some extent

t_max=len(days_from_22_Jan_20) #Fitting within a certain range of data
dt=0.01
t=np.arange(0,t_max,dt)

I will try to display it below, including checking the initial value.

plt.plot(t,odeint(sir_eq,ini_state,t,args=(beta,gamma)))
plt.legend(['Susceptible','Infected','Recovered'])

obs_i_1 = confirmed
obs_i_2 = confirmed_r

plt.plot(obs_i_1,"o", color="red",label = "data_1")
plt.plot(obs_i_2,"o", color="green",label = "data_2")
plt.legend()
plt.pause(1)
plt.close()

Make a fitting. I used minimize @ scipy this time as well. Calculate the basic reproduction number R0. Since the variable name overlaps with the initial healing number above, it is in lowercase.

#optimize logscale likelihood function
mnmz=minimize(y,[beta,gamma],method="nelder-mead")
print(mnmz)
#R0
#N_total = S_0+I_0+R_0
#R0 = N_total*beta_const *(1/gamma_const)
beta_const,gamma = mnmz.x[0],mnmz.x[1] #Infection rate, removal rate (curing rate)
print(beta_const,gamma)
r0 = N*beta_const*(1/gamma)
print(r0)

Graph it below. Ax3 and ax4 sub-axises are added to plot the effective reproduction number.

#plot reult with observed data
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 4, 4*2))
ax3 = ax1.twinx()
ax4 = ax2.twinx()

lns1=ax1.plot(confirmed,"o", color="red",label = "data_I")
lns2=ax1.plot(confirmed_r,"o", color="green",label = "data_R")
est_i_0,est_i_1,est_i_2=estimate_i(ini_state,mnmz.x[0],mnmz.x[1])
lns3=ax1.plot(est_i_1, label = "estimation_I")
lns0=ax1.plot(est_i_2, label = "estimation_R")
lns_1=ax3.plot((S0-est_i_1-est_i_2)*r0/N,".", color="black", label = "effective_R0")
ax3.set_ylim(0,r0+1)

lns_ax1 = lns1+lns2 + lns3 + lns0 +lns_1
labs_ax1 = [l.get_label() for l in lns_ax1]
ax1.legend(lns_ax1, labs_ax1, loc=0)
ax1.set_ylim(0,N0)
ax1.set_title('SIR_{} f1_{:,d} f2_{:,d};N={:.0f} S0={:.0f} I0={:.0f} R0={:.0f} R={:.2f}'.format(city,f1,f2,N,S0,I0,R0,(S0-est_i_1[-1]-est_i_2[-1])*r0/N))
ax1.set_ylabel("Susceptible, Infected, recovered ")
ax3.set_ylabel("effective_R0")

The future forecast is graphed below.

t_max=200 #len(days_from_22_Jan_20)
t=np.arange(0,t_max,dt)

lns4=ax2.plot(confirmed,"o", color="red",label = "data")
lns5=ax2.plot(confirmed_r,"o", color="green",label = "data_r")
est_i_0,est_i_1,est_i_2=estimate_i(ini_state,mnmz.x[0],mnmz.x[1])
lns6=ax2.plot(est_i_0, label = "estimation_S")
lns7=ax2.plot(est_i_1, label = "estimation_I")
lns8=ax2.plot(est_i_2, label = "estimation_R")
lns_6=ax4.plot((S0-est_i_1-est_i_2)*r0/N,".", color="black", label = "effective_R0")

lns_ax2 = lns4+lns5 + lns6 + lns7+ lns8 +lns_6
labs_ax2 = [l.get_label() for l in lns_ax2]
ax2.legend(lns_ax2, labs_ax2, loc=0)
ax4.set_ylim(0,r0+1)
ax2.set_ylim(0,S0)
ax2.set_title('SIR_{};b={:.2e} g={:.2e} r0={:.2f}'.format(city,beta_const,gamma,r0))
ax2.set_ylabel("Susceptible, Infected, recovered ")
ax4.set_ylabel("effective_R0")

ax1.grid()
ax2.grid()
plt.savefig('./fig/SIR_{}f1_{:,d}f2_{:,d};b_{:.2e}g_{:.2e}r0_{:.2f}N_{:.0f}S0_{:.0f}I0_{:.0f}R0_{:.0f}.png'.format(city,f1,f2,beta_const,gamma,r0,N,S0,I0,R0)) 
plt.show()
plt.close()

・ The end of Japan

** Data is data up to March 27, 2020 **. The data is as follows. The results are as follows. SIR_Japanf1_1f2_1;b_4.53e-05g_1.58e-02r0_6.52N_2275S0_2275I0_10R0_0.png This result seems to be fitted for the time being, but today's press says that it infected 200 people yesterday, March 28, so it's out of this graph and doesn't match. .. I posted this as a bonus. The obtained quantities are as follows.

{\begin{align}
\beta &= 4.53e^{-5}\\
\gamma &= 1.58e^{-2}\\
N &= 2275\\
R0 &= 6.52\\
R(27.3.2020) &= 2.71\\
\end{align} 
}

・ End of each country

The situation in Hubei

The latest (27.3.2020) data is as follows. Wuhan data cannot be matched as follows when trying to fit both infection and cure data. However, as you can see in the graph, it can be said that the effective reproduction number R = 0.01, 0.08. SIR_Hubeif1_1f2_0;b_3.19e-06g_6.53e-02r0_5.20N_106462S0_106462I0_318R0_389.png

{\begin{align}
\beta &= 3.19e^{-6}\\
\gamma &= 6.53e^{-2}\\
N &= 106462\\
R0 &= 5.20\\
R(27.3.2020) &= 0.01\\
\end{align} 
}

The following are fittings of both at the same time, and the peak number of infections has shifted significantly. SIR_Hubeif1_1f2_1;b_2.34e-06g_5.33e-02r0_4.67N_106462S0_106462I0_318R0_389.png

{\begin{align}
\beta &= 2.34e^{-6}\\
\gamma &= 5.33e^{-2}\\
N &= 106462\\
R0 &= 4.67\\
R(27.3.2020) &= 0.08\\
\end{align} 
}

Status of Korea, South

The latest (27.3.2020) data is as follows. fig_Korea, South_.png

The Korean data is not as bad as Wuhan, but it is almost the same, and even if you try to fit both the infection number and cure number data, they cannot be matched as follows. However, as you can see in the graph, the effective reproduction number R = 0.13 is ending. SIR_Korea, Southf1_1f2_0;b_2.25e-05g_2.94e-02r0_8.68N_11365S0_11365I0_1R0_0.png

{\begin{align}
\beta &= 2.25e^{-5}\\
\gamma &= 2.94e^{-2}\\
N &= 11365\\
R0 &= 8.68\\
R(27.3.2020) &= 0.13\\
\end{align} 
}

If you try to fit both, the fitting of the infection number data will be loose and the infection peak will shift to the right. It may be possible to fit it if it fits more properly, but this time I will not pursue it any more. SIR_Korea, Southf1_1f2_1;b_2.11e-05g_2.14e-02r0_11.17N_11365S0_11365I0_1R0_0.png

{\begin{align}
\beta &= 2.11e^{-5}\\
\gamma &= 2.14e^{-2}\\
N &= 11365\\
R0 &= 11.17\\
R(27.3.2020) &= 0.18\\
\end{align} 
}

・ Situation of Iran

The next country that is likely to end is Iran. The latest (27.3.2020) data is as follows. However, when I came here and the cure rate exceeded 30%, it seemed to be stagnant. Even so, the effective reproduction number is 1.81, and it will soon be 1, so it can be said that the peak number of infections is near. SIR_Iranf1_1f2_1;b_2.92e-06g_4.94e-02r0_3.70N_62478S0_62478I0_10R0_0.png

{\begin{align}
\beta &= 2.92e^{-6}\\
\gamma &= 4.94e^{-2}\\
N &= 62478\\
R0 &= 3.70\\
R(27.3.2020) &= 1.81\\
\end{align} 
}

・ Situation of Italy

The next country of concern is Italy. The latest (27.3.2020) data is as follows. The data for this country is very solid and it feels good to fit both the infection and cure curves. The result is as follows, the effective reproduction number is 4.07, but it has dropped sharply and it seems that it will end in about 10 days. SIR_Italyf1_1f2_1;b_1.46e-06g_1.91e-02r0_10.99N_143448S0_143448I0_1R0_0.png

{\begin{align}
\beta &= 1.46e^{-6}\\
\gamma &= 1.91e^{-2}\\
N &= 143448\\
R0 &= 10.99\\
R(27.3.2020) &= 4.07\\
\end{align} 
}

Looking at this data, I get the impression that medical care has not collapsed and that management such as the effect of urban blockade is solid. (Comments have no scientific basis other than graphs. It is an individual impression.)

・ Situation of Spain

Next, I will show the situation in Spain, where deaths are increasing rapidly. When I write the conclusion, it seems that it is still in the early expansion period, and the fitting is indefinite.

Situations that cannot be decided now with this model The latest (27.3.2020) data is as follows. The following shows the case where N is smaller and the case where N is larger. N etc. are uncertain, but the effective reproduction number is 7.81 and 8.09 respectively, but the end time (infection peak) is around 80-90 days (2 weeks to 1 month later), and if the peak value is ignored, It can be said that they are in a similar place.

{\begin{align}
\beta &= 1.49e^{-6}\\
\gamma &= 1.37e^{-2}\\
N &= 127542\\
R0 &= 13.85\\
R(27.3.2020) &= 7.81\\
\end{align} 
}

SIR_Spainf1_1f2_1;b_5.39e-07g_1.94e-02r0_9.86N_354285S0_354285I0_1R0_0.png

{\begin{align}
\beta &= 5.39e^{-7}\\
\gamma &= 1.94e^{-2}\\
N &= 354285\\
R0 &= 9.86\\
R(27.3.2020) &= 8.09\\
\end{align} 
}

Actually, the same is true for Germany, France, and Switzerland, but I tried to fit it, but due to the rapid increase, this time (parameters are undefined), so I will write it again.

・ A little discussion

I'm not sure if I can really accept it, so I'll write a little. The biggest drawback of this fitting is that it is fixed to N. It seems that the situation where the infection is transmitted naturally in a closed subject can be simulated properly. However, many countries in the world have not taken complete lockdowns, and at least up to this point there has been a movement of people, including (as a percentage) many infected people themselves. In Japan, the number of such outpatients is increasing, and there are also situations where clusters are formed from there. So, what I wrote here is a fitting of the transition so far, and I can not predict the future situation. In order for these predictions to be correct, it is assumed that the same situation will continue, and it is necessary to be able to control the proportion of outpatients extremely. In fact, it is possible to write a model that considers these effects, but since it is difficult to predict the number of outpatients that change randomly from moment to moment, it seems impossible to predict with this model. So, I think that fitting will be possible if the whole of Europe and the United States begins to be seen, and it will be possible to discuss the obtained parameters.

Summary

・ I tried fitting COVID-19 data with the SIR model. ・ It turned out that Wuhan is nearing its end, and South Korea has exceeded the peak of infection toward the end. ・ Iran and Italy can be expected to reach the peak of infection relatively quickly in about 10 days. ・ In Japan, the peak of infection is relatively late because it is saturated with sloppyness, and it can be expected that the peak of infection will come in 20 days depending on the outpatients. ・ If the current situation progresses, it is thought that the peak of infection will begin to appear in many countries in the next month.

・ I want to monitor daily and see changes in the situation ・ I would like to evaluate the obtained parameters.

bonus

Now, when I look at the data, it was reflected ** until March 28, 2020, so I tried fitting it. However, yesterday's data are outliers, and the peak infection period does not change much. SIR_Japanf1_1f2_1;b_3.91e-05g_1.65e-02r0_6.22N_2617S0_2617I0_10R0_0.png The parameters obtained are as follows, and have not changed much compared to the above. However, ** the effective reproduction number R is small, but it is increasing, so be careful **.

{\begin{align}
\beta &= 3.91e^{-5}\\
\gamma &= 1.65e^{-2}\\
N &= 2617\\
R0 &= 6.22\\
R(28.3.2020) &= 2.79\\
\end{align} 
}

I would like to see the future trends.