I tried fitting the new coronavirus infection data with the SEIR model. The method is the same as in the previous article, [Introduction to minimize] Data analysis with SEIR model ♬. The infection data were obtained from the reference site below. For the Wuhan data, the results shown in [Introduction to matplotlib] Reading the end time from COVID-19 data ♬ were obtained; the initial values were chosen and the fit carried out so that such a curve is reproduced, giving the result below. So this time I would like to perform similar fits for each country and see whether any new insight can be gained. The data are taken from the reference below and, unless otherwise noted, are **data as of March 24, 2020**, while the results above are **as of March 22, 2020**. Also, as explained in the code explanation, note that the fit to the infection data and the fit that also uses the recovery data are run by separate programs.
【reference】
CSSEGISandData/COVID-19
・The end in Japan
・Code explanation
・Fitting of each country's data
I tried to put it gently above, but since the Tokyo Metropolitan Government has started talking about a lockdown of the city, let me first look at whether the end is in sight for Japan. The conclusion is that the end should become visible in about another month. Actually, in the March 22 data above, the curve flattened for a while and that looked like the peak of infections, but the number of infections then kept increasing roughly linearly, so saturation was postponed somewhat. Fitting the data up to March 24 therefore gives the following graph.
The fit determines the infection curve and the recovery curve at the same time. The Japanese data look a little odd from day 53 to day 59, but on the whole they behave naturally, and a clean curve was obtained, as shown below. In other words, the result is that the infection peak appears in about 10 days, the recovery curve peaks in about a month, and after that the epidemic gradually winds down.
The obtained parameters are
beta_const=0.159
lp=1.97e-23
gamma=52.7
N=1474
R0 = 4.45
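These values are consistent with the basic reproduction number as computed in the code further below (R0 = N*beta_const/gamma_const):
R0 = 1474 × 0.159 / 52.7 ≈ 4.45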
The full code is posted below. There are two programs because the structure of the provided data has changed, so the current data has to be read separately as in ②, and because in ② the fitting function was changed from the usual maximum-likelihood objective to an ordinary squared error so that the curves could be fitted more easily.
①collective_particles/fitting_COVID.py
②collective_particles/fitting_COVID2.py
A brief explanation of the code follows. The following libraries are used; they are the combination of those used in the two previous articles.
#include package
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize
import matplotlib.pyplot as plt
import pandas as pd
Reading the data is the same as in the article on plotting the infection data.
#Read CSV data with pandas.
data = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv')
data_r = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv')
data_d = pd.read_csv('COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv')
confirmed = [0] * (len(data.columns) - 4)
confirmed_r = [0] * (len(data_r.columns) - 4)
confirmed_d = [0] * (len(data_d.columns) - 4)
days_from_22_Jan_20 = np.arange(0, len(data.columns) - 4, 1)
#City
city = "Hubei" #Wuhan data
#Process the data
t_cases = 0
for i in range(0, len(data), 1):
    #if (data.iloc[i][1] == city): #for country/region
    if (data.iloc[i][0] == city): #for province/state
        print(str(data.iloc[i][0]) + " of " + data.iloc[i][1])
        for day in range(4, len(data.columns), 1):
            confirmed[day - 4] += data.iloc[i][day] - data_r.iloc[i][day]
            confirmed_r[day - 4] += data_r.iloc[i][day]
            confirmed_d[day - 4] += data_d.iloc[i][day]
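To aggregate a whole country (for the per-country fits later in the article) instead of a single province, the commented-out condition on the country/region column can be used instead. A minimal sketch, assuming the same CSV layout; the country name "Japan" here is just an example:
#Country (sketch): sum all rows whose country/region column matches
country = "Japan"
for i in range(0, len(data), 1):
    if (data.iloc[i][1] == country): #column 1 = country/region
        for day in range(4, len(data.columns), 1):
            confirmed[day - 4] += data.iloc[i][day] - data_r.iloc[i][day]
            confirmed_r[day - 4] += data_r.iloc[i][day]
            confirmed_d[day - 4] += data_d.iloc[i][day]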
Use the SEIR model.
#define differential equation of seir model
def seir_eq(v, t, beta, lp, ip, N0):
    N = N0 #int(26749*1) #58403*4
    a = -beta*v[0]*v[2]/N              #dS/dt
    b = beta*v[0]*v[2]/N - (1/lp)*v[1] #dE/dt
    c = (1/lp)*v[1] - (1/ip)*v[2]      #dI/dt
    d = (1/ip)*v[2]                    #dR/dt
    return [a, b, c, d]
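For reference, the system of equations implemented above is the standard SEIR model, with v = [S, E, I, R], N = N0, beta the infection rate, lp the latent period, and ip the infectious (removal) period:
dS/dt = -beta*S*I/N
dE/dt =  beta*S*I/N - E/lp
dI/dt =  E/lp - I/ip
dR/dt =  I/ip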
Note that the resulting picture depends on the value of N0. Ideally this would also be fitted, but that is not done this time. When fitting Hubei, I referred to the values below. However, for **N and S0** I adopted a value on the scale of the actual number of infected people (a value adjusted by hand), on the idea that not all residents take part in the epidemic, i.e. not every resident will become infected.
【reference】
・I tried to predict the behavior of the new coronavirus with the SEIR model: initial value setting
#solve seir model
N0 = 70939 #Hubei
N,S0,E0,I0=int(N0*1.),N0,int(318),int(389) #N is the total population, S0 is the initial number of susceptible people
ini_state=[S0,E0,I0,0] #initial value of diff. eq.
beta,lp,ip=1, 2, 7.4
t_max=60 #len(days_from_22_Jan_20)
dt=0.01
t=np.arange(0,t_max,dt)
obs_i = confirmed
The following function returns the estimated values for a given set of input parameters; it returns v[2], i.e. the number of infected people.
#function which estimates I from the seir model
def estimate_i(ini_state, beta, lp, ip, N):
    v = odeint(seir_eq, ini_state, t, args=(beta, lp, ip, N))
    est = v[0:int(t_max/dt):int(1/dt)] #sample the solution once per day
    return est[:, 2]                   #column 2 is I (infected)
The objective function for the minimization is below (this version uses a maximum-likelihood-type objective).
#define logscale likelihood function
def y(params):
    est_i = estimate_i(ini_state, params[0], params[1], params[2], params[3])
    return np.sum(est_i - obs_i*np.log(np.abs(est_i)))
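This is, up to a constant that does not depend on the parameters, the negative log-likelihood of a Poisson observation model: for an observed count I_obs with model mean I_est,
-log P(I_obs | I_est) = I_est - I_obs*log(I_est) + log(I_obs!)
and the last term is dropped because it does not change during the optimization (np.abs() simply keeps the argument of the log positive).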
Find the minimum value with minimize @ scipy and calculate the basic reproduction number.
#optimize logscale likelihood function
mnmz=minimize(y,[beta,lp,ip,N],method="nelder-mead")
print(mnmz)
#R0
beta_const,lp,gamma_const,N = mnmz.x[0],mnmz.x[1],mnmz.x[2],mnmz.x[3] #Infection rate, infection waiting time, removal rate (recovery rate)
print(beta_const,lp,gamma_const,N)
R0 = N*beta_const*(1/gamma_const)
print(R0)
Graph display of calculation results.
#plot result with observed data
fig, (ax1,ax2) = plt.subplots(2,1,figsize=(1.6180 * 4, 4*2))
lns1=ax1.plot(obs_i,"o", color="red",label = "data")
lns2=ax1.plot(confirmed_r,"*", color="green",label = "data_r")
lns3=ax1.plot(estimate_i(ini_state,mnmz.x[0],mnmz.x[1],mnmz.x[2],mnmz.x[3]), label = "estimation")
lns0=ax1.plot(t,odeint(seir_eq,ini_state,t,args=(mnmz.x[0],mnmz.x[1],mnmz.x[2],mnmz.x[3])))
lns_ax1 = lns1+lns2 + lns3 + lns0
labs_ax1 = [l.get_label() for l in lns_ax1]
ax1.legend(lns_ax1, labs_ax1, loc=0)
ax1.set_ylim(0,N0)
The following draws the graph over a longer time span so that the end can be seen.
t_max=200 #len(days_from_22_Jan_20)
t=np.arange(0,t_max,dt)
lns4=ax2.plot(obs_i,"o", color="red",label = "data")
lns5=ax2.plot(confirmed_r,"*", color="green",label = "recovered")
lns6=ax2.plot(t,odeint(seir_eq,ini_state,t,args=(mnmz.x[0],mnmz.x[1],mnmz.x[2],mnmz.x[3])))
ax2.legend(['data','data_r','Susceptible','Exposed','Infected','Recovered'], loc=1)
ax2.set_title('SEIR_b{:.2f}_ip{:.2f}_gamma{:.2f}_N{:.2f}_E0{:d}_I0{:d}_R0{:.2f}'.format(beta_const,lp,gamma_const,N,E0,I0,R0))
ax1.grid()
ax2.grid()
plt.savefig('./fig/SEIR_{}_b{:.2f}_ip{:.2f}_gamma{:.2f}_N{:.2f}_E0{:d}_I0{:d}_R0{:.2f}_.png'.format(city,beta_const,lp,gamma_const,N,E0,I0,R0))
plt.show()
plt.close()
On the other hand, for the simultaneous fitting of the infection counts and the recovery counts, the estimation function and the objective function in the middle of the code were changed as follows.
#function which estimates I and R from the seir model
def estimate_i(ini_state, beta, lp, ip, N):
    v = odeint(seir_eq, ini_state, t, args=(beta, lp, ip, N))
    est = v[0:int(t_max/dt):int(1/dt)]
    return est[:, 2], est[:, 3] #v0-S, v1-E, v2-I, v3-R
#define objective function (squared error)
def y(params):
    est_i_2, est_i_3 = estimate_i(ini_state, params[0], params[1], params[2], params[3])
    return np.sum((est_i_2 - obs_i_2)**2 + (est_i_3 - obs_i_3)**2)
With this form, it is also possible to fit only the infection counts or only the recovery counts.
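As a minimal sketch of how that could be done with the same estimate_i, assuming obs_i_2 and obs_i_3 are the observed infection and recovery series (confirmed and confirmed_r above), and using hypothetical weights w_i and w_r to switch each term on or off:
#sketch: weighted squared-error objective; w_i and w_r are hypothetical switches
obs_i_2, obs_i_3 = np.array(confirmed), np.array(confirmed_r)
w_i, w_r = 1.0, 0.0 #(1,0) fits infections only, (0,1) recoveries only, (1,1) both

def y_weighted(params):
    est_i_2, est_i_3 = estimate_i(ini_state, params[0], params[1], params[2], params[3])
    return np.sum(w_i*(est_i_2 - obs_i_2)**2 + w_r*(est_i_3 - obs_i_3)**2)

mnmz = minimize(y_weighted, [beta, lp, ip, N], method="nelder-mead")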
・Iran: Iran, which has a relatively high recovery rate, saturates faster than Japan and looks likely to reach its end, but the number of infections is still increasing and the recovery curve is sluggish, so it may take longer.
・Korea: Somehow the distortion has become more noticeable compared with the earlier data, which hurts the credibility of the data. Still, the number of recoveries is increasing overall, and the end seems relatively near.
・Italy: Although the mortality rate is extremely high and the situation is dangerous, the data are very clean and well-behaved, so the fit was easy and came out nicely. The recovery rate is still only about 10%, so it will probably take some time (the recovery peak is about two months away), but once the infection peak is seen, the end is near.
The following countries are still in the initial expansion period; the fitting accuracy seems poor and the results are not reliable, but I will post them anyway.
・Spain: The curve keeps rising and is difficult to fit.
・Germany: The fit has not settled yet, and the number of recoveries is still too small in the first place, so it appears to be in the initial expansion period.
・I tried fitting the COVID-19 data.
・Japan, Iran, and South Korea are expected to reach an end following China.
・The rest of Europe and the United States are still in the expansion period, and it is difficult to predict the end.
・I have only discussed things roughly this time, but I would like to do a more detailed analysis while following the daily updates.
The simultaneous fitting for Wuhan is shown below; the Wuhan data do not seem to be explained by this model (> <).
(Figures: raw data; fit to infection data only; fit to recovery data only; fit to both infection and recovery data.)