(1) I tried a trend analysis of the novel coronavirus data with Python's statsmodels.
(2) Analyzing data from the Ministry of Health, Labor and Welfare showed that the first peak was in April and the current peak was in August.
(3) If that pattern holds, the next peak may come in December.
(1) Download data on newly infected people from the Ministry of Health, Labor and Welfare website.
(2) Decompose it into trend, seasonal, and residual components with Python's statsmodels.
Special thanks: I referred to "Let's Start Data Analysis with Momoki." Thank you very much.
Download the daily number of positive cases from the Ministry of Health, Labor and Welfare website. I was impressed by how easy it is to download the data as a CSV file. The Ministry of Health, Labor and Welfare is amazing!
For details of the analysis method, refer to "Let's Start Data Analysis with Momoki" mentioned above.
First, some preparation: import the libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
%matplotlib inline
Next, read the downloaded data. It contains data from January 16 onward.
df=pd.read_csv('pcr_positive_daily.csv')
df.head()
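As an alternative sketch of this read step (not the method used in this article), `pd.read_csv` can parse the date column and make it the index in one go. The CSV text below is a made-up four-day sample; it assumes the same column names as the code later in this article (`date` and `Number of positive PCR tests(Single day)`), which may not match the headers of the actual MHLW file.

```python
import io

import pandas as pd

# Hypothetical sample in the two-column layout assumed by this article.
# The numbers are made up for illustration; they are NOT MHLW data.
csv_text = """date,Number of positive PCR tests(Single day)
2020-01-16,1
2020-01-17,0
2020-01-18,0
2020-01-19,0
"""

# parse_dates + index_col turn the date column into a DatetimeIndex while reading,
# so a separate pd.to_datetime step is not needed afterwards.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
numbers = df["Number of positive PCR tests(Single day)"].astype(float)
print(numbers.index.min(), numbers.index.max())
```

The resulting `numbers` series has the same shape as the one built by hand later on, so it could be passed to the decomposition directly.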
By the way, the latest data is from the day before yesterday. Normally the previous day's data becomes available, but it is only updated in the evening. Impressively, it is updated even on Saturdays and Sundays!
df.tail()
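Since the file grows by one row each day, it may be worth checking that no calendar days are missing before decomposing. As far as I understand, when no period is passed, `sm.tsa.seasonal_decompose` tries to take the period from the index's inferred frequency, and a gap in the dates makes that inference fail. A minimal sketch of such a check, using hypothetical dates (`missing_days` is my own helper name, not part of pandas):

```python
import pandas as pd


def missing_days(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return calendar days between the first and last date that are absent from index."""
    full_range = pd.date_range(index.min(), index.max(), freq="D")
    return full_range.difference(index)


# Hypothetical indexes: one with a gap, one without.
gappy = pd.DatetimeIndex(["2020-01-16", "2020-01-17", "2020-01-19"])
complete = pd.date_range("2020-01-16", periods=5, freq="D")

print(list(missing_days(gappy).strftime("%Y-%m-%d")))  # ['2020-01-18']
print(pd.infer_freq(gappy))     # None: the spacing is not uniform
print(pd.infer_freq(complete))  # D
```

On the article's data, the same check would be `missing_days(pd.to_datetime(df['date']))`, which should come back empty if every day is present.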
Now, let's look at how the number of infected people has changed from the latter half of January, when the data starts, up to the present.
%matplotlib inline
df.plot()
By the way, if you use matplotlib as-is, Japanese text in the graph is rendered as so-called "tofu" (empty boxes) (laughs). The following page was very helpful for fixing this. Thank you very much: "How to display Japanese in Google Colaboratory graphs (matplotlib)!"
Now, finally, the main topic: decomposing the MHLW data with Python's statsmodels.
numbers = pd.Series(df['Number of positive PCR tests(Single day)'], dtype='float')
numbers.index = pd.to_datetime(df['date'])
res = sm.tsa.seasonal_decompose(numbers)
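To see roughly what `seasonal_decompose` computes, here is a hand-written sketch of classical additive decomposition (observed = trend + seasonal + residual), which I understand to be its default model. It is a simplified re-implementation for illustration, not statsmodels' own code: the trend is a centered 7-day moving average, the seasonal part is the average detrended value at each position in the 7-day cycle (shifted to sum to zero), and the residual is what is left over. The helper name `classical_additive_decompose` and the toy series are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def classical_additive_decompose(y: pd.Series, period: int = 7) -> pd.DataFrame:
    """Sketch of classical additive decomposition: y = trend + seasonal + resid.

    Assumes an odd period (7 for a weekly cycle in daily data), so a plain
    centered rolling mean gives the trend.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # Trend: centered moving average over one full cycle (NaN at both ends).
    trend = y.rolling(window=period, center=True).mean()
    # Detrend, then average the detrended values at each position in the cycle.
    detrended = y - trend
    position = np.arange(len(y)) % period
    cycle_means = detrended.groupby(position).mean()  # NaNs are skipped by mean()
    cycle_means -= cycle_means.mean()                 # make one cycle sum to zero
    seasonal = pd.Series(cycle_means.to_numpy()[position], index=y.index)
    resid = y - trend - seasonal
    return pd.DataFrame({"trend": trend, "seasonal": seasonal, "resid": resid})


# Toy daily series (made up): a linear trend plus a weekly pattern that sums to zero.
days = pd.date_range("2020-01-16", periods=70, freq="D")
t = np.arange(70)
weekly = np.tile([3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0], 10)
y = pd.Series(50 + 2.0 * t + weekly, index=days)

parts = classical_additive_decompose(y)
print(parts.head(10))
```

With daily data the period works out to 7 days, so the "seasonal" component in this analysis is really a weekly cycle rather than a pattern over the seasons of the year. For even periods, statsmodels uses a weighted (2×m) moving average instead, which this sketch deliberately does not handle.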
original = numbers  # original data
trend = res.trend  # trend component
seasonal = res.seasonal  # seasonal component
residual = res.resid  # residual component
plt.figure(figsize=(8, 8))  # create the figure and set its size
# Original data
plt.subplot(411)  # 4 rows x 1 column, 1st panel (top)
plt.plot(original)
plt.ylabel('Original')
# Trend
plt.subplot(412)  # 4 rows x 1 column, 2nd panel
plt.plot(trend)
plt.ylabel('Trend')
# Seasonal component
plt.subplot(413)  # 4 rows x 1 column, 3rd panel
plt.plot(seasonal)
plt.ylabel('Seasonality')
# Residuals
plt.subplot(414)  # 4 rows x 1 column, 4th panel (bottom)
plt.plot(residual)
plt.ylabel('Residuals')
plt.tight_layout()  # automatically adjust the spacing between panels
The results are as follows. As noted in the code comments, the panels from top to bottom are ① the original data, ② the trend, ③ the seasonal component, and ④ the residuals.
Look at the second panel, the trend. The first peak is in early April and the current peak is in early August, about four months apart. By that reasoning, could the next peak come in early December?
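To put a number on "about four months apart", here is a sketch that finds local maxima in a trend series and extrapolates one more peak at the same spacing. This is a naive assumption (evenly spaced waves) that nothing in the data guarantees. The helper `local_peaks`, the 14-day half-window, and the two-wave toy series are my own illustrative choices, not values from the MHLW data.

```python
import numpy as np
import pandas as pd


def local_peaks(trend: pd.Series, half_window: int = 14) -> pd.Series:
    """Points equal to the maximum of the centered window of 2*half_window+1 days.

    Points within half_window days of either end are never reported, because the
    centered window there is incomplete and rolling() yields NaN.
    """
    s = trend.dropna()
    window_max = s.rolling(2 * half_window + 1, center=True).max()
    return s[s == window_max]


# Toy trend with two made-up waves (NOT the MHLW data), centered on day 80 and day 200.
days = pd.date_range("2020-01-16", periods=300, freq="D")
t = np.arange(300)
toy_trend = pd.Series(
    500 * np.exp(-(((t - 80) / 15) ** 2)) + 1300 * np.exp(-(((t - 200) / 15) ** 2)),
    index=days,
)

peaks = local_peaks(toy_trend)
spacing = peaks.index[1] - peaks.index[0]
naive_next_peak = peaks.index[-1] + spacing  # assumes the waves are evenly spaced
print(peaks.index.date, spacing, naive_next_peak.date())
```

With the real data this could be called as `local_peaks(res.trend)`; a noisier trend may yield extra small peaks, which a wider `half_window` would suppress. The extrapolated date is only as good as the assumption that waves recur at a fixed interval.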
I just hope the pandemic ends soon, although I recognize that the reality is harsh.
The disease caused by the novel coronavirus is called COVID-19, while the virus itself is named SARS-CoV-2. Its name resembles SARS because it is closely related to the SARS coronavirus.
SARS spread in 2002, almost 20 years ago, yet it seems a SARS vaccine has still not been developed.
It is a difficult time, so I want to quietly do what I can.
It is lonely that drinking parties have disappeared because of the pandemic, but it has also been a good opportunity to promote more rational ways of working, such as telecommuting.
Let's stay calm in times like these ☺
Finally, I would like to thank everyone involved with the sites I referred to.