The other day, I saw some tweets on Twitter from people tracking the number of new coronavirus infections with a 7-day moving average. I wanted to do everything myself, from fetching the open data to visualizing it, so I tried it. The data used here is published by the Tokyo Metropolitan Government.
It is probably quickest to look at the graph first. The blue bars show the number of positive patients confirmed each day, and the pink line shows the 7-day moving average of that number. Of course, caution is needed: this graph alone does not show that the coronavirus has been successfully contained, but the line does appear to be trending downward.
A moving average is a technique used in statistics: for time series data, the average over a fixed-length interval is calculated repeatedly while shifting the interval.
Quote: https://bellcurve.jp/statistics/blog/15528.html
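In other words, given the following data, the 3-term moving average takes the average of each value together with the one value before it and the one value after it.
1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|
The result is as follows, where X marks the two ends, which have no neighbor on one side and so get no average.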
1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|
X | 2 | 3 | 4 | 5 | 6 | X |
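The table above can be reproduced with a few lines of NumPy. This is only a minimal sketch for illustration: the variable names `half` and `result` are my own, and `np.nan` stands in for the X entries.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
half = 1  # one value before and one after -> 3-term window

# Start with NaN everywhere; the two ends stay NaN (the X in the table).
result = np.full(len(data), np.nan)
for i in range(half, len(data) - half):
    # Average of the value itself and its neighbors on both sides.
    result[i] = data[i - half:i + half + 1].mean()

print(result)
```

Setting `half = 3` gives the 7-term window used in the rest of this article.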
This time I used a 7-day moving average, that is, the average of each day together with the 3 days before it and the 3 days after it (± 3 days).
For the implementation, I referred to this site.
The calculation uses numpy.convolve.
numpy.convolve(ndarray, kernel, mode)
argument | Contents |
---|---|
ndarray | NumPy array of the time series data |
kernel | Kernel (weights), determined by how many points the moving average covers |
mode | same: output has the same number of elements as the time series data; valid: output has fewer elements than the time series data (only positions where the window fits entirely) |
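To see how the modes differ, here is a small sketch using the 1 to 7 example data rather than the Tokyo data. It also shows `full`, which is NumPy's default mode and is not listed in the table. Note that `same` behaves as if zeros lay beyond both ends, so its edge values are pulled toward zero; that is one reason to prefer `valid` plus explicit padding.

```python
import numpy as np

series = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
kernel = np.ones(3) / 3  # 3-point moving-average kernel

full = np.convolve(series, kernel, mode="full")
same = np.convolve(series, kernel, mode="same")
valid = np.convolve(series, kernel, mode="valid")

# Output lengths: full = 7 + 3 - 1, same = 7, valid = 7 - 3 + 1
print(len(full), len(same), len(valid))  # 9 7 5

print(valid)  # averages only where the window fits entirely
print(same)   # first and last values are computed as if zeros lay beyond the ends
```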
I used the Google Colab notebook published in Karaage's article as is. Thank you very much. This is a digression, but I often refer to Karaage-san's articles on robotics as well, which is my own research field. I am always grateful for the help. (Suddenly a fan.)
Only the parts I changed from Karaage-san's notebook are excerpted and described below. First, I imported seaborn, purely out of personal preference.
I wanted tidy x-axis labels, so I converted the dates to a datetime type.
date_data = np.array([])
for i in range(len(data['inspection_persons']['labels'])):
    date_data = np.append(date_data, pd.to_datetime(data['inspection_persons']['labels'][i][0:10]))
print(date_data)
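As an aside, the same conversion can be written without the loop, because `pd.to_datetime` accepts a whole list. This is a sketch and not the code used in this article. The `labels` list is hypothetical sample data, and I am assuming the real labels are ISO 8601 timestamp strings, as the `[0:10]` slice suggests.

```python
import pandas as pd

# Hypothetical labels for illustration; assumed to look like ISO 8601 timestamps.
labels = [
    "2020-04-01T08:00:00.000Z",
    "2020-04-02T08:00:00.000Z",
    "2020-04-03T08:00:00.000Z",
]

# Keep the first 10 characters (the date part) and convert the whole list at once.
date_data = pd.to_datetime([label[:10] for label in labels])
print(date_data)
```

The result is a `DatetimeIndex`, which matplotlib can take as x-axis values in the same way as the array built by the loop.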
The moving average described above is calculated after converting the original list of positive patients into a NumPy array.
The 3 days at each end (± 3 days), where a full 7-day window does not fit, are filled with np.nan.
#7-day moving average
n=7
patients_data_np_array = np.array(patients_data)
patients_data_move_ave = np.convolve(patients_data_np_array, np.ones(n)/float(n), 'valid')
nan_array = np.array([np.nan,np.nan,np.nan])
patients_data_move_ave = np.insert(patients_data_move_ave,0,nan_array)
patients_data_move_ave = np.insert(patients_data_move_ave,len(patients_data_move_ave),nan_array)
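As a cross-check on the NaN padding, a centered rolling mean in pandas should give the same result, since it leaves NaN at the edges on its own. The sketch below uses made-up daily counts, not the Tokyo data, and builds the padding with `np.concatenate` instead of `np.insert`; the names `padded` and `rolled` are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical daily counts, for illustration only.
patients_data = [3, 5, 2, 8, 13, 9, 7, 12, 15, 11, 6, 4]
n = 7

# Approach from this article: 'valid' convolution, then n // 2 NaNs on each side.
valid = np.convolve(np.array(patients_data), np.ones(n) / float(n), "valid")
pad = np.full(n // 2, np.nan)
padded = np.concatenate([pad, valid, pad])

# pandas equivalent: centered rolling window, NaN wherever the window is incomplete.
rolled = pd.Series(patients_data).rolling(window=n, center=True).mean().to_numpy()

print(np.allclose(padded, rolled, equal_nan=True))
```

Deriving the pad length from `n // 2` rather than hard-coding three NaNs keeps the padding correct if the window size is changed to another odd number.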
I then drew the 7-day moving average as a line graph on top of the bar graph.
plt.figure(figsize=(9,6))
plt.bar(date_data, patients_data,label='number of patients')
plt.plot(date_data, patients_data_move_ave,color='salmon', linewidth = 3.0,label='moving average of number of patients')
plt.legend()
The full program follows. Since the environment is Google Colaboratory, the first shell command is run with the `!` prefix. You do not have to use Google Colaboratory, though, because the command is just an ordinary wget.
!wget --no-check-certificate --output-document=covid19_tokyo.json 'https://raw.githubusercontent.com/tokyo-metropolitan-gov/covid19/development/data/data.json'
Below is the Python program.
#Library and data loading
import pandas as pd
import numpy as np
data = pd.read_json('covid19_tokyo.json')
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
#Date data
date_data = np.array([])
for i in range(len(data['inspection_persons']['labels'])):
    date_data = np.append(date_data, pd.to_datetime(data['inspection_persons']['labels'][i][0:10]))
#Positive patient data
patients_data = []
for i in range(len(data['inspection_persons']['labels'])):
    patients_data.append(data['patients_summary']['data'][i]['subtotal'])
#7-day moving average calculation
n=7
patients_data_np_array = np.array(patients_data)
patients_data_move_ave = np.convolve(patients_data_np_array, np.ones(n)/float(n), 'valid')
#Fill with nan
nan_array = np.array([np.nan,np.nan,np.nan])
patients_data_move_ave = np.insert(patients_data_move_ave,0,nan_array)
patients_data_move_ave = np.insert(patients_data_move_ave,len(patients_data_move_ave),nan_array)
#Visualization
plt.figure(figsize=(9,6))
plt.bar(date_data, patients_data,label='number of patients')
plt.plot(date_data, patients_data_move_ave,color='salmon', linewidth = 3.0,label='moving average of number of patients')
plt.legend()
The result is the graph below.
Thanks to the Tokyo Metropolitan Government, which publishes the data, and to everyone who has shared how to fetch the data and how to compute a moving average, I was able to reproduce right away the visualization I saw on Twitter. I'll keep at it! I want to build up enough knowledge to implement ideas as soon as I think of them.
Moving Average, Statistics WEB (Statistics Time)
Calculation method of moving average, Statistics WEB blog
How to easily visualize and analyze open data of new coronavirus infection (COVID-19) with Google Colaboratory
Google Colaboratory notebook from the above article