Introduction

The other day, I saw some tweets on Twitter from people tracking the number of new coronavirus infections with a 7-day moving average. I wanted to do everything myself, from fetching the open data to visualizing it, so I gave it a try. The data used here is published by the Tokyo Metropolitan Government.

Graph

I think it is quickest to look at the graph first. The blue bars show the number of patients confirmed positive each day, and the pink line shows the 7-day moving average of that number. Of course, caution is needed: this graph alone does not show that the coronavirus has been successfully contained, but the line does appear to be trending downward.

Moving average

The moving average is a technique used in statistics. It is defined as follows:

In time series data, the average value over a fixed interval is calculated repeatedly while shifting the interval.

Source: https://bellcurve.jp/statistics/blog/15528.html

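In other words, given the following data, a 3-term moving average replaces each value with the average of itself and the one value before and after it.

1	2	3	4	5	6	7

The result is as follows (X marks the two ends, where the window does not fit and no value is computed).

Original	1	2	3	4	5	6	7
3-term moving average	X	2	3	4	5	6	X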

This time I used a 7-day moving average, so each value is the average of that day together with the 3 days before and the 3 days after it (±3 days).
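To make the definition concrete, here is a minimal sketch in plain Python. The function name centered_moving_average and the use of None for the undefined edge positions are my own choices for illustration; they do not come from the quoted source.

```python
def centered_moving_average(values, window):
    """Return the centered moving average of `values`.

    Positions where the full window does not fit are returned as None.
    `window` is assumed to be an odd positive integer.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be an odd positive integer")
    half = window // 2
    result = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            # Not enough neighbors on one side: leave the value undefined.
            result.append(None)
        else:
            segment = values[i - half : i + half + 1]
            result.append(sum(segment) / window)
    return result


data = [1, 2, 3, 4, 5, 6, 7]
three_term = centered_moving_average(data, 3)
seven_term = centered_moving_average(data, 7)
print(three_term)  # [None, 2.0, 3.0, 4.0, 5.0, 6.0, None]
print(seven_term)  # [None, None, None, 4.0, None, None, None]
```

With only 7 data points, the 7-term window fits at exactly one position, which is why a wider window leaves more undefined values at both ends.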

Getting a moving average using numpy

I referred to this site.

Use numpy.convolve.

numpy.convolve(ndarray, kernel, mode)

argument	Contents
ndarray	NumPy array of time series data
kernel	Kernel (weights); its length determines how many points the moving average covers
mode	same: the output has the same number of elements as the time series data; valid: the output has fewer elements than the time series data

(In NumPy's own signature the first two parameters are named a and v, and mode defaults to 'full'.)
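The difference between the two modes is easier to see on a small array. The sketch below reuses the 1 to 7 example with a 3-point kernel; the variable names are mine and only for illustration.

```python
import numpy as np

series = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
kernel = np.ones(3) / 3  # equal weights give a simple 3-point moving average

# 'valid' keeps only positions where the kernel fully overlaps the data.
valid = np.convolve(series, kernel, mode="valid")
# 'same' keeps the input length; the edges are computed as if the data
# were surrounded by zeros.
same = np.convolve(series, kernel, mode="same")

print(len(valid), valid)
print(len(same), same)
```

In 'same' mode the first and last values are pulled toward zero (here about 1.0 and 4.33 instead of an average over three real data points), so for a graph like this one it seems safer to use 'valid' and mark the edges as missing.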

Data visualization

I used the Google Colab notebook published in Karaage's article in its entirety. Thank you very much. This is a digression, but I often refer to Karaage-san's articles on robotics as well, which is my own research field. I am always grateful for the help. (Suddenly a devotee)

Only the parts I changed from Karaage-san's notebook are excerpted and described below. First, I imported seaborn, purely as a matter of personal preference.

[Change] Read date data

I wanted cleaner x-axis labels, so I converted the dates to datetime type.

# Parse the date part (first 10 characters) of each label as a datetime
labels = data['inspection_persons']['labels']
date_data = np.array([pd.to_datetime(label[0:10]) for label in labels])
print(date_data)
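As a standalone illustration of the conversion, here is a small sketch. The label strings are hypothetical (I am assuming an ISO-style timestamp whose first 10 characters are YYYY-MM-DD); the real file's format may differ.

```python
import pandas as pd

# Hypothetical labels for illustration; not taken from the actual data file.
labels = ["2020-05-01T00:00:00", "2020-05-02T00:00:00", "2020-05-03T00:00:00"]

# Keep only the first 10 characters (YYYY-MM-DD) and parse them in one call.
date_data = pd.to_datetime([label[:10] for label in labels])

print(date_data)
```

Passing the whole list to pd.to_datetime returns a DatetimeIndex, which matplotlib can use directly as x-axis values.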

[Addition] Calculation of 7-day moving average

The moving average described above is computed after converting the original list of positive patient counts into a NumPy array. The first and last 3 days, where the full 7-day window is not available, are filled with np.nan.

# 7-day moving average
n = 7
patients_data_np_array = np.array(patients_data)
# 'valid' returns len(patients_data) - (n - 1) values
patients_data_move_ave = np.convolve(patients_data_np_array, np.ones(n)/float(n), 'valid')

# Fill the first and last n // 2 (= 3) days with NaN so the length matches the dates
nan_array = np.full(n // 2, np.nan)
patients_data_move_ave = np.concatenate([nan_array, patients_data_move_ave, nan_array])
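The same calculation can be wrapped in a small function and cross-checked against pandas. This is only a sketch: the function name moving_average_with_nan and the daily counts are made up for illustration, and the function assumes n is odd and the data has at least n points.

```python
import numpy as np
import pandas as pd


def moving_average_with_nan(values, n=7):
    """Centered n-point moving average, NaN where the window does not fit.

    Assumes n is odd and len(values) >= n.
    """
    values = np.asarray(values, dtype=float)
    core = np.convolve(values, np.ones(n) / n, mode="valid")
    pad = np.full(n // 2, np.nan)
    return np.concatenate([pad, core, pad])


# Made-up daily counts, for illustration only.
counts = [10, 12, 9, 15, 20, 18, 14, 11, 8, 13]
by_convolve = moving_average_with_nan(counts, n=7)

# pandas equivalent: a centered rolling window of the same width.
by_rolling = pd.Series(counts).rolling(window=7, center=True).mean().to_numpy()

print(by_convolve)
print(np.allclose(by_convolve, by_rolling, equal_nan=True))
```

If the data is already in a pandas Series, the rolling version may be the shorter option, since it produces the NaN edges on its own.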

[Change] Data visualization part

I added the 7-day moving average above to the bar chart as a line graph.

plt.figure(figsize=(9,6))
plt.bar(date_data, patients_data,label='number of patients')
plt.plot(date_data, patients_data_move_ave,color='salmon', linewidth = 3.0,label='moving average of number of patients')
plt.legend()
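To try the plotting part without the real data, here is a self-contained sketch with synthetic numbers. The counts, the date range, and the output file name are all made up, and I chose the Agg backend only so that the example also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data for illustration only (not the Tokyo figures).
dates = pd.date_range("2020-04-01", periods=14, freq="D")
counts = np.array([5, 8, 12, 9, 15, 20, 18, 14, 11, 8, 13, 7, 6, 4], dtype=float)

# 7-day centered moving average, padded with NaN at both ends.
core = np.convolve(counts, np.ones(7) / 7, mode="valid")
pad = np.full(3, np.nan)
moving_ave = np.concatenate([pad, core, pad])

fig, ax = plt.subplots(figsize=(9, 6))
ax.bar(dates, counts, label="number of patients")
ax.plot(dates, moving_ave, color="salmon", linewidth=3.0, label="7-day moving average")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
fig.savefig("moving_average_sketch.png")
```

Because the NaN values are simply not drawn, the line starts 3 days after the first bar and ends 3 days before the last one.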

Full program

The full program is shown below. Since the environment is Google Colaboratory, the first line is run as a shell command with the ! prefix. You don't need Google Colaboratory, though, because it is just an ordinary wget.

!wget --no-check-certificate --output-document=covid19_tokyo.json 'https://raw.githubusercontent.com/tokyo-metropolitan-gov/covid19/development/data/data.json'
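If wget is not available, the download could also be done from Python with the standard library. This is a rough sketch, not what I actually ran: the helper name download is hypothetical, and unlike the wget call above it leaves certificate verification turned on.

```python
import urllib.request


def download(url, filename):
    """Download `url` to `filename` and return the local path."""
    path, _headers = urllib.request.urlretrieve(url, filename)
    return path


# URL taken from the wget command above; uncomment to actually fetch the file.
# download(
#     "https://raw.githubusercontent.com/tokyo-metropolitan-gov/covid19/development/data/data.json",
#     "covid19_tokyo.json",
# )
```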

Below is the Python program.

# Load libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

# Load the data downloaded with wget
data = pd.read_json('covid19_tokyo.json')

# Date data: parse the first 10 characters of each label as a datetime
labels = data['inspection_persons']['labels']
date_data = np.array([pd.to_datetime(label[0:10]) for label in labels])

# Positive patient data (one daily subtotal per date label)
patients_data = []
for i in range(len(labels)):
    patients_data.append(data['patients_summary']['data'][i]['subtotal'])

# 7-day moving average calculation
n = 7
patients_data_np_array = np.array(patients_data)
patients_data_move_ave = np.convolve(patients_data_np_array, np.ones(n)/float(n), 'valid')
# Fill the first and last n // 2 (= 3) days with NaN so the length matches date_data
nan_array = np.full(n // 2, np.nan)
patients_data_move_ave = np.concatenate([nan_array, patients_data_move_ave, nan_array])

# Visualization
plt.figure(figsize=(9,6))
plt.bar(date_data, patients_data, label='number of patients')
plt.plot(date_data, patients_data_move_ave, color='salmon', linewidth=3.0, label='moving average of number of patients')
plt.legend()
plt.show()  # needed when running outside a notebook

The result is the graph below.

In conclusion

Thanks to the Tokyo Metropolitan Government, which publishes the data, and to everyone who has published how to fetch the data and how to compute a moving average, I was able to quickly visualize what I had seen on Twitter. I want to keep at it and build up enough knowledge that I can try an idea as soon as it occurs to me.

Sites I used as references

Moving average

Moving Average (Statistics WEB, Statistics Time)

Calculation method of moving average (Statistics WEB blog)

Visualization

How to easily visualize and analyze open data of new coronavirus infection (COVID-19) with Google Colaboratory

Google Colaboratory notebook from the above article
