[Python] Viewing new coronavirus infection data as a 7-day moving average

Introduction

The other day, I saw tweets on Twitter from people who were tracking the number of new coronavirus infections as a 7-day moving average. I wanted to do everything myself, from fetching the open data to visualizing it, so I gave it a try. The data used here is published by the Tokyo Metropolitan Government.

Graph

It is quickest to look at the graph first. The blue bars show the number of positive cases confirmed each day, and the pink line shows the 7-day moving average of those counts. Of course, caution is needed: this graph alone does not show that the coronavirus has been contained, but the line does appear to be trending downward.

image.png

Moving average

A moving average is a technique used in statistics:

For time series data, the average over a fixed-length interval is calculated repeatedly while shifting the interval.

Quote: https://bellcurve.jp/statistics/blog/15528.html

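In other words, given the data below, a 3-term moving average replaces each value with the average of that value and the one value on either side of it.

1 2 3 4 5 6 7

The result is as follows. The first row is the original data and the second row is the moving average; X marks the two ends, where there is no neighbor on one side and the average cannot be computed.

1 2 3 4 5 6 7
X 2 3 4 5 6 X

This time I used a 7-day moving average of ± 3 days, so each value is the average of that day, the 3 days before it, and the 3 days after it.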
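As a minimal sketch of the definition above, the following plain-Python function computes a centered moving average and marks the positions without a full window as None. The function name `centered_moving_average` is hypothetical and the window size is assumed to be odd; this is only an illustration, not the code used for the graph.

```python
def centered_moving_average(values, window):
    """Centered moving average; positions without a full window become None.

    Assumes `window` is odd so the same number of points lies on each side.
    """
    half = window // 2
    result = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            # Not enough neighbors on one side: this is the "X" in the example.
            result.append(None)
        else:
            chunk = values[i - half : i + half + 1]
            result.append(sum(chunk) / window)
    return result


data = [1, 2, 3, 4, 5, 6, 7]
three_term = centered_moving_average(data, 3)
seven_term = centered_moving_average(data, 7)
print(three_term)
print(seven_term)
```

With a window of 3 the result matches the second row of the example. With a window of 7 only the middle position has a full window, which is why a 7-day average loses 3 days at each end.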

Getting a moving average using numpy

I followed a reference site for this part.

The calculation uses numpy.convolve.

numpy.convolve(ndarray, kernel, mode)

Argument    Contents
ndarray     NumPy array of time series data
kernel      Kernel (weights); its length is the number of points used for the moving average
mode        same: the output has the same number of elements as the time series data; valid: the output has fewer elements than the time series data
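The difference between the two modes can be seen in a short sketch. The sample array and the 3-point kernel below are made up for illustration; only `numpy.convolve` itself comes from the article.

```python
import numpy as np

# Made-up sample series, same values as the 3-term example earlier.
series = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)

# Three equal weights that sum to 1 give a 3-point moving average.
kernel = np.ones(3) / 3

same = np.convolve(series, kernel, mode="same")
valid = np.convolve(series, kernel, mode="valid")

print(len(same), same)
print(len(valid), valid)
```

In "same" mode the output keeps all 7 elements, but the two end values are computed as if the series were padded with zeros, so they are pulled toward zero. In "valid" mode only the 5 positions with a full window are returned, which is presumably why the code later in this article uses "valid" and fills the ends with np.nan.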

Data visualization

I used the Google Colab notebook published in Karaage-san's article as is. Thank you very much. As an aside, I often refer to Karaage-san's articles for robotics as well, which is my own research field. I am always grateful for the help. (Suddenly a devotee)

Only the parts I changed from Karaage-san's notebook are excerpted below. First, purely as a personal preference, I imported seaborn.

[Change] Read date data

I wanted cleaner x-axis labels, so I converted the dates to datetime type.

date_data = np.array([])
for i in range(len(data['inspection_persons']['labels'])):
    date_data = np.append(date_data, pd.to_datetime(data['inspection_persons']['labels'][i][0:10]))
print(date_data)
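The loop above can also be written as a single vectorized call. The following is only a sketch: the `labels` list is made-up sample data standing in for data['inspection_persons']['labels'], and the exact format of the real labels is an assumption based on the [0:10] slice.

```python
import pandas as pd

# Hypothetical labels; the real ones come from data['inspection_persons']['labels'].
labels = [
    "2020-04-01T00:00:00.000Z",
    "2020-04-02T00:00:00.000Z",
    "2020-04-03T00:00:00.000Z",
]

# Keep only the YYYY-MM-DD prefix of each label and convert the whole list at once.
date_index = pd.to_datetime([label[:10] for label in labels])
print(date_index)
```

This returns a DatetimeIndex rather than a NumPy object array, and matplotlib accepts it as x-axis data in the same way.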

[Addition] Calculation of 7-day moving average

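The moving average described above is calculated after converting the original list of positive cases into a NumPy array. Because 'valid' mode returns 6 fewer elements than the input when n=7, the first 3 and last 3 days are filled with np.nan so that the result lines up with the dates.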

#7-day moving average
n=7
patients_data_np_array = np.array(patients_data)
patients_data_move_ave = np.convolve(patients_data_np_array, np.ones(n)/float(n), 'valid')

nan_array = np.array([np.nan,np.nan,np.nan])
patients_data_move_ave = np.insert(patients_data_move_ave,0,nan_array)
patients_data_move_ave = np.insert(patients_data_move_ave,len(patients_data_move_ave),nan_array)
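The padding above hard-codes three NaN values, which matches n=7. As a hedged sketch, the same steps can be wrapped in a function that derives the padding from n. The name `moving_average_with_nan` is hypothetical, the sample counts are made up, and n is assumed to be odd. The pandas call at the end is only a cross-check using `Series.rolling` with `center=True`.

```python
import numpy as np
import pandas as pd


def moving_average_with_nan(values, n=7):
    """Centered n-point moving average, padded with NaN to the input length.

    Assumes n is odd and len(values) >= n.
    """
    values = np.asarray(values, dtype=float)
    averaged = np.convolve(values, np.ones(n) / n, mode="valid")
    pad = np.full(n // 2, np.nan)  # n // 2 missing points at each end
    return np.concatenate([pad, averaged, pad])


sample = np.arange(1, 11)  # made-up daily counts 1..10
result = moving_average_with_nan(sample, n=7)

# Cross-check with a centered rolling mean from pandas.
rolled = pd.Series(sample, dtype=float).rolling(window=7, center=True).mean().to_numpy()

print(result)
print(np.allclose(result, rolled, equal_nan=True))
```

For this sample the four middle values are 4, 5, 6 and 7, with three NaN values on each side, so the output has the same length as the input and can be plotted against the same dates.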

[Change] Data visualization part

I added the 7-day moving average above to the plot as a line graph.

plt.figure(figsize=(9,6))
plt.bar(date_data, patients_data,label='number of patients')
plt.plot(date_data, patients_data_move_ave,color='salmon', linewidth = 3.0,label='moving average of number of patients')
plt.legend()

All programs

The complete program is shown below. Since my environment is Google Colaboratory, the first command is a shell command run with the ! prefix. It is just an ordinary wget, so you do not need Google Colaboratory to run it.

!wget --no-check-certificate --output-document=covid19_tokyo.json 'https://raw.githubusercontent.com/tokyo-metropolitan-gov/covid19/development/data/data.json'
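If wget is not available, the download could also be done from Python. The following is a minimal sketch using the standard library; `download_file` is a hypothetical helper name, and the URL is copied from the wget command above and may have changed since this article was written. Unlike the wget command, this sketch does not disable certificate checking.

```python
import urllib.request

# URL copied from the wget command above; it may have moved since then.
DATA_URL = (
    "https://raw.githubusercontent.com/tokyo-metropolitan-gov/"
    "covid19/development/data/data.json"
)


def download_file(url, output_path):
    """Download url to output_path and return the saved path (hypothetical helper)."""
    saved_path, _headers = urllib.request.urlretrieve(url, output_path)
    return saved_path


# Example call (requires network access):
# download_file(DATA_URL, "covid19_tokyo.json")
```

The file name covid19_tokyo.json matches the one read by pd.read_json in the program below.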

Below is the Python program.

#Library and data loading
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
data = pd.read_json('covid19_tokyo.json')

#Date data
date_data = np.array([])
for i in range(len(data['inspection_persons']['labels'])):
    date_data = np.append(date_data, pd.to_datetime(data['inspection_persons']['labels'][i][0:10]))

#Positive patient data
patients_data = []
for i in range(len(data['inspection_persons']['labels'])):
    patients_data.append(data['patients_summary']['data'][i]['subtotal'])

#7-day moving average calculation
n=7
patients_data_np_array = np.array(patients_data)
patients_data_move_ave = np.convolve(patients_data_np_array, np.ones(n)/float(n), 'valid')
#Fill with nan
nan_array = np.array([np.nan,np.nan,np.nan])
patients_data_move_ave = np.insert(patients_data_move_ave,0,nan_array)
patients_data_move_ave = np.insert(patients_data_move_ave,len(patients_data_move_ave),nan_array)

#Visualization
plt.figure(figsize=(9,6))
plt.bar(date_data, patients_data,label='number of patients')
plt.plot(date_data, patients_data_move_ave,color='salmon', linewidth = 3.0,label='moving average of number of patients')
plt.legend()

The result is the graph below.

image.png

Conclusion

I was able to quickly visualize what I had seen on Twitter thanks to the Tokyo Metropolitan Government, which publishes the data, and to everyone who has published how to fetch the data and how to compute a moving average. I want to keep going and build up knowledge so that I can try things out as soon as they come to mind.

Reference sites

Moving average

Moving Average, Statistics WEB
Statistics Time: Calculation method of moving average, Statistics WEB blog

Visualization

How to easily visualize and analyze open data of new coronavirus infection (COVID-19) with Google Colaboratory
Google Colaboratory notebook from the above article
