I live in Tokyo and am staying home because of the coronavirus, and the number of infected people announced for Tokyo every day is overwhelming. However, I can't tell why that number rises or falls. The number of tests swings widely from day to day and the time needed to get a result varies, so even as an amateur I end up guessing that the rise and fall in infections mostly reflects the number of tests and how long they take. So I wondered whether the figures could be graphed in a way that is a little easier to understand.
Tokyo Metropolitan Government New Coronavirus Infection Control Site https://stopcovid19.metro.tokyo.lg.jp/ Tokyo's COVID-19 data is updated here daily. (Assume a slight time lag: the figures run one day behind.) I decided to scrape this site and use it as the source data for the graphs.
The data we want are the number of people tested and the number of positive patients.
Number of people tested: https://stopcovid19.metro.tokyo.lg.jp/cards/number-of-inspection-persons/
Number of positive patients: https://stopcovid19.metro.tokyo.lg.jp/cards/number-of-confirmed-cases/
Download each page with requests and parse it with BeautifulSoup. First, open the URL in Chrome and display the developer tools. Follow the HTML tags down to the tag that holds the target value (the number of positive patients). That tag has the class text-end, so we use the class to extract the data: fetch the page at each URL with requests, then have BeautifulSoup pull out every tag that has the text-end class.
Python
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

kensa_url = 'https://stopcovid19.metro.tokyo.lg.jp/cards/number-of-inspection-persons/'
yousei_url = 'https://stopcovid19.metro.tokyo.lg.jp/cards/number-of-confirmed-cases/'

# Number of people tested: collect every tag that has the text-end class
r = requests.get(kensa_url, timeout=10)
soup = BeautifulSoup(r.text, 'html.parser')
kensa_data = soup.select('.text-end')

# Number of positive patients: same extraction on the second page
r = requests.get(yousei_url, timeout=10)
soup = BeautifulSoup(r.text, 'html.parser')
yousei_data = soup.select('.text-end')
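To see what a class-based selection like this picks up without contacting the site, here is a minimal offline sketch. It swaps BeautifulSoup for the standard library's HTMLParser so that it runs with no extra packages. The TextEndCollector class and the HTML fragment are hypothetical, modelled on the table cells shown in this article.

```python
from html.parser import HTMLParser


class TextEndCollector(HTMLParser):
    """Collect the text of each element whose class list contains 'text-end'."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while we are inside a matching element
        self.cells = []  # one string per matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth == 0 and "text-end" in classes:
            self.cells.append("")
            self.depth = 1
        elif self.depth > 0:
            self.depth += 1  # nested tag such as <span> inside <th>

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.cells[-1] += data


# Hypothetical fragment: two header cells, then daily/cumulative pairs
sample_html = """
<table>
<tr><th class="text-end"><span>By day</span></th>
    <th class="text-end"><span>Cumulative</span></th></tr>
<tr><td class="text-end">304</td><td class="text-end">8,683</td></tr>
<tr><td class="text-end">339</td><td class="text-end">8,379</td></tr>
</table>
"""

collector = TextEndCollector()
collector.feed(sample_html)
sample_cells = collector.cells
print(sample_cells)
```

The result is a flat list: the two header texts first, then daily and cumulative values alternating, which is the layout the rest of the article relies on.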
Looking at the extracted list, the first two elements are the header cells. After that, the daily and cumulative values alternate. Let's keep only the parts we need.
[<th aria-label="Number of people to be inspected(By day)" aria-sort="none" class="text-end" role="columnheader" scope="col"><span>Number of people to be inspected(By day)</span></th>,
<th aria-label="Number of people to be inspected(Cumulative)" aria-sort="none" class="text-end" role="columnheader" scope="col"><span>Number of people to be inspected(Cumulative)</span></th>,
<td class="text-end">304</td>,
<td class="text-end">8,683</td>,
<td class="text-end">339</td>,
-----Omitted thereafter-----
Let's store the data in lists. Use for i in range(2, len(kensa_data), 2): to skip the two header cells and start the loop at the third element of the list. Stepping through the list two at a time picks up only the daily values and skips the cumulative ones. We also build the dates at the same time: take the download date from datetime.today(), step back one day because the site runs a day behind, and go back one more day for each row the loop reads. num_list holds sequential numbers that are used as the x-axis when plotting.
Python
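from datetime import datetime, timedelta

kensa_list = []
yousei_list = []
date_list = []
num_list = []
num = 0
# The newest row on the site is assumed to be yesterday's
date = datetime.today() - timedelta(days=1)
for i in range(2, len(kensa_data), 2):
    # Remove thousands separators such as "1,234" before converting to int
    kensa_list.append(int(kensa_data[i].string.replace(',', '')))
    yousei_list.append(int(yousei_data[i].string.replace(',', '')))
    date_list.append(datetime.strftime(date, '%Y/%m/%d'))
    # Sequential index used as the x-axis in the plots
    num_list.append(num)
    num += 1
    date = date - timedelta(days=1)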
All of the lists are in reverse chronological order, so use .reverse() to put them in chronological order.
Python
kensa_list.reverse()
yousei_list.reverse()
date_list.reverse()
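The extraction, date labelling, and reversal above can also be sketched as one small function that works on plain cell texts, which makes it easy to try without downloading anything. The daily_series name, the sample cells, and the example date are assumptions made for illustration.

```python
from datetime import date, timedelta


def daily_series(cell_texts, newest_day):
    """Return (dates, values) in chronological order.

    cell_texts is assumed to be: header, header, daily, cumulative,
    daily, cumulative, ... with the newest day first.
    """
    # Skip the two headers, then take every second cell (the daily values)
    values = [int(text.replace(",", "")) for text in cell_texts[2::2]]
    # The first value belongs to newest_day, the next to the day before, ...
    days = [newest_day - timedelta(days=n) for n in range(len(values))]
    values.reverse()
    days.reverse()
    return [d.strftime("%Y/%m/%d") for d in days], values


cells = ["By day", "Cumulative", "304", "8,683", "339", "8,379"]
dates, values = daily_series(cells, date(2020, 4, 20))
print(dates)   # ['2020/04/19', '2020/04/20']
print(values)  # [339, 304]
```

Passing the newest day in as an argument, instead of calling datetime.today() inside the function, keeps the result reproducible when the script is rerun on a different day.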
If you want to keep the data, save it to a CSV file at this point.
Python
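import csv

# 'w' rewrites the file on every run; with 'a' the header and the whole
# history would be appended again each time. newline='' is what the csv
# module recommends so that no blank rows appear on Windows.
with open('COVID-19.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['date', 'kensa', 'yousei'])
    for i in range(len(date_list)):
        writer.writerow([date_list[i], kensa_list[i], yousei_list[i]])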
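Once the data is in a CSV file, later runs could load it back instead of scraping again. The following is a rough sketch of that idea using csv.DictReader. It reads from an in-memory string so that it is self-contained, and the two rows are made-up placeholder values, not real figures.

```python
import csv
import io

# Stand-in for open('COVID-19.csv', newline=''); the rows are illustrative only
saved = io.StringIO(
    "date,kensa,yousei\n"
    "2020/04/19,339,107\n"
    "2020/04/20,304,102\n"
)

# DictReader uses the header row as keys, so columns are accessed by name
rows = list(csv.DictReader(saved))
loaded_dates = [row["date"] for row in rows]
loaded_kensa = [int(row["kensa"]) for row in rows]
loaded_yousei = [int(row["yousei"]) for row in rows]
print(loaded_dates, loaded_kensa, loaded_yousei)
```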
Let's check the number of people tested and the number of positive patients.
Python
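# Top: number of people tested per day
plt.subplot(2, 1, 1)
plt.plot(num_list, kensa_list, label="kensa-list")
plt.legend()
# Bottom: number of positive patients per day
plt.subplot(2, 1, 2)
plt.plot(num_list, yousei_list, label="yousei-list")
plt.legend()
plt.show()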
As the graph shows, the number of people tested changes drastically from day to day. At first glance the number tested and the number of positives seem to be correlated, but around 80 on the horizontal axis, the way the number of positives falls looks unnatural compared with the sharp drop in the number tested. This seems to be because a day's test count does not always correspond to that same day's positive count.
So let's build a series that divides the total number of positives up to a given day by the total number of tests up to the day before. Because both values are running totals, the ratio can be graphed without depending on which day each test result happened to be reported. The test total stops at the previous day because results from tests performed on a given day do not appear to be reflected in that day's positives.
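Written as a formula, with $t_j$ for the number of people tested on day $j$ and $p_j$ for the number of positives on day $j$ (symbols introduced here for illustration; they do not appear in the code), the value plotted for day $i$ is:

$$
r_i = \frac{\sum_{j=1}^{i} p_j}{\sum_{j=1}^{i-1} t_j}, \qquad r_i = 0 \ \text{ if } \ \sum_{j=1}^{i-1} t_j = 0 .
$$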
Python
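kensa_total = 0
yousei_total = 0
kensa_yousei_list = []
for i in range(len(kensa_list)):
    # Positives include day i; str()/replace() tolerates values such as "1,234"
    yousei_total = yousei_total + int(str(yousei_list[i]).replace(',', ''))
    if kensa_total == 0:
        # No tests recorded yet, so avoid dividing by zero
        kensa_yousei_list.append(0)
    else:
        kensa_yousei_list.append(yousei_total / kensa_total)
    # Tests are added after the division, so the denominator stops at day i - 1
    kensa_total = kensa_total + int(str(kensa_list[i]).replace(',', ''))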
The for loop accumulates kensa_total and yousei_total. On each pass it appends yousei_total divided by kensa_total to kensa_yousei_list. Because kensa_total is updated after the division, it holds the tests only up to the previous day.
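The same calculation can be sketched more compactly with itertools.accumulate, which makes the one-day shift of the test total explicit. The function name and the input numbers below are made up for illustration and are chosen to keep the arithmetic easy to check by hand.

```python
from itertools import accumulate


def cumulative_positive_ratio(tested, positives):
    """Positives up to day i divided by tests up to day i - 1 (0 if no tests yet)."""
    positives_so_far = list(accumulate(positives))
    # Shift the running test total by one day: day i sees tests through day i - 1
    tested_before = [0] + list(accumulate(tested))[:-1]
    return [p / t if t else 0 for p, t in zip(positives_so_far, tested_before)]


# Made-up numbers: running positives are 1, 3, 8 and prior tests are 0, 10, 20
ratios = cumulative_positive_ratio([10, 10, 20], [1, 2, 5])
print(ratios)  # [0, 0.3, 0.4]
```

The first entry is 0 because no tests precede the first day, which mirrors the kensa_total == 0 branch in the loop version.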
Python
plt.plot(num_list, kensa_yousei_list, label="Average")
plt.legend()
plt.show()
The sharp rise at the very start occurs because the number of tests stayed at 0 for the first stretch of the data, so please ignore it. After that, the graph climbs toward the second half: the proportion of positives among the people tested gradually increases. From the daily positive counts alone I couldn't tell whether the proportion of positives was really rising, but dividing by the cumulative number of tests gives a stable graph, and it shows that the proportion of positives is rising too. Around April 21, when this graph was created, the number of positives had fallen a little, so the end of the graph dips slightly.
By totalling the daily figures up to each day and taking their ratio, we were able to create an easy-to-understand graph. The graph shows a relatively smooth increase. Also, because the ratio does not depend on day-to-day swings in the number of tests, a sudden jump in the daily count should come as less of a surprise. (The number of tests has to increase before the value can.)
The code for this article is published below. https://github.com/no-B-github/COVID19_Data_Scraping
I also turned this into a web application so that the graph is updated daily. I'd like to keep an eye on it and do my best to keep staying home through COVID-19.
https://covid-19-tokyo.herokuapp.com/