On January 16, 2020, the first case of the novel coronavirus disease (COVID-19), caused by the SARS-CoV-2 virus, was confirmed in Japan. Sadly, the disease has since killed many people, from ordinary citizens to celebrities. More than half a year later, the epidemic has still not subsided, and masks are a necessity when going out. In this post, I briefly analyze and summarize the coronavirus data for Japan. I hoped the analysis would give me some new insights and improve my analysis skills.
For this analysis, I used the CSV data published by Jag Japan Co., Ltd. Thank you very much. The link is below.
About "Map of the number of people infected with the new coronavirus"
COVID-19.ipynb
import collections
import matplotlib.pyplot as plt
import pandas as pd
COVID-19.ipynb
pd.set_option('display.max_columns', None)
df = pd.read_csv('COVID-19.csv')
df
JupyterLab truncates the display when a DataFrame has many columns, so the first line sets display.max_columns to None to show all of them.
COVID-19.ipynb
age = df['Age'].value_counts(ascending=True)
age
Execution result
90 or more 1
90s 1
100 2
80s 7
Teen 9
70s 10
60s 12
90 14
50s 25
30s 33
40s 33
80 44
20s 49
10 66
70 69
60 128
40 167
50 179
30 203
20 310
90 1040
Unknown 1145
0-10 1335
80 2645
10 2952
70 3751
60 4531
50 7355
40 8315
30 10551
20 18009
Name: Age, dtype: int64
The same age group appears under more than one notation (for example, two different spellings of the 20s), so I will try to unify the notation with df.replace().
COVID-19.ipynb
# Map the different age notations onto one label per age group
df = df.replace({'Age':{'0-10':'under10', "10's":'10', "20's":'20', '30s':'30', 'Forties':'40', '50s':'50', '60s':'60', '70s':'70', '80s':'80', '90s':'90', 'unknown':'unknown', '90 and above':'90~'}})
age2 = df['Age'].value_counts()
age2
Output result
20 18009
30 10551
40 8315
50 7355
60 4531
70 3751
10 2952
80 2645
under10 1335
unknown 1145
90 1040
20 359
30 236
50 204
40 200
60 140
70 79
10 75
80 51
90 15
100 2
90~ 1
Name: Age, dtype: int64
The output is shorter than before. (I wanted to add up the counts of rows that show the same label and tried various things, but it didn't work, so I'll leave that as a future task.) The table is still a little hard to read, so I'll visualize it with a graph.
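The future task mentioned here, adding up rows that look identical, could be approached roughly as follows. This is only a sketch under an assumption: that the duplicate labels differ in ways that are invisible on screen, such as full-width digits, stray whitespace, or numbers stored as integers instead of strings. The helper `normalize_label` and the toy data are hypothetical and are not taken from the CSV.

```python
import unicodedata

import pandas as pd


def normalize_label(value):
    """Hypothetical helper: turn an age label into one canonical string."""
    # NFKC folds full-width characters (e.g. '２０') into their ASCII form
    text = unicodedata.normalize('NFKC', str(value))
    # Remove leading and trailing whitespace
    return text.strip()


# Toy data standing in for df['Age']; the real values may differ
sample = pd.Series(['20', '２０', ' 20', 20, '30', '３０'])

# After normalization, labels that looked the same are counted together
merged = sample.map(normalize_label).value_counts()
print(merged)
```

If this assumption holds for the real data, applying the same `map` call to `df['Age']` before `value_counts()` should give one row per age group. If the duplicates remain, the cause is something else and needs a closer look at the raw values, for example with `df['Age'].map(repr).unique()`.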
COVID-19.ipynb
plt.title('Age of infected person')
age2.plot.bar()
A graph makes the data easier to grasp visually. It shows that younger generations, such as people in their 20s, 30s, and 40s, account for more infections. The large number of infected people in their 20s stands out in particular.
COVID-19.ipynb
df = df.replace({'sex':{'male':'male', 'Female':'female', 'unknown':'unknown'}})
sex = df['sex'].value_counts()
plt.xlabel('Sex')
plt.ylabel('Number of people')
plt.title('Infected_sex')
#print(sex) #Display when you want to know the detailed number of infected people by gender
sex.plot.bar()
The graph shows that more men than women were infected. I don't think infection itself depends on gender, but the purposes and behavior of people when they go out probably differ. With more detailed data on that, I expect the relationship between gender and the number of infections could be explained.
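To put numbers next to the bar chart, the counts can be shown together with each group's share of the total. The following is a minimal sketch on toy data, not on the real CSV. The `sample` series and the column names `count` and `percent` are made up for illustration.

```python
import pandas as pd

# Toy data standing in for df['sex']
sample = pd.Series(['male', 'male', 'female', 'unknown', 'male', 'female'])

# Absolute counts per category
counts = sample.value_counts()

# normalize=True returns fractions, converted here to percentages
percent = sample.value_counts(normalize=True).mul(100).round(1)

# Combine both into one table, aligned on the category labels
summary = pd.DataFrame({'count': counts, 'percent': percent})
print(summary)
```

With the real data, replacing `sample` with `df['sex']` should give the same kind of table.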
COVID-19.ipynb
fixed_date = df['Fixed date']
fixed_date = collections.Counter(fixed_date)
#fixed_date #Since there is a lot of output, the execution result is omitted.
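date = []
value = []
# Collect the dates (keys) and the number of cases per date (values)
for get_date in fixed_date:
    date.append(get_date)
for get_value in fixed_date.values():
    value.append(get_value)
plt.plot(date, value)
plt.xticks([0, 70, 180])  # show only a few tick positions so the labels stay readable
plt.xticks(rotation=45)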
plt.xlabel('date')
plt.ylabel('value')
plt.title('Changes in infected people')
plt.show()
The graph shows that positive cases were confirmed from January. The number rose sharply around April and temporarily subsided, then rose again in July and peaked around August. Graphing the data let me confirm the second wave of the new coronavirus. Toward the end of the graph, the number of confirmed positive cases drops sharply, so I am hopeful about the coming months.
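The plot above uses the date strings in the order they appear in the CSV. If the rows are not in chronological order, the line will jump back and forth. One possible alternative is to parse the dates and sort them before plotting. This sketch uses toy data, and it assumes the date strings are in a format that `pd.to_datetime` can parse. The format of the real column is not shown in this post.

```python
import pandas as pd

# Toy stand-in for df['Fixed date']; deliberately not in chronological order
sample = pd.DataFrame(
    {'Fixed date': ['2020-04-02', '2020-01-16', '2020-04-02', '2020-04-01']}
)

# Convert the strings to real timestamps
dates = pd.to_datetime(sample['Fixed date'])

# Count cases per day, then sort by date instead of by count
daily = dates.value_counts().sort_index()
print(daily)
```

Calling `daily.plot()` on the result draws the same kind of line chart, with matplotlib choosing the date ticks automatically.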
The CSV also contains X and Y coordinates, so I will plot them on a map. For this part, I referred to this article.
COVID-19.ipynb
# geopandas and descartes are required; install them from a terminal first:
#   pipenv install geopandas
#   pipenv install descartes
import geopandas as gpd

# Draw the base map
map_1 = gpd.read_file('./land-master(qiita)/japan.geojson')
map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);
COVID-19.ipynb
# Overlay the X and Y coordinates from the CSV on the map
map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);
plt.scatter(df['X'],df['Y'])
plt.show()
Looking closely, the plotted points are clustered in the upper right corner, so let's zoom in on that area.
COVID-19.ipynb
map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);
plt.xlim([120,150]) # x-axis range to zoom in on (adjust as needed)
plt.ylim([30,46]) # y-axis range to zoom in on (adjust as needed)
plt.scatter(df['X'],df['Y'])
plt.show()
I was able to confirm that the points are plotted properly. The map shows that the coronavirus has spread nationwide. There are many infected people across the Kyushu region as a whole, not to mention the Kanto region. It is very scary to think that there may be a risk of infection wherever you go.
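One possible reason the first map looked squeezed into a corner is that some rows have coordinates far outside Japan, which stretches the axes. That is an assumption, not something I verified in the CSV. A quick way to check is to count the rows that fall outside the zoom window. The sketch below uses toy coordinates, including a made-up (0, 0) row as the outlier.

```python
import pandas as pd

# Toy stand-in for the X and Y columns of the CSV
points = pd.DataFrame({
    'X': [139.69, 135.50, 130.40, 0.0],
    'Y': [35.68, 34.69, 33.59, 0.0],
})

# Same window as the xlim/ylim values used for the zoomed map
# (between() includes both endpoints by default)
in_window = points['X'].between(120, 150) & points['Y'].between(30, 46)

print(in_window.sum(), 'points inside the window')
print(points[~in_window])  # rows that would stretch the axes
```

Running the same check on `df` would list any rows that fall outside the window.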
This is my first post on Qiita, so I'm sure it falls short in places, but I really enjoyed doing the analysis and writing the article. It's a simple analysis, but I'm happy that I could try something new for me by plotting coordinates on a map. In the future, I would like to take on a deeper analysis of the coronavirus. These are difficult times, so please take good care of yourselves.