Introduction

On January 16, 2020, the first case in Japan of the novel coronavirus disease (COVID-19), caused by the virus SARS-CoV-2, was confirmed. Unfortunately, the disease has killed many people, from ordinary people to celebrities. Even now, more than half a year later, the epidemic has not subsided, and masks are a necessity when going out. In this post, I briefly analyze and summarize the coronavirus data for Japan. I hope this analysis gives me some new insights and improves my analysis skills.

Data preparation

For this analysis, I used the CSV data published by Jag Japan Co., Ltd. Thank you very much. The link is below.

About "Map of the number of people infected with the new coronavirus"

Screenshot 2020-08-30 12.56.46.png

Environment

Python 3
JupyterLab

Try to analyze

1. Import the required libraries

`COVID-19.ipynb`


import collections
import matplotlib.pyplot as plt
import pandas as pd

2. Read the CSV file

`COVID-19.ipynb`


pd.set_option('display.max_columns', None)
df = pd.read_csv('COVID-19.csv')
df

JupyterLab truncates the display when a DataFrame has many columns, so the first line sets `display.max_columns` to `None` to show all of them.

3. Check the ages of the infected people

`COVID-19.ipynb`


age = df['Age'].value_counts(ascending=True)
age

`Execution result`


90 or more       1
90s              1
100              2
80s              7
Teen             9
70s             10
60s             12
90              14
50s             25
30s             33
40s             33
80              44
20s             49
10              66
70              69
60             128
40             167
50             179
30             203
20             310
90            1040
Unknown       1145
0-10          1335
80            2645
10            2952
70            3751
60            4531
50            7355
40            8315
30           10551
20           18009
Name: Age, dtype: int64

The notation is not unified: the same age group appears under several labels, such as 20s and 20. I will try to unify the notation using `df.replace()`.

`COVID-19.ipynb`


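# Map each variant label to one unified label (the keys must match the labels in the CSV exactly)
age_map = {'0-10': 'under10', "10's": '10', "20's": '20', '30s': '30', 'Forties': '40', '50s': '50', '60s': '60', '70s': '70', '80s': '80', '90s': '90', 'unknown': 'unknown', '90 and above': '90~'}
df = df.replace({'Age': age_map})
age2 = df['Age'].value_counts()
age2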

`Output result`


20           18009
30           10551
40            8315
50            7355
60            4531
70            3751
10            2952
80            2645
under10       1335
unknown       1145
90            1040
20             359
30             236
50             204
40             200
60             140
70              79
10              75
80              51
90              15
100              2
90~              1
Name: Age, dtype: int64

The output is now more compact than before. (I wanted to merge the rows that show the same number into a single total and tried various approaches, but none of them worked, so I'll leave that as a future task.) It's still a little hard to read, so I'll visualize it with a graph.
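As a possible starting point for that future task, here is a minimal sketch. I have not verified the cause against the real CSV. It assumes the duplicate labels differ only in type (the number 20 versus the string '20') or in character width (full-width versus half-width digits). The helper `normalize_label` and the toy data are hypothetical, made up for illustration.

```python
import unicodedata

import pandas as pd


def normalize_label(value):
    """Convert a label to a half-width, whitespace-trimmed string (hypothetical helper)."""
    # NFKC normalization turns full-width digits such as '２０' into '20'
    return unicodedata.normalize('NFKC', str(value)).strip()


# Toy data, not taken from the real CSV: the same age group stored in different forms
ages = pd.Series([20, '20', '２０', '30', 30, 'unknown'])

# After normalization, value_counts() treats the variants as one label
counts = ages.map(normalize_label).value_counts()
print(counts)
```

If the real labels differ in some other way (for example, floats such as 20.0), the helper would need an extra rule for that case.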

`COVID-19.ipynb`


plt.title('Age of infected person')
age2.plot.bar()

Age of infected person.png

The graph makes the result easier to grasp visually. It shows that younger age groups, such as people in their 20s, 30s, and 40s, account for more of the infections. The large number of infected people in their 20s stands out in particular.

4. Check the number of infected people by gender

`COVID-19.ipynb`


df = df.replace({'sex':{'male':'male', 'Female':'female', 'unknown':'unknown'}})
sex = df['sex'].value_counts()

plt.xlabel('Sex')
plt.ylabel('Number of people')
plt.title('Infected_sex')

# print(sex)  # Uncomment to see the exact number of infected people by gender
sex.plot.bar()

The graph shows that more men than women were infected. I don't think the infection itself depends on gender, but the purposes and patterns of going out probably differ between men and women. With more detailed data on that behavior, I expect the relationship between gender and the number of infections could be explained.

5. Check the increase / decrease in positive cases

`COVID-19.ipynb`


# Count how many cases were confirmed on each date
fixed_date = collections.Counter(df['Fixed date'])
# fixed_date  # The output is long, so the execution result is omitted.

# Split the counter into x values (dates) and y values (counts)
date = list(fixed_date.keys())
value = list(fixed_date.values())

plt.plot(date, value)
# Show only a few tick labels so the dates stay readable, and rotate them
plt.xticks([0, 70, 180])
plt.xticks(rotation=45)

plt.xlabel('date')
plt.ylabel('value')
plt.title('Changes in infected people')

plt.show()

Changes in infected people.png

The graph shows that positive cases were confirmed from January. The number rose sharply around April and temporarily subsided, but it increased again in July and peaked around August. Graphing the data made the second wave of the new coronavirus visible. Toward the end of the graph, the number of confirmed positive cases drops sharply, so I hope that trend continues.
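The counting step above can also be sketched with pandas alone. This is only an illustration on toy data, not the real CSV: the column name `Fixed date` comes from the code above, while the `M/D/YYYY` date format is my assumption. Parsing the strings as dates lets the counts be sorted chronologically instead of relying on the row order of the file.

```python
import pandas as pd

# Toy data, not taken from the real CSV; the date format is an assumption
toy = pd.DataFrame({'Fixed date': ['1/16/2020', '4/10/2020', '1/16/2020', '4/9/2020']})

# Parse the strings as dates, count the rows per date, then sort by date
dates = pd.to_datetime(toy['Fixed date'], format='%m/%d/%Y')
daily = dates.value_counts().sort_index()
print(daily)
```

With a date index like this, `daily.plot()` should draw the curve on a real date axis, which may remove the need to set the tick positions by hand.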

6. Plot the locations where corona infection was confirmed on the map

The CSV contains X and Y coordinate data, so I will plot it. This time, I referred to this article.

`COVID-19.ipynb`


# geopandas and descartes are required, so install them first (run in a terminal, not in the notebook)
# pipenv install geopandas
# pipenv install descartes
import geopandas as gpd

# Draw the base map data
map_1 = gpd.read_file('./land-master(qiita)/japan.geojson')
map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);

`COVID-19.ipynb`


# Plot the X and Y coordinates from the CSV on top of the map
map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);
plt.scatter(df['X'],df['Y'])
plt.show()

Screenshot 2020-08-30 15.49.01.png

If you look closely, the plotted points are all crowded into the upper right corner, so let's zoom in.

`COVID-19.ipynb`


map_1.plot(figsize=(10,10), edgecolor='#444', facecolor='white', linewidth = 1);
plt.xlim([120,150]) # X range to zoom in on (adjust as needed)
plt.ylim([30,46]) # Y range to zoom in on (adjust as needed)
plt.scatter(df['X'],df['Y'])
plt.show()

Screenshot 2020-08-30 15.52.15.png

I was able to confirm that the points were plotted properly. The map shows that the coronavirus has spread nationwide. There are many infected people not only in the Kanto region but also across the Kyushu region as a whole. It's very scary to think that there may be a risk of infection wherever you go.
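Zooming in hides any points outside the chosen range, so it may be worth counting how many rows that affects. The following is a minimal sketch on toy coordinates, not the real CSV. It reuses the same bounds as the `plt.xlim` and `plt.ylim` calls above and assumes, as in the plot, that `X` is longitude and `Y` is latitude.

```python
import pandas as pd

# Toy coordinates, not taken from the real CSV (X = longitude, Y = latitude)
points = pd.DataFrame({
    'X': [139.7, 130.4, 0.0, 141.3],
    'Y': [35.7, 33.6, 0.0, 43.1],
})

# Keep only the points inside the zoomed range (bounds are inclusive)
in_view = points['X'].between(120, 150) & points['Y'].between(30, 46)
visible = points[in_view]

hidden = len(points) - len(visible)
print(hidden, 'point(s) fall outside the zoomed range')
```

Rows outside the range could be inspected separately to see whether they are genuine locations or records with missing coordinates.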

7. Summary

Since this is my first post on Qiita, I'm sure it falls short in places, but I really enjoyed doing the analysis and writing the article. It's a simple analysis, but I'm very happy that I could try something new for me by plotting coordinates on a map. In the future, I would like to take on a deeper analysis of the coronavirus. These are difficult times with the coronavirus, so please take good care of yourselves.
