Dash (https://dash.plotly.com/) is a Python framework for building data visualization apps.
I wondered whether it could be used on Google Colab, and found a good article that already explains how. https://qiita.com/OgawaHideyuki/items/725f4ffd93ffb0d30b6c
This article is a record of working through that reference hands-on. The theme is a simple visualization of the number of COVID-19 cases in each country, shown as a time series graph and on a map.
**2020-12-27 Addendum:** Added map display.
The code runs on Google Colaboratory.
Because the map scatter plot uses a Mapbox token and Google Drive, the notebook is split into two parts: time series and map.
First, install the packages needed to use Dash from a Google Colab/Jupyter notebook.
! pip install jupyter_dash
! pip install --upgrade plotly
Import the packages associated with Dash.
import dash
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
from dash.dependencies import Input, Output
Download the COVID-19 case data from GitHub. For details about the data, see the following page. https://dodotechno.com/covd-19-visualization/
! wget https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
Load the downloaded CSV into a data frame.
import pandas as pd
df = pd.read_csv("time_series_covid19_confirmed_global.csv")
Aggregate by country, drop the latitude and longitude columns, transpose so that each country becomes a column, and add a date column.
df = df.groupby(['Country/Region'], as_index=False).sum()
df.drop(["Lat","Long"], axis=1,inplace=True)
df = df.T
df.columns = df.iloc[0]
df = df[1:]
df.reset_index(inplace=True)
df.rename(columns={'index': 'date'},inplace=True)
df
The resulting table has one row per date and one column per country.
First, let's graph the number of cases in Japan. The horizontal axis is the date and the vertical axis is the cumulative number of confirmed cases. You can see increases that appear to correspond to the first wave (late April), the second wave (early August), and the third wave (November).
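As a small check of the reshaping steps, here is a minimal sketch on a toy data frame. The rows and values are made up and only mimic the column layout of the CSV. Unlike the code above, it sums only the date columns and converts the dates to datetime values, which is my own variation rather than what the notebook does.

```python
import pandas as pd

# Toy stand-in for the CSV layout (values are made up for illustration).
raw = pd.DataFrame({
    "Province/State": ["A", "B", None],
    "Country/Region": ["Australia", "Australia", "Japan"],
    "Lat": [-35.0, -33.0, 36.0],
    "Long": [149.0, 151.0, 138.0],
    "1/22/20": [0, 1, 2],
    "1/23/20": [3, 4, 5],
})

meta_cols = ("Province/State", "Country/Region", "Lat", "Long")
date_cols = [c for c in raw.columns if c not in meta_cols]

# Sum only the date columns per country, then transpose: one column per country.
wide = raw.groupby("Country/Region")[date_cols].sum().T

# Parse the "month/day/year" labels so the x axis is a real date axis.
wide.index = pd.to_datetime(wide.index, format="%m/%d/%y")
wide = wide.rename_axis("date").reset_index()
wide.columns.name = None

print(wide)
```

Selecting the date columns before summing avoids depending on how a particular pandas version treats the text column Province/State during `sum()`.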
px.line(df, x="date", y="Japan")
Next, let's make any country selectable in the dropdown. You can select it on your notebook, so give it a try. If the runtime is stopped, try selecting "Runtime"-> "Run All Cells" from the menu. https://colab.research.google.com/drive/1fUP4818fSsFFFlUHlLGNoTxq8uoL2VAu#scrollTo=Kr-FsvLIpCoN&line=1&uniqifier=1
app = JupyterDash(__name__)
app.layout = html.Div([
    dcc.Dropdown(id="my_dropdown",
                 options=[{"value": country, "label": country} for country in df.columns.unique()],
                 value=["Japan"],
                 multi=True
                 ),
    dcc.Graph(id="my_graph")
])

@app.callback(Output("my_graph", "figure"), Input("my_dropdown", "value"))
def update_graph(selected_country):
    return px.line(df, x="date", y=selected_country)

app.run_server(mode="inline")
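Two small details of the dropdown are worth noting: the option list built from `df.columns` also contains the `date` column, and with `multi=True` the callback can receive an empty selection. The helpers below are a hypothetical sketch of how to handle both (the names `country_options` and `normalize_selection` are mine, not part of Dash).

```python
def country_options(columns, exclude=("date",)):
    """Build Dropdown options, skipping non-country columns such as 'date'."""
    return [{"label": c, "value": c} for c in columns if c not in exclude]


def normalize_selection(value, default=("Japan",)):
    """Return a non-empty list of column names to plot."""
    if not value:                 # None or an empty list
        return list(default)
    if isinstance(value, str):    # a single value from a non-multi dropdown
        return [value]
    return list(value)


options = country_options(["date", "Japan", "Canada"])
selection = normalize_selection([])
print(options)
print(selection)
```

In the app, `country_options(df.columns)` would replace the list comprehension, and the callback would pass `normalize_selection(selected_country)` to `px.line`.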
From the dropdown, select Japan and Canada to display them together. Canada also appears to be on the rise.
Next, adding the United States shows numbers on a completely different scale from Japan. Seeing it on a graph really makes an impact. I hope the vaccine proves effective (though not for each person).
Let's display a scatter plot on the map using the same data.
We use a map service called Mapbox, which requires an access token. If you do not have an account, sign up at the page below to get one. https://account.mapbox.com/ Signing up requires only a username, password, and email address.
This time we will store the token in Google Drive. If you are only running the notebook yourself, you can also embed the token in your code as a string.
Here, as an example, upload a text file named mapbox-token.txt, containing only the Mapbox token, to the top level of My Drive on Google Drive.
Then mount Google Drive and load the Mapbox token. Mounting prints a URL; open it, copy the authorization code shown on that page, and paste it into the notebook.
from google.colab import drive
drive.mount('/content/drive')
# Read the token and strip any trailing newline left in the text file
with open('/content/drive/My Drive/mapbox-token.txt', 'r') as f:
    MAPBOX_TOKEN = f.read().strip()
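If you want the same notebook to work both on Colab and elsewhere, the token loading can be wrapped in a small helper. The sketch below is hypothetical: the function name `load_mapbox_token` and the environment variable name `MAPBOX_TOKEN` are my own choices, and the token written to the temporary file is a dummy value.

```python
import os
import tempfile
from pathlib import Path


def load_mapbox_token(path, env_var="MAPBOX_TOKEN"):
    """Return the token stored in `path`, falling back to an environment variable."""
    token_file = Path(path)
    if token_file.is_file():
        token = token_file.read_text(encoding="utf-8").strip()
    else:
        token = os.environ.get(env_var, "").strip()
    if not token:
        raise ValueError(f"No Mapbox token found in {path} or ${env_var}")
    return token


# Demonstration with a temporary file and a dummy token.
with tempfile.TemporaryDirectory() as tmp_dir:
    token_path = os.path.join(tmp_dir, "mapbox-token.txt")
    with open(token_path, "w", encoding="utf-8") as fh:
        fh.write("pk.dummy-token\n")
    token = load_mapbox_token(token_path)

print(token)
```

On Colab the call would be `load_mapbox_token('/content/drive/My Drive/mapbox-token.txt')` after mounting the drive.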
Install and import JupyterDash in the same way as in the time series notebook.
! pip install jupyter_dash
! pip install --upgrade plotly
import dash
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
import plotly.express as px
from dash.dependencies import Input, Output
Download the COVID-19 case data as well.
! wget https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
Load the downloaded CSV into a data frame. Because this is a separate notebook, pandas has to be imported here too.
import pandas as pd
df_map = pd.read_csv("./time_series_covid19_confirmed_global.csv")
df_map
This time we only aggregate by country, keeping the latitude and longitude columns and not transposing.
df_map = df_map.groupby(['Country/Region'], as_index=False).sum()
df_map
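One thing to be aware of: `sum()` also adds up the Lat and Long columns, so a country that is split into several provinces ends up with summed coordinates. A minimal sketch of one alternative is below, averaging the coordinates while summing the counts. The toy rows are made up, and the mean of the province coordinates is only a rough stand-in for a country's position.

```python
import pandas as pd

# Toy rows in the CSV layout (values are made up for illustration).
raw = pd.DataFrame({
    "Province/State": ["A", "B", None],
    "Country/Region": ["Australia", "Australia", "Japan"],
    "Lat": [-35.0, -33.0, 36.0],
    "Long": [149.0, 151.0, 138.0],
    "1/22/20": [0, 1, 2],
    "1/23/20": [3, 4, 5],
})

meta_cols = ("Province/State", "Country/Region", "Lat", "Long")
date_cols = [c for c in raw.columns if c not in meta_cols]

# Average the coordinates, sum the case counts.
agg_spec = {"Lat": "mean", "Long": "mean", **{c: "sum" for c in date_cols}}
by_country = raw.groupby("Country/Region", as_index=False).agg(agg_spec)

aus = by_country[by_country["Country/Region"] == "Australia"].iloc[0]
print(by_country)
```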
Now that the data is ready, we will display it on a map.
The date and the column used for coloring can both be selected from dropdowns. I used this article as a reference. https://qiita.com/banquet_kuma/items/e02ba60661cf91af37de
MAPBOX_TOKEN is the value read from the file uploaded to Google Drive earlier. If you are running this yourself, you can paste the token directly into the code as a string.
app = JupyterDash(__name__)
color_opt = [dict(label=x, value=x) for x in df_map.columns]
del color_opt[2]
del color_opt[1]
date_opt = color_opt.copy()
del date_opt[0]

app.layout = html.Div(
    [
        html.Div(
            [
                html.P(["date:", dcc.Dropdown(id='date', options=date_opt)]),
                html.P(["color:", dcc.Dropdown(id='color', options=color_opt)]),
            ],
            style={"width": "20%", "float": "left"}
        ),
        dcc.Graph(id="graph", style={"width": "80%", "display": "inline-block"}),
    ]
)

@app.callback(Output("graph", "figure"), [Input("date", "value"), Input("color", "value")])
def update_graph(date, color):
    px.set_mapbox_access_token(MAPBOX_TOKEN)
    if not color:
        color = date
    return px.scatter_mapbox(df_map,
                             lat="Lat",
                             lon="Long",
                             color=color,
                             size=date,
                             size_max=20,
                             zoom=0,
                             center={'lat': 35, 'lon': 135},
                             title="Number of people infected with corona in each country",
                             color_continuous_scale=px.colors.diverging.BrBG,
                             hover_name=date)

app.run_server(mode="inline")
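The two option lists in this code are built by deleting entries by position, which relies on the columns being in a particular order. A possible alternative is to select by column name. The sketch below is hypothetical (the helper `build_options` and the constant `META_COLS` are my own names) and is shown on a hard-coded column list instead of `df_map.columns`.

```python
# Columns that are not dates in the CSV layout.
META_COLS = ("Province/State", "Country/Region", "Lat", "Long")


def build_options(columns):
    """Return (date_opt, color_opt) for the two dropdowns, chosen by column name."""
    date_opt = [dict(label=c, value=c) for c in columns if c not in META_COLS]
    # Color can follow either the country name or any date column.
    color_opt = [dict(label="Country/Region", value="Country/Region")] + date_opt
    return date_opt, color_opt


date_opt, color_opt = build_options(
    ["Province/State", "Country/Region", "Lat", "Long", "1/22/20", "1/23/20"]
)
print(date_opt)
print(color_opt)
```

In the notebook, `build_options(df_map.columns)` would produce the same kind of lists as the `del` statements, without depending on whether Province/State is still present.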
When executed, it looks like this. The display is beautiful.
Select a date to see the cumulative number of infected people at that time. 2020-12-01 looks like this.
The display is done by a function called plotly.express.scatter_mapbox.
I couldn't find a Japanese explanation of the parameters of plotly.express.scatter_mapbox, so I'll give a brief explanation for your reference (if you know a good source, please let me know in the comments).
https://plotly.github.io/plotly.py-docs/generated/plotly.express.scatter_mapbox.html
- color_discrete_sequence: useful color sequences are available in plotly.express.colors, especially plotly.express.colors.qualitative.
- color_continuous_scale: useful color scales are available in plotly.express.colors, especially plotly.express.colors.sequential, plotly.express.colors.diverging and plotly.express.colors.cyclical.
- color_continuous_midpoint: recommended when using a plotly.express.colors.diverging color scale.
- mapbox_style: 'open-street-map', 'white-bg', 'carto-positron', 'carto-darkmatter', 'stamen-terrain', 'stamen-toner', 'stamen-watercolor' do not require a token. 'basic', 'streets', 'outdoors', 'light', 'dark', 'satellite', 'satellite-streets' require a token.
There are many other parameters.
Here are some issues.
A simple visualization is easy to implement, and Google Colab makes it easy to share on the Internet. Processing the CSV may be the most troublesome part.
I would like to add visualization on the map. (→ 2020-12-27 Map display added, with some issues remaining.) There still seem to be many things that could be visualized, so I may try more soon.
One thing I only found out by running it: the graph does not remain once the Google Colab runtime is stopped. If you need to publish it permanently, you may want to consider other approaches, such as deploying to Heroku or embedding static matplotlib or plotly figures in the notebook.