Introduction

I want to build a dashboard that visualizes and analyzes data intuitively and publish it as a web application, but HTML, CSS and JavaScript are a hassle and I would rather not touch them. For cases like this, Streamlit lets you build a data analysis dashboard from a single Python script. This article covers:

- App creation (installation, data preparation, dashboard creation)
- Deploying to an AWS EC2 instance

Given the times, the subject is the data published by Yahoo! Data Solution on the flow of people in the 23 wards of Tokyo, which I use to check the impact of the measures against the new coronavirus.

Note: the latter half in particular is a record of a web beginner feeling their way through, so the content and descriptions probably have various shortcomings. I would appreciate it if you could point them out.

App creation

Streamlit installation

Streamlit is open-source software (OSS). GitHub: https://github.com/streamlit/streamlit

It can be installed with pip:

pip install streamlit

By executing the following command, you can launch the demo app locally and check it in the browser (the browser will launch automatically).

streamlit hello

For usage, the official tutorial page is easy to follow. There is also an explanatory article.

[Addition] This is also helpful:
- [Python] I wrote a test of "Streamlit" that makes it easy to create a visualization application

Data preparation

Given the times, you will want to see how the measures against the new coronavirus are affecting the flow of people in Tokyo. Searching for data of that kind, I found that Yahoo! Data Solution had released "Daily transition of estimated population in the 23 wards of Tokyo (overall, visitors, residents)" as open data (as of April 10, 2020). This time I will try to visualize it nicely.

[Source: Yahoo! Data Solution (https://ds.yahoo.co.jp/report/, 2020/04/09)] The data is updated daily. Here, the data up to April 9, 2020 is used.

The original data is in Excel format and contains the daily number of residents, visitors, and total people for each of the 23 wards of Tokyo. Each month is stored on a separate sheet, so convert each sheet to CSV in advance.
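The conversion can also be scripted. The sketch below is only one way to do it: it assumes the workbook has one sheet per month and that an Excel engine such as openpyxl is installed. The function names and the output naming pattern are made up for this example and are not taken from the original data.

```python
from pathlib import Path

import pandas as pd


def sheets_to_csv(sheets: dict, out_dir: str, prefix: str = "tokyo") -> list:
    """Write each sheet (a DataFrame keyed by sheet name) to its own CSV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in sheets.items():
        # Hypothetical naming pattern: <prefix>_<sheet name>.csv
        path = out / f"{prefix}_{name}.csv"
        frame.to_csv(path, index=False)
        written.append(path)
    return written


def excel_to_csv(workbook: str, out_dir: str) -> list:
    """Read every sheet of the workbook and save one CSV per sheet."""
    # sheet_name=None makes read_excel return {sheet name: DataFrame}.
    sheets = pd.read_excel(workbook, sheet_name=None)
    return sheets_to_csv(sheets, out_dir)
```

Calling `excel_to_csv("<workbook>.xlsx", "csv")` with the downloaded file would then leave one CSV per monthly sheet in the `csv` folder.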

Read each of these and combine them into one.

import numpy as np
import pandas as pd

data_02 = pd.read_csv('Tokyo 23 wards transition 0409_February.csv')
data_03 = pd.read_csv('Tokyo 23 wards transition 0409_March.csv')
data_04 = pd.read_csv('Tokyo 23 wards transition 0409_April.csv')

# March and April repeat the two label columns, so keep only their date columns
data_all = pd.concat([data_02, data_03.iloc[:, 2:], data_04.iloc[:, 2:]], axis=1)

data_all.head()

	Area	Target classification	February 1st	February 2	February 3	February 4	February 5	February 6	February 7	February 8	...	March 30	March 31	April 1st	April 2	April 3	April 4	April 5	April 6	April 7	April 8
0	The entire 23 wards of Tokyo	Overall	10485000	10164000	11676000	11687000	11659000	11690000	11691000	10471000	...	11393000	11388000	11288000	11263000	11256000	10021000	9737000	11212000	11104000	10859000
1	NaN	Resident	8921000	8921000	8921000	8921000	8921000	8921000	8921000	8921000	...	8949000	8949000	8924000	8924000	8924000	8924000	8924000	8924000	8924000	8924000
2	NaN	Visitors	1564000	1243000	2755000	2766000	2738000	2769000	2770000	1550000	...	2444000	2439000	2364000	2339000	2332000	1097000	813000	2288000	2180000	1935000
3	Chiyoda-ku	Overall	454900	356900	1028900	1039900	1031900	1043900	1041900	453900	...	857000	855000	819500	802500	791500	266500	195500	775500	731500	624500
4	NaN	Resident	54900	54900	54900	54900	54900	54900	54900	54900	...	56000	56000	55500	55500	55500	55500	55500	55500	55500	55500
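To see why the first two columns of the later months are dropped before concatenating, here is a small sketch with hand-built frames. The values come from the table above, reduced to one day per month; the variable names are only for illustration.

```python
import pandas as pd

# Each monthly frame repeats the two label columns.
feb = pd.DataFrame({
    "Area": ["Chiyoda-ku", None],
    "Target classification": ["Overall", "Resident"],
    "February 1st": [454900, 54900],
})
mar = pd.DataFrame({
    "Area": ["Chiyoda-ku", None],
    "Target classification": ["Overall", "Resident"],
    "March 31": [855000, 56000],
})

# Keep the label columns from the first frame only, then join side by side.
combined = pd.concat([feb, mar.iloc[:, 2:]], axis=1)
print(combined.columns.tolist())
```

Without the `iloc[:, 2:]` slice, the label columns would appear once per month in the combined frame.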

The area and target classification columns are turned into a MultiIndex, and the result is written out.

# The area name appears only on the first row of each group, so fill it downward
data_all = data_all.ffill()
data_all.set_index(['Area', 'Target classification'], inplace=True)

data_all.to_csv('tokyo_0409.csv', index=True, header=True)

The final data looks like this.

data_all.head(7)

		February 1st	February 2	February 3	February 4	February 5	February 6	February 7	February 8	February 9	February 10	...	March 30	March 31	April 1st	April 2	April 3	April 4	April 5	April 6	April 7	April 8
Area	Target classification
The entire 23 wards of Tokyo	Overall	10485000	10164000	11676000	11687000	11659000	11690000	11691000	10471000	10149000	11523000	...	11393000	11388000	11288000	11263000	11256000	10021000	9737000	11212000	11104000	10859000
	Resident	8921000	8921000	8921000	8921000	8921000	8921000	8921000	8921000	8921000	8921000	...	8949000	8949000	8924000	8924000	8924000	8924000	8924000	8924000	8924000	8924000
	Visitors	1564000	1243000	2755000	2766000	2738000	2769000	2770000	1550000	1228000	2602000	...	2444000	2439000	2364000	2339000	2332000	1097000	813000	2288000	2180000	1935000
Chiyoda-ku	Overall	454900	356900	1028900	1039900	1031900	1043900	1041900	453900	356900	958900	...	857000	855000	819500	802500	791500	266500	195500	775500	731500	624500
	Resident	54900	54900	54900	54900	54900	54900	54900	54900	54900	54900	...	56000	56000	55500	55500	55500	55500	55500	55500	55500	55500
	Visitors	400000	302000	974000	985000	977000	989000	987000	399000	302000	904000	...	801000	799000	764000	747000	736000	211000	140000	720000	676000	569000
Chuo-ku	Overall	441000	367000	849000	857000	852000	861000	863000	440000	370000	793000	...	733000	728000	701000	691000	684000	307000	256000	675000	641000	563000
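As a small illustration of what the forward fill and the MultiIndex do, the following sketch rebuilds a few rows of the table in memory (values copied from the output above) and selects one ward. It is a toy example under those assumptions, not the author's script.

```python
import io

import pandas as pd

# A tiny stand-in for the CSV: the area name appears only on the first
# row of each group, so the other rows are read as NaN.
raw = io.StringIO(
    "Area,Target classification,February 1st,February 2\n"
    "Chiyoda-ku,Overall,454900,356900\n"
    ",Resident,54900,54900\n"
    ",Visitors,400000,302000\n"
    "Chuo-ku,Overall,441000,367000\n"
)
df = pd.read_csv(raw)

# Forward-fill the area names, then index by (area, classification).
df["Area"] = df["Area"].ffill()
df = df.set_index(["Area", "Target classification"])

# Selecting on the first index level returns every row for that area.
chiyoda = df.loc["Chiyoda-ku"]
print(chiyoda)
```

The same kind of first-level selection is what the dashboard relies on when it picks the data for the chosen area.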

Dashboard creation

From here, we will actually create the dashboard. Streamlit allows you to create dashboards with just one Python script. This time I will write the script as streamlit_app.py.

The data is displayed as a line chart. A select box picks the target area, and the chart shows the time series of residents, visitors, and the total for that area.

`streamlit_app.py`


import numpy as np
import pandas as pd
import streamlit as st
import plotly.graph_objects as go

st.title('Daily transition of estimated population in the 23 wards of Tokyo')
st.write('[Source: Yahoo! Data Solution]')

data_all = pd.read_csv('data/tokyo_0409.csv')
erea_list = data_all['Area'].unique()

data_all.set_index(['Area', 'Target classification'], inplace=True)

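# Transpose so that each row is a date
data_all = data_all.T
# Convert date labels to datetime type
# (assumes English labels such as 'February 1st' as in the tables above;
#  the ordinal suffix is dropped before parsing)
labels = data_all.index.str.replace(r'(\d+)(st|nd|rd|th)$', r'\1', regex=True)
data_all.index = pd.to_datetime('2020 ' + labels, format='%Y %B %d')
data_all.index.name = 'time'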

#Select the display area with the select box
selected_erea = st.sidebar.selectbox(
    'Select the area to display:',
    erea_list
)

# Graph display
st.write(f'## Displaying: {selected_erea}')
data_plotly = data_all[selected_erea]
# Column names follow the 'Target classification' values shown in the table above
data_plot = [
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Resident'],
               mode='lines',
               name='Resident'),
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Visitors'],
               mode='lines',
               name='Visitors'),
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Overall'],
               mode='lines',
               name='Overall')]
layout = go.Layout(
    xaxis={"title": "date"},
    yaxis={"title": "Number of people"}
)
st.plotly_chart(go.Figure(data=data_plot, layout=layout))

Functions of the streamlit module such as st.write() define the strings, tables, and graphs shown on the screen.

For interactivity, st.selectbox() displays the target area options. The value the user selects is returned and stored in selected_erea, and the graph is drawn for the corresponding area. With st.sidebar you can place elements in the sidebar on the left side of the screen, which looks a little nicer.

For graph display there are simple functions such as st.line_chart(), but they did not seem to handle dates well, so this time I build an interactive graph with Plotly and draw it with st.plotly_chart().

Other characteristic features of Streamlit: st.map() plots data on a map, and st.progress() displays a progress bar for time-consuming processes. I wanted to use these as well, but I will omit them this time.

It is convenient to be able to build a screen with just a short Python script, without thinking about HTML at all.

Deploy to AWS

The expected use is quickly checking data at hand and sharing the results with a team, but since I have come this far, I will deploy it to an AWS EC2 instance and publish it as a trial.

Procedure:

1: Create a t2.micro instance on the AWS EC2 free tier

2: Run the app on the instance

streamlit run streamlit_app.py

3: Domain acquisition

This time, I got a suitable domain name, `onedata.ml`, from the free domain service [freenom](https://www.freenom.com/ja/index.html). (You can get ".tk", ".ml", ".ga", ".cf", and ".gq" domains for free.)

4: Edit the Streamlit config file

In the configuration file (config.toml) in the `~/.streamlit` folder, enter the domain name you obtained as the access address.

[browser]
gatherUsageStats = false
serverAddress = "onedata.ml"

[server]
port = 8501
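If the instance is set up by script rather than by hand, the same file can be generated from Python. The following is a minimal sketch under that assumption; `render_config` and `write_config` are hypothetical helper names, and only the three settings shown above are written.

```python
from pathlib import Path


def render_config(domain: str, port: int = 8501) -> str:
    """Return the text of a config.toml with the settings used above."""
    return (
        "[browser]\n"
        "gatherUsageStats = false\n"
        f'serverAddress = "{domain}"\n'
        "\n"
        "[server]\n"
        f"port = {port}\n"
    )


def write_config(domain: str, config_dir: Path, port: int = 8501) -> Path:
    """Write config.toml into config_dir (normally ~/.streamlit)."""
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "config.toml"
    path.write_text(render_config(domain, port), encoding="utf-8")
    return path
```

For example, `write_config("onedata.ml", Path.home() / ".streamlit")` would produce the file shown above.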

5: Port forwarding from 80 to 8501

By default, Streamlit listens on port 8501. To allow access by domain name without specifying a port, I want the app to be reachable on the default HTTP port 80. Binding to port 80 directly would require running the app with root privileges. Instead, iptables is used here to redirect traffic arriving on port 80 to port 8501.

sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8501

6: Opening port 80

On the EC2 dashboard, edit Security Group > Inbound Rules and open port 80.
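Once the redirect and the security group rule are in place, a quick check from another machine can confirm that the app answers on port 80. This is a small sketch using only the standard library; the helper name `is_reachable` is made up for this example.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False
```

For instance, `is_reachable("http://onedata.ml")` should return True once steps 5 and 6 are done, and False if port 80 is still closed.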

Here is the app I made ↓ http://onedata.ml

Implementation ↓ https://github.com/tkmz-n/streamlit_app

Summary

I created a simple demo app that visualizes open data with Streamlit, deployed it on AWS, and published it.

Looking at the data, we can see that the flow of people in the 23 wards of Tokyo has been gradually decreasing since the end of March. Let's keep staying at home so that the new coronavirus outbreak settles as soon as possible.
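To put a rough number on that decrease, the visitor counts for the 23 wards as a whole can be compared across Mondays. The three values below are copied from the table earlier in the article; choosing February 3 as the baseline is an assumption made for this sketch.

```python
import pandas as pd

# Visitors to the 23 wards as a whole on three Mondays, from the table above.
visitors = pd.Series(
    {"2020-02-03": 2755000, "2020-03-30": 2444000, "2020-04-06": 2288000}
)

# Percentage change relative to the first Monday in the series.
change = (visitors / visitors.iloc[0] - 1) * 100
print(change.round(1))
```

With these values, visitors are about 11% lower on March 30 and about 17% lower on April 6 than on February 3.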

Quickly create a Python data analysis dashboard with Streamlit and deploy it to AWS