I want to build a dashboard that lets me visualize and analyze data intuitively, and publish it as a web application. But HTML, CSS, and JavaScript are a hassle, and I would rather not touch them. For cases like this, Streamlit lets you build a data analysis dashboard from a single Python script. This article covers:
-- The app creation process (installation, data preparation, dashboard creation)
-- The procedure for deploying to an AWS EC2 instance
Given the times, the subject is the data on the movement of people in the 23 wards of Tokyo published by Yahoo! Data Solution, which I use to check the impact of the measures against the new coronavirus.
\# The latter half in particular is a record of a web beginner feeling his way through, so the content and descriptions may have various shortcomings.
\# I would appreciate it if you could point them out.
Streamlit is open-source software (OSS). GitHub: https://github.com/streamlit/streamlit
Install it with pip:
pip install streamlit
By executing the following command, you can launch the demo app locally and check it in the browser (the browser will launch automatically).
streamlit hello
For usage, the official tutorial page is easy to follow. A commentary article is also available.
[Addition] This article is also helpful:
-- [Python] I wrote a test of "Streamlit" that makes it easy to create a visualization application
Given the times, you will probably want to see how the new coronavirus measures are affecting the flow of people in Tokyo. When I searched for that kind of data, I found it at Yahoo! Data Solution: the dataset "Daily transition of estimated population in the 23 wards of Tokyo (overall, visitors, residents)" had been released as open data (as of April 10, 2020). This time I will try to visualize it.
[Source: Yahoo! Data Solution (https://ds.yahoo.co.jp/report/, 2020/04/09)] The data is updated daily. Here, the data up to April 9, 2020 is used.
The original data is in Excel format and contains the daily number of residents, visitors, and total people for the 23 wards of Tokyo. Each month is stored in a separate sheet, so I converted each sheet to CSV in advance.
Read each file and combine them into one DataFrame.
import numpy as np
import pandas as pd
data_02 = pd.read_csv('Tokyo 23 wards transition 0409_February.csv')
data_03 = pd.read_csv('Tokyo 23 wards transition 0409_March.csv')
data_04 = pd.read_csv('Tokyo 23 wards transition 0409_April.csv')
data_all = pd.concat([data_02, data_03.iloc[:, 2:], data_04.iloc[:, 2:]], axis=1)
data_all.head()
|  | area | Target classification | February 1 | February 2 | February 3 | February 4 | February 5 | February 6 | February 7 | February 8 | ... | March 30 | March 31 | April 1 | April 2 | April 3 | April 4 | April 5 | April 6 | April 7 | April 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The entire 23 wards of Tokyo | Overall | 10485000 | 10164000 | 11676000 | 11687000 | 11659000 | 11690000 | 11691000 | 10471000 | ... | 11393000 | 11388000 | 11288000 | 11263000 | 11256000 | 10021000 | 9737000 | 11212000 | 11104000 | 10859000 |
| 1 | NaN | Resident | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | ... | 8949000 | 8949000 | 8924000 | 8924000 | 8924000 | 8924000 | 8924000 | 8924000 | 8924000 | 8924000 |
| 2 | NaN | Visitors | 1564000 | 1243000 | 2755000 | 2766000 | 2738000 | 2769000 | 2770000 | 1550000 | ... | 2444000 | 2439000 | 2364000 | 2339000 | 2332000 | 1097000 | 813000 | 2288000 | 2180000 | 1935000 |
| 3 | Chiyoda-ku | Overall | 454900 | 356900 | 1028900 | 1039900 | 1031900 | 1043900 | 1041900 | 453900 | ... | 857000 | 855000 | 819500 | 802500 | 791500 | 266500 | 195500 | 775500 | 731500 | 624500 |
| 4 | NaN | Resident | 54900 | 54900 | 54900 | 54900 | 54900 | 54900 | 54900 | 54900 | ... | 56000 | 56000 | 55500 | 55500 | 55500 | 55500 | 55500 | 55500 | 55500 | 55500 |
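To make the column-wise concatenation concrete, here is a minimal sketch on toy frames. The two small DataFrames and all their values are made up for illustration (they are not the Yahoo! data); only the `pd.concat(..., axis=1)` with `iloc[:, 2:]` pattern is taken from the code above.

```python
import pandas as pd

# Hypothetical stand-ins for two monthly sheets. The first two columns
# (area, Target classification) are repeated in every sheet.
feb = pd.DataFrame({
    'area': ['A-ku', None],
    'Target classification': ['Overall', 'Resident'],
    'February 1': [100, 60],
    'February 2': [110, 60],
})
mar = pd.DataFrame({
    'area': ['A-ku', None],
    'Target classification': ['Overall', 'Resident'],
    'March 1': [90, 61],
})

# Keep the key columns from the first sheet only, then append the
# date columns of the later sheet side by side (axis=1).
combined = pd.concat([feb, mar.iloc[:, 2:]], axis=1)
print(combined.columns.tolist())
```

Note that `axis=1` aligns on the row index, not on the area name, so this approach relies on every sheet listing its rows in the same order.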
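The area and target classification are set as a MultiIndex, and the result is written out to CSV.
# The area name appears only on the first row of each area, so fill it downward
# (fillna(method='ffill') is deprecated in recent pandas; ffill() is equivalent)
data_all.ffill(inplace=True)
data_all.set_index(['area', 'Target classification'], inplace=True)
data_all.to_csv('tokyo_0409.csv', index=True, header=True)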
The final data looks like this.
data_all.head(7)
| area | Target classification | February 1 | February 2 | February 3 | February 4 | February 5 | February 6 | February 7 | February 8 | February 9 | February 10 | ... | March 30 | March 31 | April 1 | April 2 | April 3 | April 4 | April 5 | April 6 | April 7 | April 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| The entire 23 wards of Tokyo | Overall | 10485000 | 10164000 | 11676000 | 11687000 | 11659000 | 11690000 | 11691000 | 10471000 | 10149000 | 11523000 | ... | 11393000 | 11388000 | 11288000 | 11263000 | 11256000 | 10021000 | 9737000 | 11212000 | 11104000 | 10859000 |
|  | Resident | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | 8921000 | ... | 8949000 | 8949000 | 8924000 | 8924000 | 8924000 | 8924000 | 8924000 | 8924000 | 8924000 | 8924000 |
|  | Visitors | 1564000 | 1243000 | 2755000 | 2766000 | 2738000 | 2769000 | 2770000 | 1550000 | 1228000 | 2602000 | ... | 2444000 | 2439000 | 2364000 | 2339000 | 2332000 | 1097000 | 813000 | 2288000 | 2180000 | 1935000 |
| Chiyoda-ku | Overall | 454900 | 356900 | 1028900 | 1039900 | 1031900 | 1043900 | 1041900 | 453900 | 356900 | 958900 | ... | 857000 | 855000 | 819500 | 802500 | 791500 | 266500 | 195500 | 775500 | 731500 | 624500 |
|  | Resident | 54900 | 54900 | 54900 | 54900 | 54900 | 54900 | 54900 | 54900 | 54900 | 54900 | ... | 56000 | 56000 | 55500 | 55500 | 55500 | 55500 | 55500 | 55500 | 55500 | 55500 |
|  | Visitors | 400000 | 302000 | 974000 | 985000 | 977000 | 989000 | 987000 | 399000 | 302000 | 904000 | ... | 801000 | 799000 | 764000 | 747000 | 736000 | 211000 | 140000 | 720000 | 676000 | 569000 |
| Chuo-ku | Overall | 441000 | 367000 | 849000 | 857000 | 852000 | 861000 | 863000 | 440000 | 370000 | 793000 | ... | 733000 | 728000 | 701000 | 691000 | 684000 | 307000 | 256000 | 675000 | 641000 | 563000 |
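As a small illustration of why the MultiIndex is convenient, the sketch below repeats the forward-fill and `set_index` steps on a toy frame and then looks rows up by area. The ward names `A-ku` and `B-ku` and all numbers are hypothetical; here only the `area` column is forward-filled, which is a slightly narrower variant of filling the whole frame.

```python
import pandas as pd

# Toy frame (made-up values): the area name appears only on the first row
# of each group, as in the CSV converted from Excel.
df = pd.DataFrame({
    'area': ['A-ku', None, None, 'B-ku', None, None],
    'Target classification': ['Overall', 'Resident', 'Visitors'] * 2,
    'February 1': [100, 60, 40, 200, 150, 50],
})

# Fill the missing area names downward, then index by (area, classification).
df['area'] = df['area'].ffill()
df = df.set_index(['area', 'Target classification'])

# All rows for one area
print(df.loc['B-ku'])
# A single cell addressed by both index levels
print(df.loc[('A-ku', 'Visitors'), 'February 1'])
```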
From here, we will actually create the dashboard.
Streamlit allows you to create dashboards with just one Python script.
This time I write the script as streamlit_app.py.
The data is displayed as a line chart. The target area is chosen in a select box, and the chart shows the time series of the number of residents, visitors, and total people in that area.
streamlit_app.py
import numpy as np
import pandas as pd
import streamlit as st
import plotly.graph_objects as go

st.title('Daily transition of estimated population in the 23 wards of Tokyo')
st.write('[Source: Yahoo! Data Solution]')

data_all = pd.read_csv('data/tokyo_0409.csv')
erea_list = data_all['area'].unique()
data_all.set_index(['area', 'Target classification'], inplace=True)

# Transpose so that each date becomes a row
data_all = data_all.T

# Convert the date labels to datetime type.
# The date columns are assumed to carry the source file's Japanese
# month/day labels (e.g. 4月8日), so prepend the year and parse with a
# matching format string.
data_all.index = ['2020年' + x for x in data_all.index]
data_all.index = pd.to_datetime(data_all.index, format='%Y年%m月%d日')
data_all.index.name = 'time'

# Select the area to display with a select box in the sidebar
selected_erea = st.sidebar.selectbox(
    'Select the area to display:',
    erea_list
)

# Display the graph
st.write(f'## Displaying: {selected_erea}')
data_plotly = data_all[selected_erea]
# The keys below must match the values in the 'Target classification' column
data_plot = [
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Resident'],
               mode='lines',
               name='Resident'),
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Visitors'],
               mode='lines',
               name='Visitors'),
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Overall'],
               mode='lines',
               name='Overall')]
layout = go.Layout(
    xaxis={"title": "date"},
    yaxis={"title": "Number of people"}
)
st.plotly_chart(go.Figure(data=data_plot, layout=layout))
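The transpose and date-conversion steps can be checked in isolation. The following is a minimal sketch on made-up values; it assumes the date columns use Japanese month/day labels such as `4月7日`, which is an assumption about the source file rather than something shown in the tables, and the ward name `A-ku` is hypothetical.

```python
import pandas as pd

# Toy wide table with made-up values; the column labels follow the assumed
# Japanese month/day style of the source file.
wide = pd.DataFrame(
    {'4月7日': [100, 60], '4月8日': [90, 60]},
    index=pd.MultiIndex.from_tuples(
        [('A-ku', 'Overall'), ('A-ku', 'Resident')],
        names=['area', 'Target classification'],
    ),
)

# Transpose: dates become the row index, (area, classification) the columns.
by_date = wide.T

# The labels carry no year, so prepend it and parse into datetimes.
by_date.index = pd.to_datetime(
    ['2020年' + label for label in by_date.index],
    format='%Y年%m月%d日',
)
by_date.index.name = 'time'
print(by_date)
```

If the labels in your file look different, the `format` string has to be adjusted to match them exactly.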
The `st.write()` method of the `streamlit` module and similar calls define the strings, tables, and graphs to display on the screen.
For interactivity, `st.selectbox()` displays the list of target areas. It returns the value the user selected, which is stored in `selected_erea`, and the graph is drawn from the data for that area.
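The hand-off from the select box to the plot can be tried without a browser. In this sketch, which uses made-up numbers and hypothetical ward names, a plain string stands in for the return value of `st.sidebar.selectbox()`; selecting it on a frame with (area, classification) columns yields one column per classification for that area.

```python
import pandas as pd

# Toy transposed frame: rows are dates, columns are (area, classification).
columns = pd.MultiIndex.from_product(
    [['A-ku', 'B-ku'], ['Overall', 'Resident', 'Visitors']],
    names=['area', 'Target classification'],
)
data = pd.DataFrame(
    [[100, 60, 40, 200, 150, 50],
     [90, 60, 30, 180, 150, 30]],
    index=pd.to_datetime(['2020-04-07', '2020-04-08']),
    columns=columns,
)

# In the app this value would come from st.sidebar.selectbox(...).
selected = 'B-ku'

# Selecting on the first column level drops it and keeps the second.
area_data = data[selected]
print(area_data.columns.tolist())
print(area_data['Visitors'].tolist())
```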
Prefixing a call with `st.sidebar` places the element in the sidebar on the left side of the screen, which makes the layout look a little nicer.
For graphs, simple methods such as `st.line_chart()` are provided, but I could not get them to handle dates well, so this time I build an interactive graph with Plotly and draw it with `st.plotly_chart()`.
Other distinctive Streamlit features include:
-- `st.map()` for plotting data on a map
-- `st.progress()` for showing a progress bar during time-consuming processing
I wanted to try these as well, but I will omit them this time.
It is convenient to be able to build a screen with just a short Python script, without thinking about HTML at all.
The expected use is probably to quickly check data at hand and share the results with a team. But since I have come this far, I will deploy the app to an AWS EC2 instance and publish it as a trial.
Procedure:
1: Create a t2.micro instance on the AWS EC2 free tier
2: Run the app on the instance
streamlit run streamlit_app.py
3: Acquire a domain
This time I obtained a suitable domain name, `onedata.ml`, from the free domain service [freenom](https://www.freenom.com/ja/index.html). (You can get ".tk", ".ml", ".ga", ".cf", and ".gq" domains for free.)
4: Edit the Streamlit config file
In the configuration file (`config.toml`) in the `~/.streamlit` folder, enter the domain name you obtained as the access address.
[browser]
gatherUsageStats = false
serverAddress = "onedata.ml"
[server]
port = 8501
5: Forward port 80 to 8501
By default, Streamlit listens on port 8501. To allow access by domain name alone, without specifying a port, the app has to be reachable on the default HTTP port 80. Binding to port 80 directly would require running the app with root privileges. Instead, I use iptables to redirect traffic arriving on port 80 to port 8501.
sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8501
6: Open port 80
On the EC2 dashboard, edit Security Group > Inbound Rules and open port 80.
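Once the steps above are done, it can be handy to confirm from another machine that the port actually accepts connections. The following is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical, and the demo at the bottom checks a throwaway local listener rather than a real server.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Self-contained demo: open a temporary listener on a free local port
# and check that the helper can reach it.
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
local_port = server.getsockname()[1]
print(is_port_open('127.0.0.1', local_port))
server.close()
```

Against the deployed app you would instead call `is_port_open('<your-domain>', 80)`. A `False` result would suggest checking the security group rule and the iptables redirect.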
Here is the app I made ↓ http://onedata.ml
Implementation ↓ https://github.com/tkmz-n/streamlit_app
I created a simple demo app that visualizes open data with Streamlit, deployed it on AWS, and published it.
Looking at the data, we can see that the flow of people in the 23 wards of Tokyo has been gradually decreasing since the end of March. Let's keep staying at home so that the new coronavirus outbreak is brought under control as soon as possible.