Quickly create a Python data analysis dashboard with Streamlit and deploy it to AWS

Introduction

I want to create a dashboard that visualizes and analyzes data intuitively and publish it as a web application, but I don't want to deal with HTML, CSS, and JavaScript. In such a case, Streamlit lets you build a data analysis dashboard from a single Python script with no hassle. In this article, I will cover:

- App creation process (installation, data preparation, dashboard creation)
- Procedure for deploying to an AWS EC2 instance

Given the current situation, as a subject I will look at data published by Yahoo! Data Solution that shows the flow of people in the 23 wards of Tokyo, and check the impact of the new coronavirus countermeasures.

# Especially the latter half is a record of a web beginner feeling their way through, so there are probably various deficiencies in the content and description. I would appreciate it if you could point them out.

App creation

Streamlit installation

Streamlit is open-source software. GitHub: https://github.com/streamlit/streamlit

Installation is OK with the following.

pip install streamlit

By executing the following command, you can launch the demo app locally and check it in the browser (the browser will launch automatically).

streamlit hello

(Screenshot: the Streamlit demo app running in the browser)

For how to use it, the official tutorial page is easy to understand. A commentary article is also available.

[Addition] This is also helpful: [Python] I wrote a test of "Streamlit" that makes it easy to create visualization applications.

Data preparation

Given the current situation, you will want to see how the new coronavirus countermeasures are affecting the flow of people in Tokyo. When I searched for that kind of data, I found that Yahoo! Data Solution had released "Daily transition of estimated population in the 23 wards of Tokyo (overall, visitors, residents)" as open data (as of April 10, 2020). This time I will try to visualize it in a nice way.

[Source: Yahoo! Data Solution (https://ds.yahoo.co.jp/report/, 2020/04/09)] The data is updated daily. Here, the data up to April 9, 2020 is used.

The original data is in Excel format and contains the daily number of residents, visitors, and total number of people in the 23 wards of Tokyo. Since each month is saved in a separate sheet, convert it to CSV in advance, as sketched below.
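For example, a minimal conversion sketch, assuming a single Excel file named 'Tokyo 23 wards transition 0409.xlsx' whose sheet names are the month names (the actual file and sheet names may differ):

import pandas as pd

# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
sheets = pd.read_excel('Tokyo 23 wards transition 0409.xlsx', sheet_name=None)

# Write one CSV per monthly sheet, e.g. 'Tokyo 23 wards transition 0409_February.csv'
for name, df in sheets.items():
    df.to_csv(f'Tokyo 23 wards transition 0409_{name}.csv', index=False)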

Read each of these and combine them into one.

import numpy as np
import pandas as pd

data_02 = pd.read_csv('Tokyo 23 wards transition 0409_February.csv')
data_03 = pd.read_csv('Tokyo 23 wards transition 0409_March.csv')
data_04 = pd.read_csv('Tokyo 23 wards transition 0409_April.csv')

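# The area and target classification columns are duplicated in each monthly file, so keep them only from the February data when concatenating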
data_all = pd.concat([data_02, data_03.iloc[:, 2:], data_04.iloc[:, 2:]], axis=1)
data_all.head()
Area Target classification February 1st February 2 February 3 February 4 February 5 February 6 February 7 February 8 ... March 30 March 31 April 1st April 2 April 3 April 4 April 5 April 6 April 7 April 8
0 The entire 23 wards of Tokyo Overall 10485000 10164000 11676000 11687000 11659000 11690000 11691000 10471000 ... 11393000 11388000 11288000 11263000 11256000 10021000 9737000 11212000 11104000 10859000
1 NaN Resident 8921000 8921000 8921000 8921000 8921000 8921000 8921000 8921000 ... 8949000 8949000 8924000 8924000 8924000 8924000 8924000 8924000 8924000 8924000
2 NaN Visitors 1564000 1243000 2755000 2766000 2738000 2769000 2770000 1550000 ... 2444000 2439000 2364000 2339000 2332000 1097000 813000 2288000 2180000 1935000
3 Chiyoda-ku Overall 454900 356900 1028900 1039900 1031900 1043900 1041900 453900 ... 857000 855000 819500 802500 791500 266500 195500 775500 731500 624500
4 NaN Resident 54900 54900 54900 54900 54900 54900 54900 54900 ... 56000 56000 55500 55500 55500 55500 55500 55500 55500 55500

The area names are filled in, the area and target classification columns are set as a MultiIndex, and the result is written out to CSV.

data_all.fillna(method='ffill', inplace=True)
data_all.set_index(['area', 'Target classification'], inplace=True)

data_all.to_csv('tokyo_0409.csv', index=True, header=True)

The final data looks like this.

data_all.head(7)
February 1st February 2 February 3 February 4 February 5 February 6 February 7 February 8 February 9 February 10 ... March 30 March 31 April 1st April 2 April 3 April 4 April 5 April 6 April 7 April 8
Area Target classification
The entire 23 wards of Tokyo Overall 10485000 10164000 11676000 11687000 11659000 11690000 11691000 10471000 10149000 11523000 ... 11393000 11388000 11288000 11263000 11256000 10021000 9737000 11212000 11104000 10859000
Resident 8921000 8921000 8921000 8921000 8921000 8921000 8921000 8921000 8921000 8921000 ... 8949000 8949000 8924000 8924000 8924000 8924000 8924000 8924000 8924000 8924000
Visitors 1564000 1243000 2755000 2766000 2738000 2769000 2770000 1550000 1228000 2602000 ... 2444000 2439000 2364000 2339000 2332000 1097000 813000 2288000 2180000 1935000
Chiyoda-ku Overall 454900 356900 1028900 1039900 1031900 1043900 1041900 453900 356900 958900 ... 857000 855000 819500 802500 791500 266500 195500 775500 731500 624500
Resident 54900 54900 54900 54900 54900 54900 54900 54900 54900 54900 ... 56000 56000 55500 55500 55500 55500 55500 55500 55500 55500
Visitors 400000 302000 974000 985000 977000 989000 987000 399000 302000 904000 ... 801000 799000 764000 747000 736000 211000 140000 720000 676000 569000
Chuo-ku Overall 441000 367000 849000 857000 852000 861000 863000 440000 370000 793000 ... 733000 728000 701000 691000 684000 307000 256000 675000 641000 563000

Dashboard creation

From here, we will actually create the dashboard. Streamlit allows you to create dashboards with just one Python script. This time I will write the script as streamlit_app.py.

The data is displayed as a line chart. The target area is chosen in a select box, so that you can follow the time-series transition of the number of residents, visitors, and the overall population for that area.

streamlit_app.py


import numpy as np
import pandas as pd
import streamlit as st
import plotly.graph_objects as go

st.title('Daily transition of estimated population in the 23 wards of Tokyo')
st.write('[Source: Yahoo! Data Solution]')

data_all = pd.read_csv('data/tokyo_0409.csv')
erea_list = data_all['area'].unique()

data_all.set_index(['area', 'Target classification'], inplace=True)

# Transpose so that each date becomes a row
data_all = data_all.T
# Convert the date labels to datetime (the format string must match the column labels in the CSV)
data_all.index = map(lambda x: '2020'+x, data_all.index)
data_all.index = pd.to_datetime(data_all.index, format='%Y year%m month%d day')
data_all.index.name = 'time'

# Select the display area with the select box
selected_erea = st.sidebar.selectbox(
    'Select the area to display:',
    erea_list
)

# Graph display
st.write(f'## Displaying: {selected_erea}')
data_plotly = data_all[(selected_erea)]
data_plot = [
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Resident'],
               mode='lines',
               name='Resident'),
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Visitors'],
               mode='lines',
               name='Visitors'),
    go.Scatter(x=data_plotly.index,
               y=data_plotly['Overall'],
               mode='lines',
               name='Overall')]
layout = go.Layout(
    xaxis={"title": "date"},
    yaxis={"title": "Number of people"}
)
st.plotly_chart(go.Figure(data=data_plot, layout=layout))


(Screenshot: the completed dashboard)

The strings, tables, and graphs to be displayed on the screen are defined with methods such as st.write() from the streamlit module.

For interactivity, st.selectbox() is used to present the list of target areas. The value chosen by the user is returned into selected_erea, and the data for the corresponding area is drawn as a graph. Widgets can be placed in the sidebar on the left side of the screen with st.sidebar, which looks a little nicer.

For graph display, simple methods such as st.line_chart() are provided, but I couldn't get it to handle dates well, so this time I create an interactive graph with Plotly and draw it with st.plotly_chart(). For reference, the built-in chart would look something like the sketch below.
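A minimal sketch of the built-in line chart, assuming it is placed in streamlit_app.py after data_all has been transposed and given a datetime index as above:

# Draw the selected area's resident / visitor / overall series with the built-in chart
st.line_chart(data_all[selected_erea])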

Other characteristic features of Streamlit are that you can plot data on a map with st.map() and display a progress bar for time-consuming processing with st.progress(). I wanted to use these as well, but I will omit them this time. For reference, minimal sketches would look something like the following.
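A minimal sketch of both features, using dummy coordinates scattered around Tokyo (these are made-up points, not part of the Yahoo! data):

import time
import numpy as np
import pandas as pd
import streamlit as st

# st.map() expects a DataFrame with 'lat' and 'lon' columns
points = pd.DataFrame(
    np.random.randn(100, 2) / 100 + [35.68, 139.76],
    columns=['lat', 'lon'])
st.map(points)

# st.progress() shows a progress bar that can be updated as a long task runs
bar = st.progress(0)
for i in range(100):
    time.sleep(0.01)
    bar.progress(i + 1)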

It's convenient to be able to build a screen with just a short Python script, without being aware of HTML at all.

Deploy to AWS

The assumed use is something like quickly checking data at hand or sharing results with the team, but since I've come this far, I will deploy it to an AWS EC2 instance and publish it as a test.

Procedure:

1: Create a t2.micro instance in the AWS EC2 free tier

2: Run the app on the instance

streamlit run streamlit_app.py

3: Domain acquisition This time, I got a suitable domain name, onedata.ml, from the free domain service freenom (https://www.freenom.com/ja/index.html). (".tk", ".ml", ".ga", ".cf", and ".gq" domains are available for free.)

4: Edit the Streamlit config file In the configuration file (config.toml) in the ~/.streamlit folder, enter the domain name you obtained as the access address.

[browser]
gatherUsageStats = false
serverAddress = "onedata.ml"

[server]
port = 8501

5: Port forwarding from 80 to 8501 By default, Streamlit accepts connections on port 8501. I want the app to be reachable on the default HTTP port 80, so that it can be accessed by domain name without specifying a port. Binding to port 80 directly would require root privileges when running the app, so instead I forward access to port 80 on to port 8501 with iptables.

sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8501

6: Opening port 80 Edit the security group's inbound rules on the EC2 dashboard and open port 80.

Here is the app I made ↓ http://onedata.ml

Implementation ↓ https://github.com/tkmz-n/streamlit_app

Summary

I created a simple demo app that visualizes open data using Streamlit, deployed it on AWS, and published it.

Looking at the data, we can see that the flow of people in the 23 wards of Tokyo has been gradually decreasing since the end of March. Let's continue to stay at home so that the new coronavirus outbreak can be brought under control as soon as possible.
