This is the article for day 21 of MYJLab Advent Calendar 2019, written by Thatchy. (Sorry for being late.)
Recently, in my hometown, FamilyMart stores have been popping up directly across the road from one another. To visualize this phenomenon, I would like to turn the growth in the number of FamilyMart stores into a time-series heatmap by prefecture.
There is a JavaScript library called Leaflet that draws nice interactive maps. This time, I will create a time-series heatmap using folium, a library that lets you use Leaflet from Python.
Since folium's functions and usage change between versions, make sure to pin the version to 0.10.1 for this article.
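For example, the pinned version can be installed like this (assuming pip is the installer in use; pandas and requests are also needed later):

```shell
pip install folium==0.10.1 pandas requests
```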
# Load libraries
import pandas as pd
import folium
from folium import plugins
I prepared the data by converting the number of FamilyMart stores in each prefecture from 1999 to 2019, published on the site here, into a CSV file. It can be downloaded with Python's requests library. In addition, the latitude and longitude of each prefecture are needed to draw on the map; this time, the location of each prefectural office is used as the prefecture's latitude and longitude. That data can be downloaded from here.
# Store-count data
import requests
import io
URL = "https://drive.google.com/uc?id=1-8tppvHwwVJWufYVskTfGz7cCrBIE0SM"
r = requests.get(URL)
famima_data = pd.read_csv(io.BytesIO(r.content))
famima_data.head()
Because the chain has mostly expanded in recent years, there are quite a few missing values between 1999 and 2006.
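As a quick way to see where those gaps are, the missing values can be counted per year column. A minimal sketch on a toy two-row frame (the prefecture names and year columns here are stand-ins for the real CSV):

```python
import io
import pandas as pd

# A tiny stand-in for famima_data: yearly store counts with early-year gaps
csv = io.StringIO(
    "id,pref,1999,2000,2005,2010\n"
    "0,Hokkaido,,,,120\n"
    "1,Aomori,,,30,45\n"
)
df = pd.read_csv(csv)

# Count missing values per column to see which years have gaps
missing_per_year = df.isnull().sum()
print(missing_per_year)
```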
# Latitude and longitude of each prefectural office
geo_data = pd.read_csv("./data/prefecturalCapital.csv")
geo_data.head()
Next, combine the two data frames. I want to join them on id, but geo_data's id is 1-based, so I shift it down by 1 to make it 0-based before merging. Missing values are filled with 0 for now.
import numpy as np
geo_data.id = geo_data.id - 1
merged_data = pd.merge(famima_data, geo_data[["id", "lat", "lon"]], on=["id"])
merged_data = merged_data.replace(np.nan, 0)
merged_data.head()
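The id shift can be checked on toy frames. This sketch assumes, as in the article, that geo_data's id starts at 1 while famima_data's starts at 0 (the values below are made up):

```python
import pandas as pd

# famima-style frame: 0-based id
famima = pd.DataFrame({"id": [0, 1], "pref": ["Hokkaido", "Aomori"]})
# geo-style frame: 1-based id, as in the downloaded CSV
geo = pd.DataFrame({"id": [1, 2], "lat": [43.06, 40.82], "lon": [141.35, 140.74]})

geo.id = geo.id - 1  # shift to 0-based so the join keys line up
merged = pd.merge(famima, geo[["id", "lat", "lon"]], on="id")
print(merged)
```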
The basic data is ready.
This time, I want to visualize the change in the number of stores, so I take the year-over-year difference across the columns.
# Get the array of time-series column names
time_columns = merged_data.columns[2:23].values
# Diff only the store-count columns and store the result as diff_data
merged_data.loc[:, time_columns] = merged_data.loc[:, time_columns].astype(float)
diff_data = merged_data.copy()
diff_data.loc[:, time_columns] = merged_data.loc[:, time_columns].diff(axis=1)
# The 1999 column becomes NaN after the diff, so drop it
diff_data = diff_data.dropna(axis=1)
time_columns = time_columns[1:]
diff_data.head()
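To see what `diff(axis=1)` and the `dropna` do, here is a minimal sketch on a one-row toy frame:

```python
import pandas as pd

df = pd.DataFrame({"pref": ["A"], "1999": [10.0], "2000": [12.0], "2001": [15.0]})
years = ["1999", "2000", "2001"]

diffed = df.copy()
diffed.loc[:, years] = df.loc[:, years].diff(axis=1)  # year-over-year change
diffed = diffed.dropna(axis=1)  # 1999 has no previous year, so its column is all NaN and gets dropped
print(diffed)
```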
Once the differences are computed, apply min-max scaling. A value of exactly 0 is inconvenient for folium's heatmap, so add 1e-4 to the whole frame.
# Scale diff_data and store the result as scaled_data
scaled_data = diff_data.copy()
data_min = diff_data.loc[:, time_columns].min().min()
data_max = diff_data.loc[:, time_columns].max().max()
scaled_data.loc[:, time_columns] = (diff_data.loc[:, time_columns] - data_min) / (data_max - data_min)
scaled_data.loc[:, time_columns] = scaled_data.loc[:, time_columns] + 1e-4
scaled_data.head()
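The scaling step can be checked on a toy frame: the global minimum maps to 1e-4 and the global maximum to 1 + 1e-4, so every value stays strictly above 0 (the numbers here are made up):

```python
import pandas as pd

diffs = pd.DataFrame({"2000": [-3.0, 0.0], "2001": [5.0, 2.0]})

lo = diffs.min().min()  # global minimum across all year columns
hi = diffs.max().max()  # global maximum
scaled = (diffs - lo) / (hi - lo) + 1e-4  # shift into (0, 1] so folium never sees an exact 0
print(scaled)
```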
Finally, to draw the time-series heatmap, build three-dimensional data of the form [[[latitude, longitude, value] × 47 prefectures] × each year from 2000 to 2019] (1999 dropped out when taking the difference).
heat_map_data = [[[row['lat'],row['lon'], row[idx]] for index, row in scaled_data.iterrows()] for idx in time_columns]
# The shape of the data is hard to picture, so output only the first time step
heat_map_data[0]
# Output
[[43.064359, 141.347449, 0.051760516605166056],
[40.824294, 140.74005400000001, 0.051760516605166056],
[39.70353, 141.15266699999998, 0.05545055350553506],
[38.268737, 140.872183, 0.060985608856088565],
[39.718175, 140.10335600000002, 0.051760516605166056],
[38.240127, 140.362533, 0.07390073800738008],
[37.750146, 140.466754, 0.0923509225092251],
[36.341817, 140.446796, 0.04807047970479705],
[36.56575, 139.883526, 0.05545055350553506],
[36.391205, 139.060917, 0.060985608856088565],
[35.857771, 139.647804, 0.060985608856088565],
[35.604563, 140.123179, 0.04807047970479705],
[35.689184999999995, 139.691648, 0.0997309963099631],
[35.447505, 139.642347, 0.06467564575645757],
[37.901699, 139.022728, 0.051760516605166056],
[36.695274, 137.211302, 0.06467564575645757],
[36.594729, 136.62555, 0.06467564575645757],
[36.065220000000004, 136.221641, 0.06283062730627306],
[35.665102000000005, 138.568985, 0.05545055350553506],
[36.651282, 138.180972, 0.051760516605166056],
[35.39116, 136.722204, 0.05729557195571956],
[34.976987, 138.383057, 0.05729557195571956],
[35.180246999999994, 136.906698, 0.07574575645756458],
[34.730546999999994, 136.50861, 0.06836568265682658],
[35.004532, 135.868588, 0.05360553505535055],
[35.020996200000006, 135.7531135, 0.05360553505535055],
[34.686492, 135.518992, 0.0978859778597786],
[34.69128, 135.183087, 0.08128081180811808],
[34.685296, 135.832745, 0.04622546125461255],
[34.224806, 135.16795, 0.08866088560885609],
[35.503463, 134.238258, 0.051760516605166056],
[35.472248, 133.05083, 0.051760516605166056],
[34.66132, 133.934414, 0.060985608856088565],
[34.396033, 132.459595, 0.06836568265682658],
[34.185648, 131.470755, 0.051760516605166056],
[34.065732000000004, 134.559293, 0.051760516605166056],
[34.340140000000005, 134.04297, 0.051760516605166056],
[33.841649, 132.76585, 0.051760516605166056],
[33.55969, 133.530887, 0.051760516605166056],
[33.606767, 130.418228, 0.060985608856088565],
[33.249367, 130.298822, 0.05360553505535055],
[32.744541999999996, 129.873037, 0.10526605166051661],
[32.790385, 130.742345, 0.06652066420664207],
[33.2382, 131.612674, 0.05914059040590406],
[31.91109, 131.423855, 0.05729557195571956],
[31.560219, 130.557906, 0.0868158671586716],
[26.211538, 127.68111499999999, 0.07759077490774909]]
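The same nesting can be reproduced on a toy frame to make the structure easier to see: the outer list is the time steps, the middle list the points, and each inner list a [lat, lon, weight] triple (the column names and values here are stand-ins):

```python
import pandas as pd

# Toy scaled_data: two prefectures, two year columns
scaled = pd.DataFrame({
    "lat": [43.06, 40.82],
    "lon": [141.35, 140.74],
    "2000": [0.05, 0.06],
    "2001": [0.07, 0.05],
})
years = ["2000", "2001"]

# Same nesting as in the article: [[[lat, lon, weight] per point] per time step]
heat_map_data = [
    [[row["lat"], row["lon"], row[y]] for _, row in scaled.iterrows()]
    for y in years
]
print(heat_map_data[0])  # → [[43.06, 141.35, 0.05], [40.82, 140.74, 0.06]]
```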
japan_map = folium.Map(location=[35, 135], zoom_start=6)
hm = plugins.HeatMapWithTime(
    heat_map_data,
    index=list(time_columns),
    auto_play=False,
    radius=30,
    max_opacity=1,
    gradient={0.1: 'blue', 0.25: 'lime', 0.5: 'yellow', 0.75: 'orange', 0.9: 'red'},
)
hm.add_to(japan_map)
japan_map
Scaled values below about 0.052 correspond to zero or negative changes, so only the increases really show up on the map.
I'm happy to see the map animate...!!!
I found it very convenient to be able to visualize time-series transitions in data without writing much code. I would like to use it for something more meaningful next time. Thank you for reading to the end. I would appreciate any comments or corrections.