A quick example of how to get a feel for trends in flow data.
The data comes from the Logistics Census published by the Ministry of Land, Infrastructure, Transport and Tourism: [Inter-prefectural flow volume (by product type) -Weight-](http://www.mlit.go.jp/sogoseisaku/transport/sosei_transport_fr_000074.html). Each sheet holds the weight of goods moved between prefectures in 2010. There are 10 sheets in total: an all-category total plus agriculture and fisheries, forestry, mining, metals and machinery, chemicals, light industry, miscellaneous industry, waste, and special goods.
I have prepared a Docker image (tsutomu7/graphviz) so that you can try graphviz right away. Run the following.
bash
firefox http://localhost:8888 &
docker run -it --rm -p 8888:8888 tsutomu7/graphviz
If you want to set it up yourself, install Anaconda and then run "conda install graphviz" and "pip install graphviz". conda installs graphviz itself, and pip installs the Python wrapper.
Read all sheets into the variable a at once with read_excel. a is a dict whose keys are 0-9 and whose values are DataFrames. a[0] is the "total" DataFrame across all categories.
python3
import pandas as pd
# One short label per sheet; split() must yield exactly 10 entries
cat = 'Total Agriculture Forestry Mining MetalMachinery Chemical LightIndustry MiscIndustry Waste Special'.split()
rng = list(range(len(cat)))
# Passing a list of sheet indices returns a dict {sheet index: DataFrame}
a = pd.read_excel('http://www.mlit.go.jp/sogoseisaku/transport/butsuryucensus/T9-010301.xls',
                  sheet_name=rng, skipfooter=1, skiprows=8, header=None, index_col=0,
                  usecols=list(range(1, 49)))
a[0].iloc[:3, :7]
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Hokkaido | 944271.4669 | 6728.8486 | 1075.6893 | 7623.1429 | 2350.6049 | 164.8547 | 4221.1922 |
| Aomori | 22969.4545 | 257605.0057 | 12702.7039 | 2857.1319 | 8079.5519 | 750.9524 | 1799.4754 |
| Iwate | 211.2175 | 5090.7300 | 194668.7805 | 10623.9818 | 1518.8552 | 676.9535 | 1244.9179 |
Each sheet is a 47 x 47 matrix: rows are the origin (From) prefecture and columns are the destination (To) prefecture.
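To make the row/column convention concrete, here is a minimal sketch that uses only the upper-left 3 x 3 corner of the output shown above. It assumes that columns 1 to 3 are Hokkaido, Aomori, and Iwate in the same order as the rows (the large diagonal values suggest this, but the column labels here are my own addition).

```python
import pandas as pd

names = ['Hokkaido', 'Aomori', 'Iwate']
# Upper-left 3x3 corner of the "total" sheet, copied from the output above.
# Column names are an assumption: same prefecture order as the rows.
m = pd.DataFrame([[944271.4669, 6728.8486, 1075.6893],
                  [22969.4545, 257605.0057, 12702.7039],
                  [211.2175, 5090.7300, 194668.7805]],
                 index=names, columns=names)

# Row label = origin (From), column label = destination (To).
aomori_to_hokkaido = m.loc['Aomori', 'Hokkaido']
hokkaido_to_aomori = m.loc['Hokkaido', 'Aomori']

# Summing across a row gives outbound volume; summing down a column gives inbound volume.
outbound = m.sum(axis=1)
inbound = m.sum(axis=0)
print(aomori_to_hokkaido, hokkaido_to_aomori)
```

The diagonal holds flows that stay inside one prefecture, which is why the ranking step skips the cells where the row and column are the same.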
For each category, list the flows between different prefectures and sort them in descending order of volume.
python3
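# Strip full-width spaces (U+3000) from the prefecture names
prefs = a[0].index.map(lambda x: x.replace('\u3000', ''))
# One (From, To, Val) table per sheet, excluding flows within the same prefecture
b = [pd.DataFrame([(prefs[i], prefs[j], a[h].iloc[i, j]) for i in range(47) for j in range(47)
                   if i != j], columns=['From', 'To', 'Val']).sort_values('Val', ascending=False) for h in rng]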
b[0][:3]
| | From | To | Val |
|---|---|---|---|
| 1080 | Mie | Aichi | 170322.9506 |
| 1268 | Hyogo | Osaka | 165543.7879 |
| 1499 | Okayama | Hyogo | 142949.9022 |
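The same wide-to-long conversion can also be written without the double loop. The following is only a sketch on a made-up 3 x 3 matrix (the names A, B, C and all values are invented for illustration), using `melt` as an alternative to the list comprehension above. With the real data, the columns would first need to be relabeled with the prefecture names.

```python
import pandas as pd

# Synthetic flow matrix (invented values): rows are origins, columns are destinations.
names = ['A', 'B', 'C']
m = pd.DataFrame([[10.0, 5.0, 1.0],
                  [7.0, 20.0, 3.0],
                  [2.0, 9.0, 30.0]], index=names, columns=names)

# Turn the matrix into one row per (From, To) pair.
long = (m.rename_axis('From')
         .reset_index()
         .melt(id_vars='From', var_name='To', value_name='Val'))

# Drop flows within the same prefecture, then rank by volume.
long = long[long['From'] != long['To']]
top = long.sort_values('Val', ascending=False).head(3)
print(top)
```

Note that the row index of `top` differs from the loop version, because `melt` walks the matrix column by column rather than row by row.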
Each figure is saved as "fig_<category>.png". Edge labels show the flow in units of 1,000 tons per year.
python3
from graphviz import Digraph
from IPython.display import display
for h, c in zip(rng, cat):
    g = Digraph(format='png')
    g.attr('graph', label=c, labelloc='t')  # category name as the title at the top
    g.node_attr['fontsize'] = '10'
    # Draw the five largest flows of this category
    for _, r in b[h][:5].iterrows():
        g.edge(r.From, r.To, label='%d'%(r.Val//1000))
    g.render('fig_%s'%c)
    display(g)
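If graphviz is not installed, it can still help to see roughly what kind of DOT text such a graph corresponds to. The sketch below builds a DOT string by hand from the top three "total" flows listed earlier. `to_dot` is a hypothetical helper written for this illustration, not part of the graphviz package, and its output is not guaranteed to match what `Digraph` generates character for character.

```python
def to_dot(title, edges):
    """Build DOT source for a directed graph from (src, dst, value) triples.

    Hypothetical helper for illustration; not part of the graphviz package.
    """
    lines = ['digraph {',
             '  graph [label="%s" labelloc=t]' % title,
             '  node [fontsize=10]']
    for src, dst, val in edges:
        # Same labeling rule as the loop above: floor-divide the value by 1000.
        lines.append('  "%s" -> "%s" [label="%d"]' % (src, dst, val // 1000))
    lines.append('}')
    return '\n'.join(lines)

# Top three flows of the "total" sheet, taken from the table shown earlier.
edges = [('Mie', 'Aichi', 170322.9506),
         ('Hyogo', 'Osaka', 165543.7879),
         ('Okayama', 'Hyogo', 142949.9022)]
dot = to_dot('Total', edges)
print(dot)
```

The resulting text can be saved to a file and rendered later with the `dot` command on a machine that has graphviz.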
That's all.