What is this

A short sample that visualizes the largest inter-prefectural freight flows, for a quick look at trends in logistics flow data.

Raw data

The data comes from the Logistics Census of the Ministry of Land, Infrastructure, Transport and Tourism: [Inter-prefectural flow volume (by product type) -Weight-](http://www.mlit.go.jp/sogoseisaku/transport/sosei_transport_fr_000074.html). Each sheet holds the weight of goods shipped between prefectures in 2010. There are 10 sheets in total: total, agriculture and fisheries, forestry, mining, metal and machinery, chemicals, light industry, miscellaneous manufacturing, discharged goods (waste), and special.

Execution environment

I have prepared a Docker image (tsutomu7/graphviz) so that Graphviz is ready to use right away. Run the following, then work in the notebook that opens in the browser.

`bash`


firefox http://localhost:8888 &
docker run -it --rm -p 8888:8888 tsutomu7/graphviz

If you want to set up the environment yourself, install Anaconda and then run "conda install graphviz" and "pip install graphviz". The conda package installs Graphviz itself, and the pip package installs the Python wrapper.
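For a self-built environment, it can help to confirm that both pieces are visible before running the notebook. The following is a minimal sketch using only the standard library; `check_graphviz` is a hypothetical helper name, and it assumes that the Graphviz executable is exposed on the PATH as `dot`.

```python
import importlib.util
import shutil


def check_graphviz():
    """Report which of the two Graphviz pieces can be found."""
    return {
        # The Python wrapper installed with "pip install graphviz".
        'python_wrapper': importlib.util.find_spec('graphviz') is not None,
        # The "dot" executable that ships with Graphviz itself (assumed name).
        'dot_binary': shutil.which('dot') is not None,
    }


print(check_graphviz())
```

If either value is False, the rendering step later in this article will most likely fail, so it is worth fixing the installation first.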

Try it with Python

Data reading

Read all sheets into the variable a at once with read_excel. a is a dict whose keys are the sheet numbers 0-9 and whose values are DataFrames. a[0] is the "total" DataFrame covering all industries.

`python3`


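import pandas as pd
# Sheet names in order: total, agriculture/fisheries, forestry, mining, metal/machinery,
# chemicals, light industry, miscellaneous manufacturing, discharged goods, special
cat = '合計 農水 林業 鉱産 金属機械 化学 軽工 雑工 排出 特殊'.split()
rng = list(range(len(cat)))
# Skip the 8 header rows and the footer row; keep the name column plus 47 data columns
a = pd.read_excel('http://www.mlit.go.jp/sogoseisaku/transport/butsuryucensus/T9-010301.xls',
    sheet_name=rng, skipfooter=1, skiprows=8, header=None, index_col=0, usecols=list(range(1, 49)))
a[0].iloc[:3, :7]  # first 3 origins x first 7 destinations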

	1	2	3	4	5	6	7
1
Hokkaido	944271.4669	6728.8486	1075.6893	7623.1429	2350.6049	164.8547	4221.1922
Aomori	22969.4545	257605.0057	12702.7039	2857.1319	8079.5519	750.9524	1799.4754
Iwate	211.2175	5090.7300	194668.7805	10623.9818	1518.8552	676.9535	1244.9179

Each DataFrame is a 47 x 47 matrix: the rows are the origin (From) prefectures and the columns are the destination (To) prefectures.
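To make the row and column convention concrete, here is a small sketch on a made-up 3 x 3 matrix. The names `od`, `outflow`, `inflow`, and `internal` and all the numbers are illustrative assumptions, not census data.

```python
import pandas as pd

# Toy origin-destination matrix: rows are From, columns are To (made-up values).
od = pd.DataFrame(
    [[10.0, 3.0, 1.0],
     [2.0, 20.0, 5.0],
     [4.0, 0.5, 30.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

outflow = od.sum(axis=1)  # row sums: total shipped from each origin
inflow = od.sum(axis=0)   # column sums: total received by each destination
# Diagonal entries: flows that stay inside the same region.
internal = pd.Series([od.iloc[i, i] for i in range(len(od))], index=od.index)

print(od.loc['B', 'C'])    # flow from B to C
print(outflow - internal)  # shipped to other regions only
```

The diagonal holds flows within a single prefecture, which is why the next step drops the entries where the origin and the destination are the same.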

Normalize

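For each industry, convert the matrix into a long table of (From, To, Val) rows, excluding flows within the same prefecture, and sort it in descending order of flow volume.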

`python3`


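# Strip full-width spaces (U+3000) from the prefecture names
prefs = a[0].index.map(lambda x: x.replace('\u3000', ''))
# One long (From, To, Val) table per sheet, without same-prefecture flows, largest first
b = [pd.DataFrame([(prefs[i], prefs[j], a[h].iloc[i, j]) for i in range(47) for j in range(47)
    if i != j], columns=['From', 'To', 'Val']).sort_values('Val', ascending=False) for h in rng]
b[0][:3]  # the three largest flows in the "total" sheet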

	From	To	Val
1080	Mie	Aichi	170322.9506
1268	Hyogo	Osaka	165543.7879
1499	Okayama	Hyogo	142949.9022
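The same wide-to-long conversion can also be sketched with `DataFrame.melt` instead of a nested list comprehension. This is an alternative shown on a made-up 3 x 3 matrix, not the code used in this article; `to_long` is a hypothetical helper. Unlike the code above, it renumbers the rows after sorting.

```python
import pandas as pd

# Toy origin-destination matrix (made-up values): rows are From, columns are To.
toy = pd.DataFrame(
    [[10.0, 3.0, 1.0],
     [2.0, 20.0, 5.0],
     [4.0, 0.5, 30.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'])


def to_long(matrix):
    """Return (From, To, Val) rows for off-diagonal cells, largest Val first."""
    long = (matrix.rename_axis('From').reset_index()
            .melt(id_vars='From', var_name='To', value_name='Val'))
    long = long[long['From'] != long['To']]  # drop flows within one region
    return long.sort_values('Val', ascending=False).reset_index(drop=True)


top = to_long(toy)
print(top.head(3))
```

For a 47 x 47 matrix this yields 47 * 46 = 2162 rows per sheet, the same count that the list comprehension produces.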

Draw the top 5 flows for each industry

Each figure is saved as "fig_<industry>.png". Edge labels show the flow volume in units of 1,000 tons per year.

`python3`


from graphviz import Digraph
from IPython.display import display
for h, c in zip(rng, cat):
    g = Digraph(format='png')
    g.attr('graph', label=c, labelloc='t')  # industry name as the title at the top
    g.node_attr['fontsize'] = '10'
    # One edge per flow for the 5 largest flows, labeled in thousands of tons
    for _, r in b[h][:5].iterrows():
        g.edge(r.From, r.To, label='%d'%(r.Val//1000))
    g.render('fig_%s'%c)  # writes fig_<industry>.png
    display(g)

Output figures: fig_合計.png, fig_農水.png, fig_林業.png, fig_鉱産.png, fig_金属機械.png, fig_化学.png, fig_軽工.png, fig_雑工.png, fig_排出.png, fig_特殊.png
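To show how the edge labels are derived, the sketch below builds DOT text by hand for the three largest "total" flows listed earlier. `to_dot` is a hypothetical helper written for illustration; it is not what the graphviz wrapper emits internally, only an approximation of the kind of graph description it produces.

```python
def to_dot(title, edges, top=5):
    """Build DOT text for the `top` largest (origin, destination, tons) edges."""
    ranked = sorted(edges, key=lambda e: e[2], reverse=True)[:top]
    lines = ['digraph {',
             '  graph [label="%s", labelloc=t];' % title,
             '  node [fontsize=10];']
    for src, dst, tons in ranked:
        # Floor division by 1000 turns tons into whole thousands of tons.
        lines.append('  "%s" -> "%s" [label="%d"];' % (src, dst, tons // 1000))
    lines.append('}')
    return '\n'.join(lines)


# Values taken from the sorted "total" table shown earlier in this article.
edges = [('Mie', 'Aichi', 170322.9506),
         ('Hyogo', 'Osaka', 165543.7879),
         ('Okayama', 'Hyogo', 142949.9022)]
dot_text = to_dot('Total', edges)
print(dot_text)
```

Because the label uses floor division, 170322.9506 tons appears as 170 rather than being rounded.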