We will deal with these two here. It runs on Google Colaboratory.
!pip install japanmap
from japanmap import pref_names,pref_code,picture
print(pref_names)
['_','Hokkaido','Aomori Prefecture','Iwate Prefecture','Miyagi Prefecture','Akita Prefecture','Yamaguchi Prefecture','Fukui Prefecture','Ibaraki Prefecture','Tochigi Prefecture',' Gunma prefecture',' Saitama prefecture',' Chiba prefecture',' Tokyo Metropolitan area',' Kanagawa prefecture',' Niigata prefecture',' Tottori prefecture',' Ishikawa prefecture',' Fukui prefecture',' Yamanashi prefecture',' Nagano prefecture','Gifu prefecture','Shizuoka prefecture','Aichi prefecture','Mie prefecture','Shiga prefecture','Kyoto prefecture','Osaka prefecture','Hyogo prefecture','Nara prefecture',' Wakayama prefecture','Tottori prefecture','Shimane prefecture','Okayama prefecture','Hiroshima prefecture','Yamaguchi prefecture','Tokushima prefecture','Kagawa prefecture','Ehime prefecture','Kochi prefecture',' Fukuoka prefecture',' Saga prefecture',' Nagasaki prefecture',' Kumamoto prefecture',' Oita prefecture',' Miyazaki prefecture',' Kagoshima prefecture',' Okinawa prefecture']
pref_code('Tokyo')
13
%matplotlib inline
import matplotlib.pyplot as plt
from pylab import rcParams
rcParams['figure.figsize'] = 6, 6
plt.imshow(picture())
plt.imshow(picture({'Tottori':'red','Saga':(0,255,0)}))
As the subject, I used the Excel file of the statistical data by prefecture published on the Teikoku Shoin website.
urllib is a library for download, but if you use it on Google Colab, you will upload it on Google Colab. In Google Colab, the uploaded file disappears at the end of the session, so upload it again at that time.
#Download by urllib
import urllib.request
url = "URL where the excel file is located"
urllib.request.urlretrieve(url, 'Teikoku-Shoin-Japan.xlsx')
As far as I know, there are two ways to open an Excel file in Python.
import pandas as pd
excel = pd.read_excel('Teikoku-Shoin-Japan.xlsx') #This method may not work
import pandas as pd
excel = pd.ExcelFile('Teikoku-Shoin-Japan.xlsx') #I feel that this one works better
Excel, unlike csv etc., consists of multiple sheets.
for s in enumerate(excel.sheet_names):
print(s) #List sheet names and numbers
(0,'Area \ u3000 [2017]') (1,'Population \ u3000 [2018]') (2,'Population density \ u3000 [2018]') (3,'Population growth rate \ u3000 [2010-2015]') (4,'Old age (65 years old and over) population rate \ u3000 [2018]') (5,'Population ratio by industry (2015)') (6,'Average life expectancy (female) \ u3000 [2015]') (7,'Average life expectancy (male) \ u3000 [2015]') (8,'Cultivated land area [2017]') (9,'Cultivated land ratio \ u3000 [2017]') (10,'Forest rate / planted forest rate [2017]') (11,'Wheat production \ u3000 [2018]') (12,'Rice production \ u3000 [2017]') (13,'Sweet potato production \ u3000 [2017]') (14,'Potato production \ u3000 [2017]') (15,'Soybean production \ u3000 [2017]') (16,'Peanut production \ u3000 [2017]') (17,'Cabbage production \ u3000 [2017]') (18,'Cucumber production \ u3000 [2017]') (19,'Watermelon production \ u3000 [2017]') (20,'Daikon production \ u3000 [2017]') (21,'Production of rapeseed [2017]') (22,'Carrot production \ u3000 [2017]') (23,'Leek production \ u3000 [2017]') (24,'Chinese cabbage production \ u3000 [2017]') (25,'Pepper production \ u3000 [2017]') (26,'Spinach production \ u3000 [2017]') (27,'Production of tea (rough tea) \ u3000 [2017]') (28,'Strawberry production \ u3000 [2017]') (29,'Plum production \ u3000 [2017]') (30,'Production of melon \ u3000 [2017]') (31,'Persimmon production \ u3000 [2017]') (32,'Cherry production \ u3000 [2017]') (33,'Mandarin orange production \ u3000 [2017]') (34,'Momo production \ u3000 [2017]') (35,'Apple production \ u3000 [2017]') (36,'Chrysanthemum shipments [2017]') (37,'Pansy shipments [2017]') (38,'Shipping volume of orchids (cut flowers) \ u3000 [2017]') (39,'Number of broiler birds \ u3000 [2018]') (40,'Number of pigs raised \ u3000 [2018]') (41,'Number of beef cattle raised \ u3000 [2018]') (42,'Number of dairy cows \ u3000 [2018]') (43,'Production of eyebrows [2017]') (44,'Agricultural output \ u3000 [2017]') (45,'Rice output \ u3000 [2017]') (46,'Wheat output \ u3000 [2017]') (47,'Bean output \ u3000 [2017]') (48,'Potato output \ u3000 [2017]') (49,'Vegetable output \ u3000 [2017]') (50,'Fruit output \ u3000 [2017]') (51,'Flower output \ u3000 [2017]') (52,'Craft crop output \ u3000 [2017]') (53,'Livestock output \ u3000 [2017]') (54,'Fishery production (fishery / aquaculture) [2016]') (55,'Squid catch \ u3000 [2016]') (56,'Katsuo catch \ u3000 [2016]') (57,'Pacific saury catch \ u3000 [2016]') (58,'Sardine catch \ u3000 [2016]') (59,'Tuna catch \ u3000 [2016]') (60,'Oyster farming \ u3000 [2016]') (61,'Scallop farming \ u3000 [2016]') (62,'Wakame seaweed farming \ u3000 [2016]') (63,'Production of material (raw wood) \ u3000 [2016]') (64,'Industrial shipment value \ u3000 [2016]') (65,'Production of fruit wine [2016]') (66,'Production of Yakichu [2016]') (67,'Production of sake (including muddy sake) \ u3000 [2016]') (68,'Beer production \ u3000 [2016]') (69,'Production of petroleum and coal products [2016]') (70,'Semiconductor production \ u3000 [2016]') (71,'Production of industrial robots \ u3000 [2016]') (72,'Production of automobiles (including motorcycles) \ u3000 [2016]') (73,'Fiber production \ u3000 [2016]') (74,'Steel production \ u3000 [2016]') (75,'Production of integrated circuits \ u3000 [2016]') (76,'Prefectural income per capita \ u3000 [2015]') (77,'Urban park area per person \ u3000 [end of March 2017]') (78,'Electricity demand per person \ u3000 [2015]') (79,'Garbage emissions per person per day \ u3000 [2016]') (80,'Sewerage penetration rate \ u3000 [2018]') (81,'Number of passenger cars per household \ u3000 [2018]') (82,'Retail annual sales \ u3000 2015') (83,'Number of pollution complaints \ u3000 [2017]')
I will select a suitable sheet
sheet_index = 2
print(excel.sheet_names[sheet_index])
sheet = excel.parse(excel.sheet_names[sheet_index], header=None) #Make the contents of the second sheet into pandas format
Population density [2018]
sheet.head() #Check the shape of the data
0 | 1 | 2 | |
---|---|---|---|
0 | Source: 2018 Basic Resident Register Population / Household Number Table, 2017 Area by Prefecture, City, Town, and Village td> | NaN | NaN |
1 | Prefecture code td> | Prefecture name td> | (people / km2) td> |
2 | 1 | Hokkaido td> | 64 |
3 | 2 | Aomori td> | 136 |
4 | 3 | Iwate td> | 83 |
sheet.iloc[2:49, [0, 2]] #Extract the necessary parts
0 | 2 | |
---|---|---|
2 | 1 | 64 |
3 | 2 | 136 |
4 | 3 | 83 |
5 | 4 | 317 |
... | ... | ... |
46 | 45 | 144 |
47 | 46 | 180 |
48 | 47 | 645 |
sheet.iloc[2:49, [0, 2]].values #Convert pandas format to numpy format
array([[1, 64],
[2, 136],
[3, 83],
[4, 317],
[5, 87],
...,
[46, 180],
[47, 645]], dtype=object)
values = sheet.iloc[2:49, [0, 2]].values
The code below
Outputs the color code of.
import numpy as np
def color_scale(value, max_value):
try:
v = value / max_value
if v >= 1.0:
return (255, 0, 0)
elif v <= 0:
return (0, 0, 255)
elif v > 0.5:
red = 1
green = int(255 * (2 - 2 * v))
blue = 0
return (red, green, blue)
else:
red = int(255 * 2 * v)
green = int(255 * 2 * v)
blue = int(255 * (1 - 2 * v))
return (red, green, blue)
except:
return (255, 255, 255)
The usage example looks like this.
for v in [0.0, 0.2, 0.5, 0.7, 1.0, "string"]:
print(v, color_scale(v, 1))
0.0 (0, 0, 255)
0.2 (102, 102, 153)
0.5 (255, 255, 0)
0.7 (1, 153, 0)
1.0 (255, 0, 0)
string (255, 255, 255)
Find the maximum value
max_value = np.max([a for a in values[:, 1] if not isinstance(a, str)])
Decide the color for each prefecture
data ={}
for code, value in zip(values[:, 0], values[:, 1]):
data[code] = color_scale(value, max_value)
Color the map of Japan
%matplotlib inline
import matplotlib.pyplot as plt
from pylab import rcParams
print(excel.sheet_names[sheet_index])
plt.imshow(picture(data))
Population density [2018]
It looks like this when painted with different data. Where there is no data, it is completely white.
sheet_index = 60
print(excel.sheet_names[sheet_index])
sheet = excel.parse(excel.sheet_names[sheet_index], header=None)
values = sheet.iloc[2:49, [0, 2]].values
max_value = np.max([a for a in values[:, 1] if not isinstance(a, str)])
data ={}
for code, value in zip(values[:, 0], values[:, 1]):
data[code] = color_scale(value, max_value)
plt.imshow(picture(data))
Oyster farming [2016]
Recommended Posts