In the previous article, I scraped rent information from SUUMO. In this article, I map that information so that the relationship between location and rent can be seen at a glance.
Let me show the result of running the code first. The figure maps the locations of apartments in Tokyo's 23 wards, color-coded by price: the lighter the color, the cheaper the rent, and the darker the color, the more expensive it is. The colors are divided into four levels, covering the following ranges:
25%: up to 85,900 yen
50%: 85,900 to 106,200 yen
75%: 106,200 to 136,300 yen
100%: 136,300 yen to 1.9 million yen
In other words, the first, second, third and fourth quartiles are calculated and each color level corresponds to one of them. The colors can also be divided into up to 12 levels instead of 4.
Overall, you can see that rent is higher from the city center toward the southwest, while the east and north are relatively cheap compared with other areas. This may be obvious to people living in Tokyo, but since I knew nothing about Tokyo it was a surprising discovery. Mapping the data like this lets you grasp the relationship between location and rent visually, and if you expand the range of data collection you can see the correlation over an even wider area.
For mapping, the addresses have to be converted to latitude and longitude. Since I wanted to draw the map and look at the result quickly, I did the coordinate conversion and the mapping in separate environments.
To be honest, there is probably no problem doing both in the same environment.
To do the mapping, the scraped addresses first need to be converted into coordinates (latitude / longitude). Obtaining the coordinates of a place from its address or name like this is called "geocoding". Geocoding can apparently also be done with APIs from Google or Yahoo!, but I gave up on those because registration looked like a hassle. While searching around, I found an API that can geocode for free, so I used that: geocoding.jp. There were various implementation examples online, and I wrote my code with reference to them.
The code I created is below.
get_zahyo.py
import requests
from bs4 import BeautifulSoup
import time
import csv
def get_lat_lon_from_address(address):
    url = 'http://www.geocoding.jp/api/'
    payload = {'q': address}
    r = requests.get(url, params=payload)
    ret = BeautifulSoup(r.content, 'lxml')
    if ret.find('error'):
        raise ValueError(f"Invalid address submitted. {address}")
    x = ret.find('lat').string
    y = ret.find('lng').string
    time.sleep(10)  # wait between requests so as not to burden the geocoding server
    return x, y

input_path = 'input.csv'
output_path = 'output.csv'

f1 = open(input_path)
reader = csv.reader(f1)
f2 = open(output_path, 'a')
writer = csv.writer(f2)

zahyo_d = {}  # cache of address -> [lat, lng] pairs already looked up
for row in reader:
    address = row[1]
    if address in zahyo_d:
        # Same address as before: reuse the cached coordinates
        print('skip')
        x = zahyo_d[address][0]
        y = zahyo_d[address][1]
    else:
        print('get zahyo...')
        x, y = get_lat_lon_from_address(address)
        zahyo_d[address] = [x, y]
    row.append(x)
    row.append(y)
    writer.writerow(row)

f1.close()
f2.close()
Set input_path to the CSV file obtained in the previous article, and set output_path to any file name you like.
To reduce the number of requests, each address and its coordinates are stored in a dictionary once retrieved, and when the same address appears again the coordinates are taken from the dictionary instead of being requested again. Even so, the number of requests adds up, so time.sleep(10) spaces the requests out to avoid putting load on the server on the other end.
Below are examples of input.csv and output.csv.
input.csv(part)
La Tour Kagurazaka,Nishigokencho, Shinjuku-ku, Tokyo,18 years old,2 basement and 24 stories above ground,43万円,2LDK,79.7m2,16th floor
La Tour Kagurazaka,Nishigokencho, Shinjuku-ku, Tokyo,18 years old,2 basement and 24 stories above ground,57.8万円,3LDK,103.4m2,16th floor
Pearl Hakusan,4 Hakusan, Bunkyo-ku, Tokyo,36 years old,8 stories,8.3万円,3K,42m2,8th Floor
River City 21 East Towers II,Tsukuda 2, Chuo-ku, Tokyo,20 years old,2 basement and 43 stories above ground,13.9万円,1LDK,44.2m2,9th floor
output.csv(part)
La Tour Kagurazaka,Nishigokencho, Shinjuku-ku, Tokyo,18 years old,2 basement and 24 stories above ground,43万円,2LDK,79.7m2,16th floor,35.706903,139.737421
La Tour Kagurazaka,Nishigokencho, Shinjuku-ku, Tokyo,18 years old,2 basement and 24 stories above ground,57.8万円,3LDK,103.4m2,16th floor,35.706903,139.737421
Pearl Hakusan,4 Hakusan, Bunkyo-ku, Tokyo,36 years old,8 stories,8.3万円,3K,42m2,8th Floor,35.721231,139.746682
River City 21 East Towers II,Tsukuda 2, Chuo-ku, Tokyo,20 years old,2 basement and 43 stories above ground,13.9万円,1LDK,44.2m2,9th floor,35.668253,139.786297
If latitude and longitude have been appended to each row like this, the conversion is done.
Even when looking up coordinates from an address, there are some addresses for which no coordinates can be obtained and an error occurs. I noted down the addresses that caused errors, so I recommend removing the rows that contain them. It is probably also worth saving the address-coordinate dictionary to a file so that an error partway through does not throw away what has already been fetched; that both avoids burdening the server and shortens the execution time on re-runs (a minimal sketch is shown after the list below).
Addresses where errors occurred
Inner city
1 Kanda Sudacho, Chiyoda-ku, Tokyo
2 Iwamotocho, Chiyoda-ku, Tokyo
1 Kanda Ogawamachi, Chiyoda-ku, Tokyo
3 Kanda Surugadai, Chiyoda-ku, Tokyo
East 23 wards
3 Kameari, Katsushika-ku, Tokyo
2 Ohanajaya, Katsushika-ku, Tokyo
West 23 wards
3 Numabukuro, Nakano-ku, Tokyo
3 Nogata, Nakano-ku, Tokyo
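As for saving the address-coordinate dictionary mentioned above, a minimal sketch using a JSON file might look like this (the file name zahyo_cache.json and this whole snippet are my own illustration, not part of the original script):
import json
import os

cache_path = 'zahyo_cache.json'  # arbitrary file name for the saved dictionary

# Reload a previously saved address -> [lat, lng] dictionary if it exists
if os.path.exists(cache_path):
    with open(cache_path, encoding='utf-8') as f:
        zahyo_d = json.load(f)
else:
    zahyo_d = {}

# ... run the same loop as in get_zahyo.py, filling zahyo_d ...

# Save the dictionary so that an error partway through does not discard
# the coordinates already fetched (re-runs can then skip known addresses)
with open(cache_path, 'w', encoding='utf-8') as f:
    json.dump(zahyo_d, f, ensure_ascii=False)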
Now that the addresses have been converted to coordinates, the next step is to map them based on the coordinate information. I used folium as the mapping library.
Before explaining the implementation, here is the directory structure at the point where coordinate conversion is finished. To split the data by area, I divided the files by part of Tokyo's 23 wards as follows. The "#Get data" part of the implementation assumes this directory structure, so please change that part to suit your own file layout.
%ls
mapping.ipynb output_center.csv output_east.csv output_north.csv output_south.csv output_west.csv
The code I created is below.
mapping.py
import folium
import pandas as pd
#Mapping stations on the Yamanote Line
def mapping_stations(_map):
    # Station coordinates (latitude, longitude) along the Yamanote Line
    locations_station = [[35.681382, 139.76608399999998],
                         [35.675069, 139.763328],
                         [35.665498, 139.75964],
                         [35.655646, 139.756749],
                         [35.645736, 139.74757499999998],
                         [35.630152, 139.74044000000004],
                         [35.6197, 139.72855300000003],
                         [35.626446, 139.72344399999997],
                         [35.633998, 139.715828],
                         [35.64669, 139.710106],
                         [35.658517, 139.70133399999997],
                         [35.670168, 139.70268699999997],
                         [35.683061, 139.702042],
                         [35.690921, 139.70025799999996],
                         [35.701306, 139.70004399999993],
                         [35.712285, 139.70378200000005],
                         [35.721204, 139.706587],
                         [35.728926, 139.71038],
                         [35.731401, 139.72866199999999],
                         [35.733492, 139.73934499999996],
                         [35.736489, 139.74687500000005],
                         [35.738062, 139.76085999999998],
                         [35.732135, 139.76678700000002],
                         [35.727772, 139.770987],
                         [35.720495, 139.77883700000007],
                         [35.713768, 139.77725399999997],
                         [35.707438, 139.774632],
                         [35.698683, 139.77421900000002],
                         [35.69169, 139.77088300000003]]
    for l in locations_station:
        folium.Circle(radius=10, location=l, color='blue').add_to(_map)
    return _map
#Get data
names = ['center', 'east', 'south', 'west', 'north']
df_list = []
for n in names:
    path = 'output_{}.csv'.format(n)
    df_list.append(pd.read_csv(path, names=['name', 'address', 'age', 'height', 'rent', 'kinds', 'area', 'floor', 'x', 'y']))
df = pd.concat(df_list)

#Convert rent to a number (the scraped values look like "43万円", i.e. units of 10,000 yen)
df['rent'] = df['rent'].str.strip('万円').astype(float)

#Merge rows with the same address: mean rent, coordinates taken from the first row
address = df['address'].unique()
new_df = []
for adr in address:
    df_adr = df.loc[df['address']==adr]
    value = df_adr['rent'].mean()
    new_df.append([value, df_adr.iloc[0, 8], df_adr.iloc[0, 9]])  # [mean rent, x, y]
df = pd.DataFrame(new_df, columns=['rent', 'x', 'y'])

#Color decision
#colors = ['#fff4f4', '#ffeaea', '#ffd5d5', '#ffaaaa', '#ff8080', '#ff5555', '#ff2b2b', '#ff0000', '#d50000', '#aa0000', '#800000', '#550000']
#colors = ['#fff4f4', '#ffd5d5', '#ff8080', '#ff2b2b', '#d50000', '#800000']
colors = ['#ffd5d5', '#ff5555', '#d50000', '#550000']
num_color = len(colors)
df.loc[df['rent'] < df['rent'].quantile(1/num_color), 'color'] = colors[0]
for i in range(1, num_color-1):
    df.loc[(df['rent'].quantile(i/num_color) <= df['rent']) & (df['rent'] < df['rent'].quantile((i+1)/num_color)), 'color'] = colors[i]
df.loc[df['rent'] >= df['rent'].quantile((num_color-1)/num_color), 'color'] = colors[-1]

#Mapping
location = [df['x'].mean(), df['y'].mean()]
_map = folium.Map(location=location, zoom_start=12, tiles="Stamen Toner")
for i in range(len(df)):
    folium.Circle(radius=150, location=[df.loc[i, 'x'], df.loc[i, 'y']], color=df.loc[i, 'color'], fill_color=df.loc[i, 'color'], fill=True).add_to(_map)
#_map = mapping_stations(_map)

#Print the rent range of each color level
print('{}% : - {:.2f}'.format(int(1/num_color*100), df['rent'].quantile(1/num_color)))
for i in range(1, num_color):
    print('{}% : {:.2f} - {:.2f}'.format(int((i+1)/num_color*100), df['rent'].quantile(i/num_color), df['rent'].quantile((i+1)/num_color)))

#For Anaconda (Jupyter): display the map inline
_map
#For plain Python: save to an HTML file instead
#_map.save('map.html')
As mentioned in the directory structure part above, the "#Get data" section assumes that directory layout, so please change it to suit your setup. If there is only one output file, then something like
#Get data
path = 'output.csv'  # replace with the path to your single output file
df = pd.read_csv(path, names=['name', 'address', 'age', 'height', 'rent', 'kinds', 'area', 'floor', 'x', 'y'])
should be fine.
You can change the number of color levels by using whichever colors list you like; the commented-out lists give 12 or 6 levels instead of 4.
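As an aside, the same equal-frequency binning can be written more compactly with pandas. This is just a sketch using pd.qcut on the df and colors from the code above, not the article's original approach (the handling of the bin edges differs slightly from the quantile loop):
# Split 'rent' into len(colors) equal-frequency bins and label each bin with its color
df['color'] = pd.qcut(df['rent'], q=len(colors), labels=colors)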
_map = mapping_stations(_map) calls a function that plots the stations of the Yamanote Line in Tokyo. Uncomment that line to include the stations in the output (an example is shown below).
With Anaconda (Jupyter), the map can be displayed inline, so simply evaluating _map outputs it. When running with plain Python it cannot be displayed inline, so save it once with _map.save('map.html') and open the resulting HTML file.
I mapped the apartment information onto a map. Mapping is very useful because it lets you grasp information visually. By scraping more data and expanding the mapped range, you can see the overall correlation more broadly. Scraping also yields other information such as room size and floor, so I would like to try analysing the data with pandas as well; a rough sketch of that kind of analysis is below.
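For instance, a first look could relate rent to floor area. This is only a sketch of my own, assuming df is the concatenated per-listing DataFrame right after the "#Get data" step of mapping.py (i.e. before the rent conversion and the same-address aggregation):
# Rough sketch of a follow-up analysis: rent per square metre
df['rent'] = df['rent'].str.strip('万円').astype(float)                    # rent in units of 10,000 yen
df['area'] = df['area'].str.replace('m2', '', regex=False).astype(float)  # floor area in m2
df['rent_per_m2'] = df['rent'] * 10000 / df['area']                       # yen per square metre

print(df[['rent', 'area', 'rent_per_m2']].describe())
print(df[['rent', 'area']].corr())  # rough check of how rent relates to area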