In the previous article, I scraped rent information from SUUMO. In this article, I map that information so that the relationship between location and rent can be seen at a glance.
Let me show the result of running the code first. The figure maps the locations of apartments in Tokyo's 23 wards, color-coded by price: the lighter the color, the cheaper the rent, and the darker the color, the more expensive it is. The colors are divided into four levels, covering the following ranges:
25%: up to 85,900 yen
50%: 85,900 to 106,200 yen
75%: 106,200 to 136,300 yen
100%: 136,300 yen to 1.9 million yen
In other words, the first, second, third and fourth quartiles are calculated and each color level corresponds to one of them. The colors can also be divided into up to 12 levels instead of 4.
Overall, you can see that rent is higher from the city center toward the southwest, while the east and north are relatively cheap compared with other areas. This may be obvious to people living in Tokyo, but since I knew nothing about Tokyo it was a surprising discovery. Mapping the data like this lets you grasp the relationship between location and rent visually, and if you expand the range of data collection you can see the correlation over an even wider area.
For mapping, the addresses have to be converted to latitude and longitude. Since I wanted to draw the map and look at the result quickly, I did the coordinate conversion and the mapping in separate environments.
To be honest, there is probably no problem doing both in the same environment.
To do the mapping, the scraped addresses first need to be converted into coordinates (latitude / longitude). Obtaining the coordinates of a place from its address or name like this is called "geocoding". Geocoding can apparently also be done with APIs from Google or Yahoo!, but I gave up on those because registration looked like a hassle. While searching around, I found an API that can geocode for free, so I used that: geocoding.jp. There were various implementation examples online, and I wrote my code with reference to them.
The code I created is below.
get_zahyo.py
import requests
from bs4 import BeautifulSoup
import time
import csv
def get_lat_lon_from_address(address):
    url = 'http://www.geocoding.jp/api/'
    payload = {'q': address}
    r = requests.get(url, params=payload)
    ret = BeautifulSoup(r.content, 'lxml')
    if ret.find('error'):
        raise ValueError(f"Invalid address submitted. {address}")
    x = ret.find('lat').string
    y = ret.find('lng').string
    time.sleep(10)  # wait between requests so as not to burden the geocoding server
    return x, y

input_path = 'input.csv'
output_path = 'output.csv'

f1 = open(input_path)
reader = csv.reader(f1)
f2 = open(output_path, 'a')
writer = csv.writer(f2)

zahyo_d = {}  # cache of address -> [lat, lng] pairs already looked up
for row in reader:
    address = row[1]
    if address in zahyo_d:
        # Same address as before: reuse the cached coordinates
        print('skip')
        x = zahyo_d[address][0]
        y = zahyo_d[address][1]
    else:
        print('get zahyo...')
        x, y = get_lat_lon_from_address(address)
        zahyo_d[address] = [x, y]
    row.append(x)
    row.append(y)
    writer.writerow(row)

f1.close()
f2.close()
Set input_path to the CSV file obtained in the previous article, and set output_path to any file name you like.
To reduce the number of requests, each address and its coordinates are stored in a dictionary once retrieved, and when the same address appears again the coordinates are taken from the dictionary instead of being requested again. Even so, the number of requests adds up, so time.sleep(10) spaces the requests out to avoid putting load on the server on the other end.
Below are examples of input.csv and output.csv.
input.csv(part)
La Tour Kagurazaka,Nishigokencho, Shinjuku-ku, Tokyo,18 years old,2 basement and 24 stories above ground,43万円,2LDK,79.7m2,16th floor
La Tour Kagurazaka,Nishigokencho, Shinjuku-ku, Tokyo,18 years old,2 basement and 24 stories above ground,57.8万円,3LDK,103.4m2,16th floor
Pearl Hakusan,4 Hakusan, Bunkyo-ku, Tokyo,36 years old,8 stories,8.3万円,3K,42m2,8th Floor
River City 21 East Towers II,Tsukuda 2, Chuo-ku, Tokyo,20 years old,2 basement and 43 stories above ground,13.9万円,1LDK,44.2m2,9th floor
output.csv(part)
La Tour Kagurazaka,Nishigokencho, Shinjuku-ku, Tokyo,18 years old,2 basement and 24 stories above ground,43万円,2LDK,79.7m2,16th floor,35.706903,139.737421
La Tour Kagurazaka,Nishigokencho, Shinjuku-ku, Tokyo,18 years old,2 basement and 24 stories above ground,57.8万円,3LDK,103.4m2,16th floor,35.706903,139.737421
Pearl Hakusan,4 Hakusan, Bunkyo-ku, Tokyo,36 years old,8 stories,8.3万円,3K,42m2,8th Floor,35.721231,139.746682
River City 21 East Towers II,Tsukuda 2, Chuo-ku, Tokyo,20 years old,2 basement and 43 stories above ground,13.9万円,1LDK,44.2m2,9th floor,35.668253,139.786297
If latitude and longitude have been appended to each row like this, the conversion is done.
Even when looking up coordinates from an address, there are some addresses for which no coordinates can be obtained and an error occurs. I noted down the addresses that caused errors, so I recommend removing the rows that contain them. It is probably also worth saving the address-coordinate dictionary to a file so that an error partway through does not throw away what has already been fetched; that both avoids burdening the server and shortens the execution time on re-runs (a minimal sketch is shown after the list below).
Addresses where errors occurred
Inner city
1 Kanda Sudacho, Chiyoda-ku, Tokyo
2 Iwamotocho, Chiyoda-ku, Tokyo
1 Kanda Ogawamachi, Chiyoda-ku, Tokyo
3 Kanda Surugadai, Chiyoda-ku, Tokyo
East 23 wards
3 Kameari, Katsushika-ku, Tokyo
2 Ohanajaya, Katsushika-ku, Tokyo
West 23 wards
3 Numabukuro, Nakano-ku, Tokyo
3 Nogata, Nakano-ku, Tokyo
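As for saving the address-coordinate dictionary mentioned above, a minimal sketch using a JSON file might look like this (the file name zahyo_cache.json and this whole snippet are my own illustration, not part of the original script):
import json
import os

cache_path = 'zahyo_cache.json'  # arbitrary file name for the saved dictionary

# Reload a previously saved address -> [lat, lng] dictionary if it exists
if os.path.exists(cache_path):
    with open(cache_path, encoding='utf-8') as f:
        zahyo_d = json.load(f)
else:
    zahyo_d = {}

# ... run the same loop as in get_zahyo.py, filling zahyo_d ...

# Save the dictionary so that an error partway through does not discard
# the coordinates already fetched (re-runs can then skip known addresses)
with open(cache_path, 'w', encoding='utf-8') as f:
    json.dump(zahyo_d, f, ensure_ascii=False)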
Now that the addresses have been converted to coordinates, the next step is to map them based on the coordinate information. I used folium as the mapping library.
Before explaining the implementation, here is the directory structure at the point where coordinate conversion is finished. To split the data by area, I divided the files by part of Tokyo's 23 wards as follows. The "#Get data" part of the implementation assumes this directory structure, so please change that part to suit your own file layout.
%ls
mapping.ipynb output_center.csv output_east.csv output_north.csv output_south.csv output_west.csv
The code I created is below.
mapping.py
import folium
import pandas as pd
#Mapping stations on the Yamanote Line
def mapping_stations(_map):
    # Station coordinates (latitude, longitude) along the Yamanote Line
    locations_station = [[35.681382, 139.76608399999998],
                         [35.675069, 139.763328],
                         [35.665498, 139.75964],
                         [35.655646, 139.756749],
                         [35.645736, 139.74757499999998],
                         [35.630152, 139.74044000000004],
                         [35.6197, 139.72855300000003],
                         [35.626446, 139.72344399999997],
                         [35.633998, 139.715828],
                         [35.64669, 139.710106],
                         [35.658517, 139.70133399999997],
                         [35.670168, 139.70268699999997],
                         [35.683061, 139.702042],
                         [35.690921, 139.70025799999996],
                         [35.701306, 139.70004399999993],
                         [35.712285, 139.70378200000005],
                         [35.721204, 139.706587],
                         [35.728926, 139.71038],
                         [35.731401, 139.72866199999999],
                         [35.733492, 139.73934499999996],
                         [35.736489, 139.74687500000005],
                         [35.738062, 139.76085999999998],
                         [35.732135, 139.76678700000002],
                         [35.727772, 139.770987],
                         [35.720495, 139.77883700000007],
                         [35.713768, 139.77725399999997],
                         [35.707438, 139.774632],
                         [35.698683, 139.77421900000002],
                         [35.69169, 139.77088300000003]]
    for l in locations_station:
        folium.Circle(radius=10, location=l, color='blue').add_to(_map)
    return _map
#Get data
names = ['center', 'east', 'south', 'west', 'north']
df_list = []
for n in names:
    path = 'output_{}.csv'.format(n)
    df_list.append(pd.read_csv(path, names=['name', 'address', 'age', 'height', 'rent', 'kinds', 'area', 'floor', 'x', 'y']))
df = pd.concat(df_list)

#Convert rent to a number (the scraped values look like "43万円", i.e. units of 10,000 yen)
df['rent'] = df['rent'].str.strip('万円').astype(float)

#Merge rows with the same address: mean rent, coordinates taken from the first row
address = df['address'].unique()
new_df = []
for adr in address:
    df_adr = df.loc[df['address']==adr]
    value = df_adr['rent'].mean()
    new_df.append([value, df_adr.iloc[0, 8], df_adr.iloc[0, 9]])  # [mean rent, x, y]
df = pd.DataFrame(new_df, columns=['rent', 'x', 'y'])

#Color decision
#colors = ['#fff4f4', '#ffeaea', '#ffd5d5', '#ffaaaa', '#ff8080', '#ff5555', '#ff2b2b', '#ff0000', '#d50000', '#aa0000', '#800000', '#550000']
#colors = ['#fff4f4', '#ffd5d5', '#ff8080', '#ff2b2b', '#d50000', '#800000']
colors = ['#ffd5d5', '#ff5555', '#d50000', '#550000']
num_color = len(colors)
df.loc[df['rent'] < df['rent'].quantile(1/num_color), 'color'] = colors[0]
for i in range(1, num_color-1):
    df.loc[(df['rent'].quantile(i/num_color) <= df['rent']) & (df['rent'] < df['rent'].quantile((i+1)/num_color)), 'color'] = colors[i]
df.loc[df['rent'] >= df['rent'].quantile((num_color-1)/num_color), 'color'] = colors[-1]

#Mapping
location = [df['x'].mean(), df['y'].mean()]
_map = folium.Map(location=location, zoom_start=12, tiles="Stamen Toner")
for i in range(len(df)):
    folium.Circle(radius=150, location=[df.loc[i, 'x'], df.loc[i, 'y']], color=df.loc[i, 'color'], fill_color=df.loc[i, 'color'], fill=True).add_to(_map)
#_map = mapping_stations(_map)

#Print the rent range of each color level
print('{}% : - {:.2f}'.format(int(1/num_color*100), df['rent'].quantile(1/num_color)))
for i in range(1, num_color):
    print('{}% : {:.2f} - {:.2f}'.format(int((i+1)/num_color*100), df['rent'].quantile(i/num_color), df['rent'].quantile((i+1)/num_color)))

#For Anaconda (Jupyter): display the map inline
_map
#For plain Python: save to an HTML file instead
#_map.save('map.html')
As mentioned in the directory structure part above, the "#Get data" section assumes that directory layout, so please change it to suit your setup. If there is only one output file, then something like
#Get data
path = 'output.csv'  # replace with the path to your single output file
df = pd.read_csv(path, names=['name', 'address', 'age', 'height', 'rent', 'kinds', 'area', 'floor', 'x', 'y'])
should be fine.
You can change the number of color levels by using whichever colors list you like; the commented-out lists give 12 or 6 levels instead of 4.
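As an aside, the same equal-frequency binning can be written more compactly with pandas. This is just a sketch using pd.qcut on the df and colors from the code above, not the article's original approach (the handling of the bin edges differs slightly from the quantile loop):
# Split 'rent' into len(colors) equal-frequency bins and label each bin with its color
df['color'] = pd.qcut(df['rent'], q=len(colors), labels=colors)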
_map = mapping_stations(_map) calls a function that plots the stations of the Yamanote Line in Tokyo. Uncomment that line to include the stations in the output (an example is shown below).
With Anaconda (Jupyter), the map can be displayed inline, so simply evaluating _map outputs it. When running with plain Python it cannot be displayed inline, so save it once with _map.save('map.html') and open the resulting HTML file.
I mapped the apartment information onto a map. Mapping is very useful because it lets you grasp information visually. By scraping more data and expanding the mapped range, you can see the overall correlation more broadly. Scraping also yields other information such as room size and floor, so I would like to try analysing the data with pandas as well; a rough sketch of that kind of analysis is below.
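For instance, a first look could relate rent to floor area. This is only a sketch of my own, assuming df is the concatenated per-listing DataFrame right after the "#Get data" step of mapping.py (i.e. before the rent conversion and the same-address aggregation):
# Rough sketch of a follow-up analysis: rent per square metre
df['rent'] = df['rent'].str.strip('万円').astype(float)                    # rent in units of 10,000 yen
df['area'] = df['area'].str.replace('m2', '', regex=False).astype(float)  # floor area in m2
df['rent_per_m2'] = df['rent'] * 10000 / df['area']                       # yen per square metre

print(df[['rent', 'area', 'rent_per_m2']].describe())
print(df[['rent', 'area']].corr())  # rough check of how rent relates to area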