Map rent information on a map with Python

In the previous article, I scraped rent information from SUUMO. This time I would like to plot that information on a map so that I can see how location correlates with rent.

Execution result

Here is the result first. Running the code produces the following output.

[Screenshot: map of the 23 wards of Tokyo with apartment locations color-coded by rent]

This figure plots the locations of apartments in the 23 wards of Tokyo, color-coded by rent. The lighter the color, the cheaper the rent; the darker the color, the more expensive. The colors are divided into 4 levels, split at the quartiles of the rent, which gives the following ranges:

25%: up to 85,900 yen
50%: 85,900 to 106,200 yen
75%: 106,200 to 136,300 yen
100%: 136,300 to 1,900,000 yen

Besides 4 levels, the code also includes color lists for 6 and 12 levels.
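As a rough illustration of how such quartile ranges come about, here is a minimal sketch using only the standard library. The rent values are made up for the example (they are not the scraped data), and `to_yen` is a hypothetical helper that converts from units of 10,000 yen to yen.

```python
import statistics

# Hypothetical rents in units of 10,000 yen; not data from the article.
rents = [6.5, 7.2, 8.0, 9.1, 10.5, 11.8, 13.0, 19.5]

# method="inclusive" interpolates linearly between data points, which should
# correspond to the default behaviour of pandas' Series.quantile.
q1, q2, q3 = statistics.quantiles(rents, n=4, method="inclusive")

def to_yen(man_yen):
    """Convert a value in units of 10,000 yen to yen."""
    return round(man_yen * 10_000)

# Four ranges: minimum..q1, q1..q2, q2..q3, q3..maximum
bounds = [min(rents), q1, q2, q3, max(rents)]
for level in range(4):
    low, high = bounds[level], bounds[level + 1]
    print(f"level {level + 1}: {to_yen(low):,} - {to_yen(high):,} yen")
```

Each apartment then gets the color of the range its rent falls into.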

Overall, you can see that the rent is cheaper from the city center to the southwest. The east and north are also relatively cheap compared with other areas. This may be obvious to people living in Tokyo, but it was a surprising discovery for me because I knew nothing about Tokyo. Mapping the data in this way lets you grasp the relationship between location and rent visually. If you widen the area you collect data for, you will be able to see the correlation over a wider range.

Execution environment

Mapping requires converting each address to latitude and longitude. Because I wanted to draw the mapping result and look at it right away, I ran the coordinate transformation and the mapping in different environments.

Coordinate transformation

Mapping

To be honest, there should be no problem running everything in the same environment.

Coordinate transformation

To map the data, the scraped addresses first have to be converted to coordinates (latitude and longitude). Obtaining the coordinates of a place from its address or name is called "geocoding". Geocoding can apparently also be done with APIs from Google, Yahoo, and others, but I did not use them because registration looked troublesome. While searching around, I found an API that offers geocoding for free, so I used that one: geocoding.jp. There were various implementation examples online, and I wrote my code with reference to them.
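To make the geocoding step more concrete, here is a minimal offline sketch of pulling latitude and longitude out of an XML response. The two sample responses are hand-written and simplified; their shape is an assumption based only on the lat, lng, and error elements that the script below looks for, not a captured reply from the API. `parse_geocoding_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written responses (assumed shape, not real API output).
SAMPLE_OK = """<result>
  <address>example</address>
  <coordinate><lat>35.681382</lat><lng>139.766084</lng></coordinate>
</result>"""
SAMPLE_ERROR = "<result><error>001</error></result>"

def parse_geocoding_xml(xml_text):
    """Return (lat, lng) as floats, or None if the response reports an error."""
    root = ET.fromstring(xml_text)
    if root.find('.//error') is not None:
        return None
    lat = root.find('.//lat')
    lng = root.find('.//lng')
    if lat is None or lng is None:
        return None
    return float(lat.text), float(lng.text)

print(parse_geocoding_xml(SAMPLE_OK))
print(parse_geocoding_xml(SAMPLE_ERROR))
```

Returning None instead of raising is just one possible design; the script in this article raises a ValueError for failed lookups.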

Implementation code

The code I created is below.

get_zahyo.py


import requests
from bs4 import BeautifulSoup
import time
import csv

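def get_lat_lon_from_address(address):
    """Return (latitude, longitude) as strings for an address via geocoding.jp."""
    url = 'http://www.geocoding.jp/api/'
    payload = {'q': address}
    r = requests.get(url, params=payload)
    ret = BeautifulSoup(r.content, 'lxml')
    if ret.find('error'):
        # f-string so that the failing address actually appears in the message
        raise ValueError(f"Invalid address submitted. {address}")
    x = ret.find('lat').string  # latitude
    y = ret.find('lng').string  # longitude
    # Wait between requests so as not to burden the geocoding server
    time.sleep(10)
    return x, y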

input_path = 'input.csv'
output_path = 'output.csv'

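zahyo_d = {}  # cache: address -> [latitude, longitude]

# newline='' is what the csv module recommends for files passed to reader/writer.
# The with statement closes both files even if geocoding raises an error.
with open(input_path, newline='') as f1, open(output_path, 'a', newline='') as f2:
    reader = csv.reader(f1)
    writer = csv.writer(f2)
    for row in reader:
        address = row[1]
        if address in zahyo_d:
            # Same address as before: reuse the stored coordinates
            print('skip')
            x, y = zahyo_d[address]
        else:
            print('get zahyo...')
            x, y = get_lat_lon_from_address(address)
            zahyo_d[address] = [x, y]
        row.append(x)
        row.append(y)
        writer.writerow(row)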

Set input_path to the name of the csv file obtained in the previous article, and set output_path to any file name you like.

To reduce the number of requests, each address and its coordinates are stored in a dictionary once they have been looked up, and when the same address appears again, the coordinates are taken from the dictionary instead. Even so, the number of requests is large, so the script waits between requests with time.sleep(10) so as not to burden the server.
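The cache-and-wait idea described above can be isolated into a small wrapper. This is only a sketch: `make_cached_geocoder` and `fake_fetch` are hypothetical names, and the stand-in fetch function replaces the real network call so the example runs offline.

```python
import time

def make_cached_geocoder(fetch, wait_seconds=10, sleep=time.sleep):
    """Wrap `fetch(address) -> (lat, lng)` with a cache and a polite delay."""
    cache = {}

    def geocode(address):
        if address in cache:
            return cache[address]      # no request, no waiting
        result = fetch(address)
        cache[address] = result
        sleep(wait_seconds)            # pause only after a real request
        return result

    return geocode

# Usage with a stand-in fetch function (no network access).
calls = []

def fake_fetch(address):
    calls.append(address)
    return ('35.0', '139.0')

geocode = make_cached_geocoder(fake_fetch, wait_seconds=0)
geocode('A')
geocode('A')
geocode('B')
print(calls)  # ['A', 'B']
```

Only the first lookup of each address reaches the fetch function, so repeated addresses cost neither a request nor a wait.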

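Below are examples of input.csv and output.csv. The text is shown translated into English here; the actual files contain the scraped Japanese text, with rent written in units of 10,000 yen (for example 43万円).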

input.csv(part)


La Tour Kagurazaka,"Nishigokencho, Shinjuku-ku, Tokyo",18 years old,2 basement and 24 stories above ground,43万円,2LDK,79.7m2,16th floor
La Tour Kagurazaka,"Nishigokencho, Shinjuku-ku, Tokyo",18 years old,2 basement and 24 stories above ground,57.8万円,3LDK,103.4m2,16th floor
Pearl Hakusan,"4 Hakusan, Bunkyo-ku, Tokyo",36 years old,8 stories,8.3万円,3K,42m2,8th floor
River City 21 East Towers II,"Tsukuda 2, Chuo-ku, Tokyo",20 years old,2 basement and 43 stories above ground,13.9万円,1LDK,44.2m2,9th floor

output.csv(part)


La Tour Kagurazaka,"Nishigokencho, Shinjuku-ku, Tokyo",18 years old,2 basement and 24 stories above ground,43万円,2LDK,79.7m2,16th floor,35.706903,139.737421
La Tour Kagurazaka,"Nishigokencho, Shinjuku-ku, Tokyo",18 years old,2 basement and 24 stories above ground,57.8万円,3LDK,103.4m2,16th floor,35.706903,139.737421
Pearl Hakusan,"4 Hakusan, Bunkyo-ku, Tokyo",36 years old,8 stories,8.3万円,3K,42m2,8th floor,35.721231,139.746682
River City 21 East Towers II,"Tsukuda 2, Chuo-ku, Tokyo",20 years old,2 basement and 43 stories above ground,13.9万円,1LDK,44.2m2,9th floor,35.668253,139.786297

If the latitude and longitude have been appended to the end of each row, it worked.

Caution

Some addresses cannot be geocoded, and looking them up raises an error. I have listed the addresses that failed for me below, so I recommend removing the rows with these addresses beforehand. It may also be a good idea to save the dictionary of addresses and coordinates to a file in case of an error; on a rerun, this avoids burdening the server again and shortens the execution time.

Addresses where the error occurred

Inner city
1 Kanda Sudacho, Chiyoda-ku, Tokyo
2 Iwamotocho, Chiyoda-ku, Tokyo
1 Kanda Ogawamachi, Chiyoda-ku, Tokyo
3 Kanda Surugadai, Chiyoda-ku, Tokyo

East 23 wards
3 Kameari, Katsushika-ku, Tokyo
2 Ohanajaya, Katsushika-ku, Tokyo

West 23 wards
3 Numabukuro, Nakano-ku, Tokyo
3 Nogata, Nakano-ku, Tokyo
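The suggestion in the caution above (keep the address dictionary on disk and skip addresses that fail) could be sketched as follows. The function names, the JSON file format, and the `fetch` parameter are assumptions made for illustration; the delay between requests is left out for brevity.

```python
import json
import os

def load_cache(path):
    """Load a previously saved address -> [lat, lng] dictionary, if any."""
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            return json.load(f)
    return {}

def save_cache(cache, path):
    """Write the dictionary to disk so a later run can reuse it."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(cache, f, ensure_ascii=False, indent=2)

def geocode_all(addresses, fetch, cache_path):
    """Geocode addresses, skipping cached ones and collecting failures."""
    cache = load_cache(cache_path)
    failed = []
    try:
        for address in addresses:
            if address in cache:
                continue
            try:
                cache[address] = list(fetch(address))
            except ValueError:
                # Address could not be geocoded: note it and carry on
                failed.append(address)
    finally:
        # Save whatever was collected, even if something unexpected happened
        save_cache(cache, cache_path)
    return cache, failed
```

Here `fetch` would be a function like get_lat_lon_from_address, which raises ValueError for addresses it cannot resolve. The returned `failed` list plays the role of the handwritten list above.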

Mapping

Now that the addresses have been converted to coordinates, we can plot them on a map based on that coordinate information. I used folium as the mapping library.

Directory structure

Before explaining the implementation code, I will explain the directory structure once the coordinate conversion is complete. To split the data by area, I keep one file for each part of Tokyo's 23 wards, as follows. The "#Get data" part of the implementation code assumes this directory structure. Please change the "#Get data" part to suit your own file structure.

%ls
mapping.ipynb  output_center.csv  output_east.csv  output_north.csv  output_south.csv  output_west.csv

Implementation code

The code I created is below.

mapping.py


import folium
import pandas as pd

#Mapping stations on the Yamanote Line
def mapping_stations(_map):
    locations_station = [[35.681382, 139.76608399999998],
[35.675069, 139.763328],
[35.665498, 139.75964],
[35.655646, 139.756749],
[35.645736, 139.74757499999998],
[35.630152, 139.74044000000004],
[35.6197, 139.72855300000003],
[35.626446, 139.72344399999997],
[35.633998, 139.715828],
[35.64669, 139.710106],
[35.658517, 139.70133399999997],
[35.670168, 139.70268699999997],
[35.683061, 139.702042],
[35.690921, 139.70025799999996],
[35.701306, 139.70004399999993],
[35.712285, 139.70378200000005],
[35.721204, 139.706587],
[35.728926, 139.71038],
[35.731401, 139.72866199999999],
[35.733492, 139.73934499999996],
[35.736489, 139.74687500000005],
[35.738062, 139.76085999999998],
[35.732135, 139.76678700000002],
[35.727772, 139.770987],
[35.720495, 139.77883700000007],
[35.713768, 139.77725399999997],
[35.707438, 139.774632],
[35.698683, 139.77421900000002],
[35.69169, 139.77088300000003]]
    for l in locations_station:
        folium.Circle(radius=10, location=l, color='blue').add_to(_map)
    return _map

#Get data
names = ['center', 'east', 'south', 'west', 'north']
df_list = []
for n in names:
    path = 'output_{}.csv'.format(n)
    df_list.append(pd.read_csv(path, names=['name', 'address', 'age', 'height', 'rent', 'kinds', 'area', 'floor', 'x', 'y']))
df = pd.concat(df_list)

#Convert rent to numbers (the scraped values end with the suffix 万円, i.e. units of 10,000 yen)
df['rent'] = df['rent'].str.strip('万円').astype(float)

#Processing of the same address
address = df['address'].unique()
new_df = []
for adr in address:
    df_adr = df.loc[df['address']==adr]
    value = df_adr['rent'].mean()
    new_df.append([value, df_adr.iloc[0, 8], df_adr.iloc[0, 9]])
df = pd.DataFrame(new_df, columns=['rent', 'x', 'y'])

#color decision
#colors = ['#fff4f4', '#ffeaea', '#ffd5d5', '#ffaaaa', '#ff8080', '#ff5555', '#ff2b2b', '#ff0000', '#d50000', '#aa0000', '#800000', '#550000']
#colors = ['#fff4f4', '#ffd5d5', '#ff8080', '#ff2b2b', '#d50000', '#800000']
colors = ['#ffd5d5', '#ff5555', '#d50000', '#550000']
num_color = len(colors)
df.loc[df['rent']<df['rent'].quantile(1/num_color), 'color'] = colors[0]
for i in range(1, num_color-1):
    df.loc[(df['rent'].quantile(i/num_color) <= df['rent']) & (df['rent'] < df['rent'].quantile((i+1)/num_color)), 'color'] = colors[i]
df.loc[df['rent']>=df['rent'].quantile((num_color-1)/num_color), 'color'] = colors[-1]

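#mapping
location = [df['x'].mean(), df['y'].mean()]
#Note: "Stamen Toner" may not be available in recent folium versions; if so, a built-in tile set such as "CartoDB positron" can be used instead
_map = folium.Map(location=location, zoom_start=12, tiles="Stamen Toner")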
for i in range(len(df)):
    folium.Circle(radius=150, location=[df.loc[i, 'x'], df.loc[i, 'y']], color=df.loc[i, 'color'], fill_color=df.loc[i, 'color'], fill=True,).add_to(_map)
#_map = mapping_stations(_map)

#Print value range
print('{}% : - {:.2f}'.format(int((1)/num_color*100), df['rent'].quantile((1)/num_color)))
for i in range(1, num_color):
    print('{}% : {:.2f} - {:.2f}'.format(int((i+1)/num_color*100), df['rent'].quantile((i)/num_color), df['rent'].quantile((i+1)/num_color)))

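#In a Jupyter notebook (e.g. with Anaconda): display the map inline
_map

#For ordinary python: save the map to an HTML file and open it in a browser
#_map.save('map.html')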
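The "#Processing of the same address" step in the code above averages the rent of all rooms that share an address and keeps one coordinate pair per address. As an alternative sketch (not the code used in this article), the same idea can be written with pandas groupby. The small DataFrame here is made-up sample data.

```python
import pandas as pd

# Made-up sample: two rooms at address A, one at address B
df = pd.DataFrame({
    'address': ['A', 'A', 'B'],
    'rent': [10.0, 12.0, 8.0],
    'x': [35.70, 35.70, 35.72],
    'y': [139.73, 139.73, 139.74],
})

# Mean rent per address; take the first coordinate pair of each group.
# sort=False keeps the addresses in their order of first appearance.
per_address = (
    df.groupby('address', sort=False)
      .agg(rent=('rent', 'mean'), x=('x', 'first'), y=('y', 'first'))
      .reset_index(drop=True)
)
print(per_address)
```

The result has one row per address with the columns rent, x, and y, which is the shape the rest of the mapping code expects.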

As described in the "Directory structure" section, the "#Get data" part assumes the directory structure above, so change it to match yours. If there is only one output file, the following should be enough.

#Get data
path = 'output.csv'  #path to your file
df = pd.read_csv(path, names=['name', 'address', 'age', 'height', 'rent', 'kinds', 'area', 'floor', 'x', 'y'])

To change the number of colors, uncomment whichever colors list you prefer in the "#color decision" part (12, 6, or 4 levels).
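For reference, the color assignment can also be written so that it works for any number of colors in one step. The following is a sketch rather than the article's code: `assign_colors` is a hypothetical helper, and the rent values are made up. It follows the same rule as the code above, where a rent equal to a quantile boundary goes into the higher level.

```python
import numpy as np
import pandas as pd

def assign_colors(rent, colors):
    """Give each rent the color of the quantile range it falls into."""
    n = len(colors)
    # n - 1 inner boundaries, e.g. the 25%, 50%, 75% points for 4 colors
    thresholds = rent.quantile([i / n for i in range(1, n)]).to_numpy()
    # side='right' counts the boundaries that are <= each rent value
    idx = np.searchsorted(thresholds, rent.to_numpy(), side='right')
    return pd.Series([colors[i] for i in idx], index=rent.index)

colors = ['#ffd5d5', '#ff5555', '#d50000', '#550000']
rent = pd.Series([5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
print(assign_colors(rent, colors).tolist())
```

Switching to the 6-color or 12-color list then only requires passing a different list.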

_map = mapping_stations(_map) calls a function that plots the stations on the Yamanote Line in Tokyo. Uncomment that line to include the stations in the output (example below).

[Screenshot: the same map with the Yamanote Line stations added]

In a Jupyter notebook (for example, with Anaconda), the map can be displayed inline, so evaluating _map is enough. If you run the code as an ordinary Python script, it cannot be displayed inline, so save it once with _map.save('map.html') and open the saved file in a browser.

Summary

I plotted the apartment information on a map. Mapping is very useful because it lets you take in information visually. By scraping more data and widening the mapped area, you can see the overall correlation more clearly. The scraping also yields other information such as room size and floor, so I would like to try analyzing the data with pandas as well.
