In this way, suppose you want to determine the closest point at a specified latitude / longitude from a list of tuples that represent latitude / longitude.
geo_pts = [
(35.60, 139.71),
(35.58, 139.82),
#The following is omitted
]
A naive implementation that calculates everything in the list.
import math
def find(geo_pts, target):
return max(geo_pts, key=lambda p: dist(p, target))
def dist(p1, p2):
return math.sqrt(sum((x1 - x2) ** 2 for x1, x2 in zip(p1, p2)))
This is the behavior. It took about 20ms to find about 15,000 points on AWS EC2 t2.medium instance.
print(find(geo_pts, (35.6, 139.7)))
# => (35.60, 139.71)
By the way, if you want to get n pieces, you can use heapq.nlargest
.
https://pypi.org/project/geoindex/
def main():
geo_pts = [
#Read as an array
]
index = GeoGridIndex(precision=4)
for x in geo_pts:
index.add_point(GeoPoint(x[0], x[1]))
prev = time.time()
print(find(index, (35.6, 139.7)))
# => (Point(35.687659999999994, 139.71178), 9.80482739300054)
print(time.time() - prev)
def find(index, target):
return next(index.get_nearest_points(GeoPoint(*target), 10, 'km'))
In the EC2 t2.medium instance of AWS, the value was returned in about 4ms in this implementation.
By the way, the unit of Cell Size of the argument of GeoGridIndex
is km, and if precision = 4
is specified, it can be searched within the radius of 40/2 = 20km. I haven't done any further verification as my requirements were just right with a radius of 20km.
Precision | Cell size |
---|---|
1 | 5000 |
2 | 1260 |
3 | 156 |
4 | 40 |
5 | 4.8 |
6 | 1.22 |
7 | 0.152 |
8 | 0.038 |
Recommended Posts