In this article, I try to predict the geotag of a building from its image using a pre-trained model. The main goal is to have a single model output multiple labels, latitude and longitude, from one input image.
The data used is the "European Cities 1M dataset" http://image.ntua.gr/iva/datasets/ec1m/index.html
I use the landmark image set and the corresponding geotags from this site.
The implementation in this article uses Google Colaboratory. The imports used are listed below.
import keras
from keras.utils import np_utils
from keras.models import Sequential, Model, load_model
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers.core import Dense, Dropout, Activation, Flatten
import numpy as np
from sklearn.model_selection import train_test_split
import os, zipfile, io, re
from PIL import Image
import glob
from tqdm import tqdm
from keras.applications.xception import Xception
from keras.applications.resnet50 import ResNet50
from keras.layers.pooling import GlobalAveragePooling2D
from keras.optimizers import Adam, Nadam
As pre-processing, the latitude and longitude labels are parsed. Latitude and longitude are learned as separate labels, so each line of the geotag file is split so that the two values can be easily retrieved as a list.
with open("landmark/ec1m_landmarks_geotags.txt") as f:
    label = f.readlines()
    for i in label:
        ans = i.split(' ')
        ans[1] = ans[1].replace('\n', '')
        print(ans)
output
['41.4134', '2.153']
['41.3917', '2.16472']
['41.3954', '2.16177']
['41.3954', '2.16177']
['41.3954', '2.16156']
['41.3899', '2.17428']
['41.3953', '2.16184']
['41.3953', '2.16172']
['41.3981', '2.1645']
.....
.....
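The same parsing can be wrapped in a small helper that returns numeric pairs instead of printing strings. This is a minimal sketch: the name `parse_geotag_line` is hypothetical, and it assumes each line holds a latitude and a longitude separated by whitespace, as in the output above.

```python
def parse_geotag_line(line):
    """Parse one 'latitude longitude' line into a (lat, lon) tuple of floats.

    Hypothetical helper, assuming whitespace-separated values per line.
    """
    # split() with no argument also discards the trailing newline
    lat_text, lon_text = line.split()
    return float(lat_text), float(lon_text)


# Sample lines copied from the output above
sample_lines = ["41.4134 2.153\n", "41.3917 2.16472\n"]
geotags = [parse_geotag_line(line) for line in sample_lines]
print(geotags)  # [(41.4134, 2.153), (41.3917, 2.16472)]
```

Returning floats at this stage would remove the need to split and strip the strings again later.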
The image size is set to 100, and the dataset is converted to arrays. At this stage each image is labeled with its raw geotag line, without separating latitude and longitude.
X = []
Y = []
image_size = 100
with open("landmark/ec1m_landmarks_geotags.txt") as f:
    label = f.readlines()
dir = "landmark/ec1m_landmark_images"
# Sort so the file order is deterministic; this assumes the i-th geotag line
# belongs to the i-th image in sorted filename order
files = sorted(glob.glob(dir + "/*.jpg"))
for index, file in tqdm(enumerate(files)):
    image = Image.open(file)
    image = image.convert("RGB")
    image = image.resize((image_size, image_size))
    data = np.asarray(image)
    X.append(data)
    Y.append(label[index])
X = np.array(X)
Y = np.array(Y)
output
927it [00:08, 115.29it/s]
X.shape,Y.shape
((927, 100, 100, 3), (927,))
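Because labels are matched to images purely by position, it may be worth checking that the two lists line up before building the arrays. The sketch below uses a hypothetical helper, `pair_images_with_labels`, and assumes that line i of the geotag file corresponds to the i-th image in sorted filename order; that correspondence is an assumption and should be confirmed against the dataset's documentation.

```python
def pair_images_with_labels(files, labels):
    """Return (file, label) pairs, matching by position after sorting files.

    Hypothetical helper. Assumes the i-th geotag line belongs to the i-th
    image in sorted filename order.
    """
    if len(files) != len(labels):
        raise ValueError(
            "got {} images but {} labels".format(len(files), len(labels))
        )
    return list(zip(sorted(files), labels))


# Made-up file names, for illustration only
pairs = pair_images_with_labels(
    ["img_002.jpg", "img_001.jpg"],
    ["41.4134 2.153\n", "41.3917 2.16472\n"],
)
print(pairs[0])  # ('img_001.jpg', '41.4134 2.153\n')
```

A length mismatch raises an error immediately rather than silently mislabeling images.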
Next, split the data into train, test, and validation sets. The latitude and longitude labels are also separated here.
y0 = []  # Latitude label
y1 = []  # Longitude label
for i in Y:
    ans = i.split(' ')
    ans[1] = ans[1].replace('\n', '')
    y0.append(float(ans[0]))
    y1.append(float(ans[1]))
y0 = np.array(y0)
y1 = np.array(y1)
# Split X into train and test
X_train, X_test = train_test_split(X, random_state = 0, test_size = 0.2)
print(X_train.shape, X_test.shape)
# (741, 100, 100, 3) (186, 100, 100, 3)
# Split y0, y1 into train and test
# (same random_state and sample count as above, so the rows stay aligned with X)
y_train0, y_test0, y_train1, y_test1 = train_test_split(y0, y1,
                                                        random_state = 0,
                                                        test_size = 0.2)
print(y_train0.shape, y_test0.shape)
print(y_train1.shape, y_test1.shape)
# (741,) (186,)
# (741,) (186,)
# Data type conversion & normalization
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
# Split X_train into train and valid
X_train, X_valid = train_test_split(X_train, random_state = 0, test_size = 0.2)
print(X_train.shape, X_valid.shape)
# (592, 100, 100, 3) (149, 100, 100, 3)
# Split y_train0, y_train1 into train and valid
y_train0, y_valid0, y_train1, y_valid1 = train_test_split(y_train0, y_train1,
                                                          random_state = 0,
                                                          test_size = 0.2)
print(y_train0.shape, y_valid0.shape)
print(y_train1.shape, y_valid1.shape)
# (592,) (149,)
# (592,) (149,)
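The shapes in the comments follow from how the test fraction is rounded. As a worked check, the sketch below reproduces the sizes, assuming the test share is rounded up to a whole number of samples (an assumption that matches the shapes printed above). `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) with the test share rounded up.

    Hypothetical helper; the rounding rule is an assumption that
    reproduces the shapes printed in this article.
    """
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test


n_train_full, n_test = split_sizes(927, 0.2)        # first split: train+valid vs test
n_train, n_valid = split_sizes(n_train_full, 0.2)   # second split: train vs valid
print(n_train, n_valid, n_test)  # 592 149 186
```

So the 927 images end up split roughly 64% / 16% / 20% across train, validation, and test.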
This time, I use the pre-trained Xception model, referring to the following article. https://qiita.com/ha9kberry/items/314afb56ee7484c53e6f#データ取得
I also wanted to try other models, so I try ResNet as well.
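# Xception model
base_model = Xception(
    include_top = False,
    weights = "imagenet",
    input_shape = None
)
# ResNet model (run this instead of the Xception block to try ResNet50;
# if both run, this assignment overrides the one above)
base_model = ResNet50(
    include_top = False,
    weights = "imagenet",
    input_shape = None
)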
Since this is a regression problem, each output layer ends in a single predicted value. Two output layers are prepared, one for latitude and one for longitude.
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions1 = Dense(1,name='latitude')(x)
predictions2 = Dense(1,name='longitude')(x)
Pass predictions1 and predictions2 as the model outputs. The rest follows the reference article. This time, training is run with both Adam and Nadam as optimizers.
from keras import backend as K
# root_mean_squared_error is not defined in this article; the definition
# below is an assumption, written in the usual Keras backend form
def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
model = Model(inputs = base_model.input, outputs = [predictions1, predictions2])
# Freeze the first 108 layers
for layer in model.layers[:108]:
    layer.trainable = False
    # Keep Batch Normalization layers trainable
    if layer.name.startswith('batch_normalization'):
        layer.trainable = True
    if layer.name.endswith('bn'):
        layer.trainable = True
# Train the layers from the 109th onward
for layer in model.layers[108:]:
    layer.trainable = True
# Compile after setting trainable
model.compile(
    optimizer = Adam(),
    #optimizer=Nadam(),
    loss = {'latitude': root_mean_squared_error,
            'longitude': root_mean_squared_error
    }
)
history = model.fit(X_train,
    {'latitude': y_train0,
     'longitude': y_train1},
    batch_size=64,
    epochs=50,
    validation_data=(X_valid,
                     {'latitude': y_valid0,
                      'longitude': y_valid1}),
)
output
Epoch 1/50
10/10 [==============================] - 4s 409ms/step - loss: 0.6146 - latitude_loss: 0.4365 - longitude_loss: 0.1782 - val_loss: 1.6756 - val_latitude_loss: 1.3430 - val_longitude_loss: 0.3326
Epoch 2/50
10/10 [==============================] - 4s 404ms/step - loss: 0.5976 - latitude_loss: 0.4415 - longitude_loss: 0.1562 - val_loss: 0.7195 - val_latitude_loss: 0.5987 - val_longitude_loss: 0.1208
...
...
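The freezing rule used when building the model can be read as a predicate on each layer's index and name. The sketch below restates it without Keras so the rule can be checked in isolation; `is_trainable` and the sample layer names are hypothetical and serve only as illustration.

```python
def is_trainable(index, name, freeze_until=108):
    """Mirror the freezing rule: layers before `freeze_until` are frozen,
    except batch-normalization layers, which stay trainable.

    Hypothetical helper; batch-normalization layers are recognised by name,
    as in the article's loop.
    """
    if index >= freeze_until:
        return True
    return name.startswith('batch_normalization') or name.endswith('bn')


# Illustrative (index, name) pairs; the names are made up for the example
examples = [(5, 'block1_conv1'), (6, 'block1_conv1_bn'), (120, 'block13_sepconv1')]
for index, name in examples:
    print(index, name, is_trainable(index, name))
```

Note that the cut-off of 108 comes from the reference article, so it may need adjusting when the base model is swapped.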
import matplotlib.pyplot as plt
plt.figure(figsize=(18,6))
# loss
plt.subplot(1, 2, 1)
plt.plot(history.history["latitude_loss"], label="latitude_loss", marker="o")
plt.plot(history.history["longitude_loss"], label="longitude_loss", marker="o")
#plt.yticks(np.arange())
#plt.xticks(np.arange())
plt.ylabel("loss")
plt.xlabel("epoch")
plt.title("")
plt.legend(loc="best")
plt.grid(color='gray', alpha=0.2)
plt.show()
result
#batch size 64 Adam
scores = model.evaluate(X_test, {'latitude': y_test0,
                                 'longitude': y_test1},
                        verbose=1)
print("total loss:\t{0}".format(scores[0]))
print("latitude loss:\t{0}".format(scores[1]))
print("longitude loss:{0}".format(scores[2]))
output
total loss: 0.7182420492172241
latitude loss: 0.6623533964157104
longitude loss:0.05588864907622337
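When no loss weights are passed to `compile`, Keras reports the total loss as the sum of the per-output losses. As a quick arithmetic check on the figures above:

```python
# Values copied from the evaluation output above
total_loss = 0.7182420492172241
latitude_loss = 0.6623533964157104
longitude_loss = 0.05588864907622337

# The total should equal the sum of the two output losses,
# up to floating-point rounding inside the framework
difference = abs(total_loss - (latitude_loss + longitude_loss))
print(difference < 1e-6)  # True
```

Almost all of the total loss in this run comes from the latitude output.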
# show image and prediction
prediction = model.predict(X_test)  # list: [latitude predictions, longitude predictions]
for i in range(10,12):
    plt.figure(figsize=(10,10))
    print('latitude:{} \tlongitude:{}'.format(
        prediction[0][i],
        prediction[1][i],
    ))
    plt.imshow(X_test[i].reshape(100, 100, 3))
    plt.show()
latitude:[39.69221] longitude:[2.2188098]
latitude:[39.728386] longitude:[2.224149]
The numbers look plausible, but when plotted on a map (Google Maps), the predicted location is in the middle of the sea, so the accuracy is not yet good enough for practical use.
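To put a number on how far off a prediction is, the gap between a predicted point and a reference point can be converted into kilometres with the haversine formula. This is a minimal sketch: `haversine_km` is a hypothetical helper, the Earth radius of 6371 km is the usual mean-radius approximation, and the reference point is simply the first geotag listed earlier, not the true label of the test image.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points
    given in degrees, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Predicted point from the output above vs. a geotag from the label file
distance = haversine_km(39.69221, 2.2188098, 41.4134, 2.153)
print("{:.0f} km".format(distance))
```

For this pair the distance is on the order of 190 km, which shows how a loss that looks small in degrees still corresponds to a large error on the ground.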
Parameters used | Total loss | latitude_loss | longitude_loss |
---|---|---|---|
Xception, Adam | 0.7182 | 0.6623 | 0.0558 |
Xception, Nadam | 0.3768 | 0.1822 | 0.1946 |
ResNet, Adam | 0.7848 | 0.7360 | 0.0488 |
ResNet, Nadam | 49.6434 | 47.2652 | 2.3782 |
ResNet, Adam, AutoEncoder | 1.8299 | 1.6918 | 0.13807 |
In this trial, the combination of Xception and Nadam gave the lowest total test loss (0.3768), although the Adam runs had a lower longitude loss. In the future, I will try other models or build a model from scratch.
Dataset
Y. Avrithis, Y. Kalantidis, G. Tolias, E. Spyrou. Retrieving Landmark and Non-Landmark Images from Community Photo Collections. In Proceedings of ACM Multimedia (MM 2010), Firenze, Italy, October 2010.
Y. Kalantidis, G. Tolias, Y. Avrithis, M. Phinikettos, E. Spyrou, P. Mylonas, S. Kollias. VIRaL: Visual Image Retrieval and Localization. In Multimedia Tools and Applications (to appear), 2011.
Article
https://qiita.com/ha9kberry/items/314afb56ee7484c53e6f