It is well known that deep learning generally cannot reach good accuracy unless a fairly large amount of data is available, but metric learning can achieve good accuracy even when the number of samples N is small.
Please refer to the following articles for detailed explanations and implementation methods.
Commentary: Modern deep metric learning methods: SphereFace, CosFace, ArcFace
Keras implementation: [Keras] I tried to classify PET bottles using MobileNetV2 + ArcFace!
ArcFace, one metric learning method, uses the feature vector after the CNN's Global Average Pooling at inference time. Classification is performed by computing the cos similarity (the angle between high-dimensional vectors) between the evaluation data and a representative feature vector obtained from the training data. Since it is a cosine, the value ranges from -1 to 1; a value of 1 means the angle between the two vectors is 0, i.e. the two feature vectors are similar. So for a 3-class problem, for example, you prepare one representative vector per class and output the class with the highest cos similarity as the prediction.
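As a toy illustration of that idea (the numbers and the 4-dimensional features below are made up purely for readability; real ArcFace features are much higher-dimensional):

import numpy as np

# Representative feature vector for each of 3 classes, shape (3, 4)
class_vectors = np.array([[1.0, 0.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0, 0.0],
                          [0.0, 0.0, 1.0, 1.0]])

# Feature vector of one evaluation image, shape (4,)
query = np.array([0.1, 0.0, 0.9, 1.1])

# Cos similarity of the query against each representative vector
cos_sim = class_vectors @ query / (np.linalg.norm(class_vectors, axis=1) * np.linalg.norm(query) + 1e-10)
print(cos_sim)             # class 2 has the highest similarity
print(np.argmax(cos_sim))  # -> 2, the predicted class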
By the way, once the deep learning model has been trained you can run it on an edge PC one image at a time and do what you need, but if you are using a GPU and have multiple images to infer, it is of course faster to infer them all at once as a batch. With normal multi-class classification, batch inference (I'm not sure that is the right term...) is not particularly difficult. With cos similarity, however, a little ingenuity is required, so I will cover the following.
① How to batch-infer the cos similarity used in ArcFace inference
② How to incorporate the cos similarity into keras (tensorflow) by combining the keras model and the backend
That is all I will cover here. Note that training ArcFace itself and the implementation for obtaining the representative vectors are skipped.
tensorflow 1.15.0
keras 2.3.1
Python 3.7.6
numpy 1.18.1
core i7
GTX1080ti
I will refer to the keras implementation article: [Keras] I tried to classify PET bottles using MobileNetV2 + ArcFace!
First, as a prerequisite, here is the function for computing the cos similarity.
cosine_similarity.py
import numpy as np

def cosine_similarity(x1, x2):
    # Promote 1-D vectors to 2-D row vectors
    if x1.ndim == 1:
        x1 = x1[np.newaxis]
    if x2.ndim == 1:
        x2 = x2[np.newaxis]
    # L2 norm of each row of x1 and x2
    x1_norm = np.linalg.norm(x1, axis=1)
    x2_norm = np.linalg.norm(x2, axis=1)
    # Dot products divided by the product of the norms (1e-10 avoids division by zero)
    cosine_sim = np.dot(x1, x2.T) / (x1_norm * x2_norm + 1e-10)
    return cosine_sim
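For example, with one 2048-dimensional query vector and ten representative vectors this returns a (1, 10) array (dummy arrays stand in for real features here):

import numpy as np

pred_vector = np.random.rand(2048)    # feature vector of one image
master_vector = np.ones([10, 2048])   # representative vectors for 10 classes

cos_sim = cosine_similarity(pred_vector, master_vector)
print(cos_sim.shape)           # (1, 10)
print(np.argmax(cos_sim[0]))   # index of the most similar class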
The first method is to find the cos similarity one by one.
backend_result.py
from keras.applications.xception import Xception
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Input, GlobalAveragePooling2D
from keras.models import Model
import numpy as np
import time
import os

imgpath = "A path containing multiple suitable images"

# Model definition
# For now, build a model with GAP attached to Xception
# The output shape is (N, 2048)
input_tensor = Input(shape=(299, 299, 3))
xception_model = Xception(include_top=False, weights='imagenet', input_tensor=input_tensor)
x = xception_model.output
outputs = GlobalAveragePooling2D()(x)
model = Model(inputs=input_tensor, outputs=outputs)

# Define tentative representative vectors -> one vector for each of 10 classes
# Here they are simply all ones for illustration
master_vector = np.ones([10, 2048])

# List of image names (glob works just as well)
img_name_list = os.listdir(imgpath)

for i in img_name_list:
    absname = os.path.join(imgpath, i)
    img_rgb = load_img(absname, color_mode='rgb', target_size=(299, 299))
    img_rgb = img_to_array(img_rgb)
    img_rgb = np.expand_dims(img_rgb, axis=0)
    pred_vector = model.predict(img_rgb)
    # Calculate the cos similarity
    # pred_vector (1, 2048) x master_vector (10, 2048) -> (1, 10)
    cos_sim = cosine_similarity(pred_vector, master_vector)
    pred_class = np.argmax(cos_sim[0])
    print(pred_class)
The next method computes the cos similarity for multiple images at once.
First, let's change the function that calculates the cos similarity.
The change is that, when computing the inner product, each vector is first divided by its L2 norm and the dot product is taken afterwards.
The L2 norm is computed with np.linalg.norm with keepdims=True, so that, for example, if the feature vectors for 10 images have shape (10, 2048), their norms have shape (10, 1) and the division broadcasts as expected. (A tiny value is also added so that we never divide by zero.) With this approach the calculation works regardless of how many vectors each argument contains. (If you are curious, check the shapes with .shape as you go; there is a short check after the function below.)
cosine_similarity_batch.py
import numpy as np

def cosine_similarity_batch(x1, x2):
    # Promote 1-D vectors to 2-D row vectors
    if x1.ndim == 1:
        x1 = x1[np.newaxis]
    if x2.ndim == 1:
        x2 = x2[np.newaxis]
    # keepdims=True keeps the norms as (N, 1) so the division broadcasts row-wise
    x1_norm = np.linalg.norm(x1, axis=1, keepdims=True)
    x2_norm = np.linalg.norm(x2, axis=1, keepdims=True)
    # First divide each feature vector by its L2 norm (vector length), then take the inner product
    cosine_sim = np.dot(x1 / (x1_norm + 1e-10), (x2 / (x2_norm + 1e-10)).T)
    return cosine_sim
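As a quick sanity check of the shapes (again with dummy arrays, assuming the cosine_similarity_batch above):

import numpy as np

pred_vector = np.random.rand(5, 2048)   # features for 5 images
master_vector = np.ones([10, 2048])     # representative vectors for 10 classes

print(np.linalg.norm(pred_vector, axis=1).shape)                 # (5,)
print(np.linalg.norm(pred_vector, axis=1, keepdims=True).shape)  # (5, 1)

cos_sim = cosine_similarity_batch(pred_vector, master_vector)
print(cos_sim.shape)   # (5, 10): one row of class similarities per image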
Then you just run inference on the whole batch and pass the result to the function above.
backend_result.py
from keras.applications.xception import Xception
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Input, GlobalAveragePooling2D
from keras.models import Model
import numpy as np
import time
import os

imgpath = "A path containing multiple suitable images"

# Model definition
# For now, build a model with GAP attached to Xception
# The output shape is (N, 2048)
input_tensor = Input(shape=(299, 299, 3))
xception_model = Xception(include_top=False, weights='imagenet', input_tensor=input_tensor)
x = xception_model.output
outputs = GlobalAveragePooling2D()(x)
model = Model(inputs=input_tensor, outputs=outputs)

# Define tentative representative vectors -> one vector for each of 10 classes
# Here they are simply all ones for illustration
master_vector = np.ones([10, 2048])

# List of image names (glob works just as well)
img_name_list = os.listdir(imgpath)

# Collect the preprocessed images in a list and convert to a numpy array
imgs = []
for i in img_name_list:
    absname = os.path.join(imgpath, i)
    img_rgb = load_img(absname, color_mode='rgb', target_size=(299, 299))
    img_rgb = img_to_array(img_rgb)
    imgs.append(img_rgb)
imgs = np.array(imgs, np.float32)

# Run inference on the whole batch, then apply the function above
# pred_vector (N, 2048) x master_vector (10, 2048) -> (N, 10)
pred_vector = model.predict(imgs)
cos_sim = cosine_similarity_batch(pred_vector, master_vector)
# Each image gets 10 cos similarities, so take argmax along axis=1
pred_class = np.argmax(cos_sim, axis=1)
print(pred_class)
After that, do whatever you like with the result.
Now for the main subject. Since keras (tensorflow) is good at applying the same computation to many images at once, the idea is that it should be even faster to fold the post-inference numpy calculation into the model itself. That said, it is not that difficult: you just reimplement what was done in numpy with the keras backend.
cosine_simlarity_kerasbackend.py
from keras import backend as K

def cosine_similarity_eval(args):
    x1, x2 = args
    # x2 (the representative vectors) comes in as a numpy array, so turn it into a constant tensor
    x2 = K.constant(x2)
    # L2 norm of each row vector, kept as shape (N, 1) for broadcasting
    x1_norm = K.sqrt(K.sum(K.square(x1), axis=1, keepdims=True))
    x2_norm = K.sqrt(K.sum(K.square(x2), axis=1, keepdims=True))
    # Normalize each vector first, then take the dot product
    cosine_sim = K.dot(x1 / (x1_norm + 1e-10), K.transpose(x2 / (x2_norm + 1e-10)))
    return cosine_sim
Below are the differences from the numpy implementation.
- The representative vectors are prepared in numpy and converted to a constant tensor inside this function.
- There does not seem to be a direct counterpart of np.linalg.norm in the backend, so the L2 norm is computed by hand. (To keep the code parallel with the earlier numpy implementation I wrote it out, but with l2_normalize it can be written in one line, as sketched below.)
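For reference, a rough sketch of that one-line variant (this is my own shorthand under the same args convention, not code from the referenced article):

from keras import backend as K

def cosine_similarity_eval_short(args):
    # Same result as cosine_similarity_eval above, written with K.l2_normalize
    x1, x2 = args
    return K.dot(K.l2_normalize(x1, axis=1),
                 K.transpose(K.l2_normalize(K.constant(x2), axis=1)))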
Next is how to connect with the keras model.
cosine_simlarity_kerasbackend.py
# Define a model that outputs the feature vectors
input_tensor = Input(shape=(299, 299, 3))
xception_model = Xception(include_top=False, weights='imagenet', input_tensor=input_tensor)
x = xception_model.output
outputs = GlobalAveragePooling2D()(x)
model = Model(inputs=input_tensor, outputs=outputs)

# Define tentative representative vectors -> one vector for each of 10 classes
# Here they are simply all ones for illustration
master_vector = np.ones([10, 2048])

# Pass the model's output tensor (the feature vectors) and the representative vectors
cosine_sim = cosine_similarity_eval([model.output, master_vector])
With this, the tensorflow computation graph that takes the model input and outputs the cos similarities is complete. After that, feed your data through it with sess.run and you are done. The entire code is below.
cosine_simlarity_kerasbackend.py
from keras.applications.xception import Xception
from keras.preprocessing.image import load_img, img_to_array
from keras.layers import Input, GlobalAveragePooling2D
from keras.models import Model
from keras import backend as K
import numpy as np
import time
import os

# Get the session
sess = K.get_session()

imgpath = "A path containing multiple suitable images"

# Model definition
# For now, build a model with GAP attached to Xception
# The output shape is (N, 2048)
input_tensor = Input(shape=(299, 299, 3))
xception_model = Xception(include_top=False, weights='imagenet', input_tensor=input_tensor)
x = xception_model.output
outputs = GlobalAveragePooling2D()(x)
model = Model(inputs=input_tensor, outputs=outputs)

# Define tentative representative vectors -> one vector for each of 10 classes
# Here they are simply all ones for illustration
master_vector = np.ones([10, 2048])

# Pass the model's output tensor (the feature vectors) and the representative vectors
cosine_sim = cosine_similarity_eval([model.output, master_vector])

# List of image names (glob works just as well)
img_name_list = os.listdir(imgpath)

# Collect the preprocessed images in a list and convert to a numpy array
imgs = []
for i in img_name_list:
    absname = os.path.join(imgpath, i)
    img_rgb = load_img(absname, color_mode='rgb', target_size=(299, 299))
    img_rgb = img_to_array(img_rgb)
    imgs.append(img_rgb)
imgs = np.array(imgs, np.float32)

# Feed the image batch into the graph and get the cos similarities
cos_sim = sess.run(
    cosine_sim,
    feed_dict={
        model.input: imgs,
        K.learning_phase(): 0
    })

# Each image gets 10 cos similarities, so take argmax along axis=1
# (this step could also be built into the graph)
pred_class = np.argmax(cos_sim, axis=1)
print(pred_class)
For reference, here are the times from image loading to inference.
| Method | Time [s / 100 images] |
|---|---|
| Inferring one by one | 1.37 |
| Batch inference with numpy | 0.72 |
| Batch inference with the keras backend | 0.66 |
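The measurement itself can be as simple as wrapping each variant with time; a generic sketch (not the exact harness used for the numbers above) looks like this:

import time

start = time.time()
# ... run one of the three inference variants on the same 100 images here ...
elapsed = time.time() - start
print("{:.2f} s / 100 images".format(elapsed))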
The keras backend version is a little faster. This technique of using the keras backend can be applied in quite a lot of places, so I think it is worth remembering.
That's all. If you have any questions or concerns, please leave a comment.