This is an article to find the affine matrix in feature point matching. In this article, we will decide the feature points manually. The purpose is to find the affine matrix A that converts image1 to image2 with the feature points of the two images known like this.
If you can find the affine matrix from the feature points, you can overlay the images like this.
Matrix calculation to transform each coordinate of the image
\left(
\begin{array}{c}
x^{'}\\
y^{'}\\
1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x\\
y\\
1
\end{array}
\right)
of
A=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
The part of is an affine matrix. An excellent one that can represent rotation, enlargement, reduction, movement, inversion, and shear of an image with this matrix alone! !!
For the affine transformation, I referred to the following article. Completely understand affine transformation Affine transformation by matrix (scaling, rotation, shearing, movement) -Reinventor of Python image processing-
When there are $ N (N \ geqq3) $ feature points in two images, the coordinates of the feature points in the image before conversion are calculated.
\left(
\begin{array}{c}
x_{n}\\
y_{n}
\end{array}
\right)
Coordinates after return
\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}
\end{array}
\right)
The determinant that transforms all $ N $ coordinates as affine
\left(
\begin{array}{c}
x^{'}_{1}&x^{'}_{2}&\cdots&x^{'}_{N}\\
y^{'}_{1}&x^{'}_{2}&\cdots&x^{'}_{N}\\
1&1&\cdots&1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x_{1}&x_{2}&\cdots&x_{N}\\
y_{1}&x_{2}&\cdots&x_{N}\\
1&1&\cdots&1
\end{array}
\right)
It can be represented by. The purpose is to find this $ a, b, c, d, t_ {x}, t_ {y} $. Here, a set of coordinates before and after conversion
\left(
\begin{array}{c}
x_{n}\\
y_{n}
\end{array}
\right)
,
\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}
\end{array}
\right)
Affine transformation against
\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}\\
1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x_{n}\\
y_{n}\\
1
\end{array}
\right)
When you expand
\begin{align}
x^{'}_{n}&=ax_{n} + by_{n} + t_{x}\\
y^{'}_{n}&=cx_{n} + dy_{n} + t_{y}
\end{align}
Will be.
w_1=
\left(
\begin{array}{c}
a\\
b\\
t_{x}
\end{array}
\right)
,
w_2=
\left(
\begin{array}{c}
c\\
d\\
t_{y}
\end{array}
\right)
,
p_{n}=
\left(
\begin{array}{c}
x_{n} & y_{n} & 1
\end{array}
\right)
,
p^{'}_{n}=
\left(
\begin{array}{c}
x^{'}_{n} & y^{'}_{n} & 1
\end{array}
\right)
If you prepare a vector like
\begin{align}
x^{'}_{n}&=p_{n}w_1\\
y^{'}_{n}&=p_{n}w_2
\end{align}
Can be written. The distance between the coordinates of the conversion destination and the coordinates after the return by the affine transformation is used as an error function, and $ w_1 $ and $ w_2 $ when the error function is the smallest are obtained. Error function $ E $
E=\sum_{n=1}^{N}
\Bigl(
(x^{'}_{n} - p_{n}w_1)^{2} + (y^{'}_{n} - p_{n}w_2)^{2}
\Bigr)
To set this to represent this in matrix format
X^{'}=
\left(
\begin{array}{c}
x^{'}_{1}\\
\vdots\\
x^{'}_{N}
\end{array}
\right)
,
Y^{'}=
\left(
\begin{array}{c}
y^{'}_{1}\\
\vdots\\
y^{'}_{N}
\end{array}
\right)
,
P=
\left(
\begin{array}{c}
p_{1}\\
\vdots\\
p_{N}
\end{array}
\right)
=
\left(
\begin{array}{c}
x_{1} & y_{2} & 1\\
&\vdots&\\
x_{N} & y_{N} & 1
\end{array}
\right)
given that
E=
(X^{'} - Pw_1)^{T}(X^{'} - Pw_1) + (Y^{'} - Pw_2)^T(Y^{'} - Pw_2)
Can be written. When unfolded
\begin{align}
E&=({X^{'}}^{T} - (Pw_1)^{T})(X^{'} - Pw_1) + ({Y^{'}}^{T} - (Pw_2)^{T})(Y^{'}-Pw_2)\\
&={X^{'}}^{T}X^{'} - {X^{'}}^{T}Pw_1 - (Pw_1)^{T}X^{'} + (Pw_1)^{T}(Pw_1) + {Y^{'}}^{T}Y^{'} - {Y^{'}}^{T}Pw_2 - (Pw_2)^{T}Y^{'} + (Pw_2)^{T}(Pw_2)\\
&={X^{'}}^{T}X^{'} - w^{T}_{1}P^{T}{X^{'}}^{T} - w^{T}_{1}P^{T}{X^{'}}^{T} + w^{T}_{1}P^{T}Pw_{1} + {Y^{'}}^{T}Y^{'} - w^{T}_{2}P^{T}{Y^{'}}^{T} - w^{T}_{2}P^{T}{Y^{'}}^{T} + w^{T}_{2}P^{T}Pw_{2}\\
&={X^{'}}^{T}X^{'} - 2w^{T}_{1}P^{T}{X^{'}}^{T} + w^{T}_{1}P^{T}Pw_{1} + {Y^{'}}^{T}Y^{'} - 2w^{T}_{2}P^{T}{Y^{'}}^{T} + w^{T}_{2}P^{T}Pw_{2}\\
\end{align}
It will be. If you find the time when $ E $ becomes smaller by partial differentiation with $ w_1 $ and $ w_2 $
\begin{align}
\frac{\partial E}{\partial w_{1}}=-2P^{T}X^{'} + 2P^{T}Pw_{1}&=0\\
2P^{T}w_{1}&=2P^{T}X^{'}\\
w_{1}&=(P^{T}P)^{-1}P^{T}X^{'}\\
\frac{\partial E}{\partial w_{2}}=-2P^{T}Y^{'} + 2P^{T}Pw_{2}&=0\\
2P^{T}w_{2}&=2P^{T}Y^{'}\\
w_{2}&=(P^{T}P)^{-1}P^{T}Y^{'}
\end{align}
As a result, $ w_1 $ and $ w_2 $ were obtained, so the affine matrix was obtained.
Let's implement it in Python using only numpy.
import numpy as np
import math
from PIL import Image
from matplotlib import pyplot as plt
#A function that refers to the end of the array for those who exceed the range of the reference image
def clip_xy(ref_xy, img_shape):
#Replace for x coordinate
ref_x = np.where((0 <= ref_xy[:, 0]) & (ref_xy[:, 0] < img_shape[1]), ref_xy[:, 0], -1)
#Replace for y coordinate
ref_y = np.where((0 <= ref_xy[:, 1]) & (ref_xy[:, 1] < img_shape[0]), ref_xy[:, 1], -1)
#Combine and return
return np.vstack([ref_x, ref_y]).T
#Affine transformation
def affine(data, affine, draw_area_size):
# data:Image data to be converted to affine
# affine:Affine matrix
#:draw_area_size:It may be the same as or better than the shape of data
#Inverse matrix of affine matrix
inv_affine = np.linalg.inv(affine)
x = np.arange(0, draw_area_size[1], 1)
y = np.arange(0, draw_area_size[0], 1)
X, Y = np.meshgrid(x, y)
XY = np.dstack([X, Y, np.ones_like(X)])
xy = XY.reshape(-1, 3).T
#Calculation of reference coordinates
ref_xy = inv_affine @ xy
ref_xy = ref_xy.T
#Coordinates around the reference coordinates
liner_xy = {}
liner_xy['downleft'] = ref_xy[:, :2].astype(int)
liner_xy['upleft'] = liner_xy['downleft'] + [1, 0]
liner_xy['downright'] = liner_xy['downleft'] + [0, 1]
liner_xy['upright'] = liner_xy['downleft'] + [1, 1]
#Weight calculation with linear interpolation
liner_diff = ref_xy[:, :2] - liner_xy['downleft']
liner_weight = {}
liner_weight['downleft'] = (1 - liner_diff[:, 0]) * (1 - liner_diff[:, 1])
liner_weight['upleft'] = (1 - liner_diff[:, 0]) * liner_diff[:, 1]
liner_weight['downright'] = liner_diff[:, 0] * (1 - liner_diff[:, 1])
liner_weight['upright'] = liner_diff[:, 0] * liner_diff[:, 1]
#Weight and add
liner_with_weight = {}
for direction in liner_weight.keys():
l_xy = liner_xy[direction]
l_xy = clip_xy(l_xy, data.shape)
l_xy = np.dstack([l_xy[:, 0].reshape(draw_area_size), l_xy[:, 1].reshape(draw_area_size)])
l_weight = liner_weight[direction].reshape(draw_area_size)
liner_with_weight[direction] = data[l_xy[:, :, 1], l_xy[:, :, 0]] * l_weight
data_linear = sum(liner_with_weight.values())
return data_linear
#Function to obtain affine matrix from feature points
def registration(P, x_dash, y_dash):
w1 = np.linalg.inv(P.T @ P) @ P.T @ x_dash
w2 = np.linalg.inv(P.T @ P) @ P.T @ y_dash
affine_matrix = np.array([[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0],
[0.0, 0.0, 1.0]])
affine_matrix[0, :] = w1
affine_matrix[1, :] = w2
print(affine_matrix)
return affine_matrix
#Clicked feature point Save array
future_points1 = np.array([[1, 1]])
future_points2 = np.array([[1, 1]])
count_fp1 = 0
count_fp2 = 0
#Click to determine feature points
def onclick(event):
global future_points1
global future_points2
global count_fp1
global count_fp2
click_axes = event.inaxes
x = math.floor(event.xdata)
y = math.floor(event.ydata)
if click_axes == ax1:
if count_fp1 == 0:
future_points1[0, :] = (x, y)
count_fp1 = 1
else:
future_points1 = np.vstack([future_points1, np.array([x, y])])
count_fp1 += count_fp1
print(future_points1)
if click_axes == ax2:
if count_fp2 == 0:
future_points2[0, :] = (x, y)
count_fp2 = 1
else:
future_points2 = np.vstack([future_points2, np.array([x, y])])
count_fp2 += count_fp2
print(future_points2)
click_axes.scatter(x, y)
fig.canvas.draw_idle()
#Enter male and image overlay
def onEnter(event):
if event.key == 'enter' and future_points1.size == future_points2.size and future_points1.size >= 3:
# P:Coordinate matrix of conversion source([[x,y,1],[x,y,1],...]
# x_dash:Destination x coordinate vector
# y_dash:Destination y coordinate vector
vec_one = np.ones((future_points2.shape[0], 1))
P = np.hstack([future_points2, vec_one])
x_dash = future_points1[:, 0]
y_dash = future_points1[:, 1]
affine_matrix = registration(P, x_dash, y_dash)
#Obtain the image after affine transformation
affined_image = affine(image2, affine_matrix, image1.shape)
x = np.arange(0, affined_image.shape[1], 1)
y = np.arange(0, affined_image.shape[0], 1)
X_affined, Y_affined = np.meshgrid(x, y)
ax3.pcolormesh(X_affined, Y_affined, affined_image, cmap='gray', shading='auto', alpha=0.2)
fig.canvas.draw_idle()
#Image loading
image1 = np.array(Image.open('./source/test1.jpg').convert('L'))
image2 = np.array(Image.open('./source/t_test1.jpg').convert('L'))
#Bg at the end of the image_color addition of color
bg_color = 256
image2 = np.hstack([image2, bg_color * np.ones((image2.shape[0], 1), int)])
image2 = np.vstack([image2, bg_color * np.ones((1, image2.shape[1]), int)])
x_image1 = np.arange(0, image1.shape[1], 1)
y_image1 = np.arange(0, image1.shape[0], 1)
X1, Y1 = np.meshgrid(x_image1, y_image1)
x_image2 = np.arange(0, image2.shape[1], 1)
y_image2 = np.arange(0, image2.shape[0], 1)
X2, Y2 = np.meshgrid(x_image2, y_image2)
fig = plt.figure(figsize=(8, 8))
ax1 = fig.add_subplot(221)
mesh1 = ax1.pcolormesh(X1, Y1, image1, shading='auto', cmap='gray')
ax1.invert_yaxis()
ax2 = fig.add_subplot(223)
mesh2 = ax2.pcolormesh(X2, Y2, image2, shading='auto', cmap='gray')
ax2.invert_yaxis()
ax3 = fig.add_subplot(222)
mesh3 = ax3.pcolormesh(X1, Y1, image1, shading='auto', cmap='gray', alpha=0.2)
ax3.invert_yaxis()
cid = fig.canvas.mpl_connect('button_press_event', onclick)
cid = fig.canvas.mpl_connect('key_press_event', onEnter)
plt.show()
Looking at the formula, it is similar to the formula for finding the coefficient of linear regression! !! Or rather, there is almost no difference in doing linear regression. .. .. It would be interesting if we could create feature vectors well like linear regression, or deal with image distortion by doing well with Gaussian process methods.
Please point out any mistakes or confusing points.
Recommended Posts