Introduction

This is an article to find the affine matrix in feature point matching. In this article, we will decide the feature points manually. The purpose is to find the affine matrix A that converts image1 to image2 with the feature points of the two images known like this.

If you can find the affine matrix from the feature points, you can overlay the images like this.

Feature point matching

Affine matrix

Matrix calculation to transform each coordinate of the image

\left(
\begin{array}{c}
x^{'}\\
y^{'}\\
1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x\\
y\\
1
\end{array}
\right)

A=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)

The part of is an affine matrix. An excellent one that can represent rotation, enlargement, reduction, movement, inversion, and shear of an image with this matrix alone! !!

For the affine transformation, I referred to the following article. Completely understand affine transformation Affine transformation by matrix (scaling, rotation, shearing, movement) -Reinventor of Python image processing-

Calculation of affine matrix

When there are $ N (N \ geqq3) $ feature points in two images, the coordinates of the feature points in the image before conversion are calculated.

\left(
\begin{array}{c}
x_{n}\\
y_{n}
\end{array}
\right)

Coordinates after return

\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}
\end{array}
\right)

The determinant that transforms all $ N $ coordinates as affine

\left(
\begin{array}{c}
x^{'}_{1}&x^{'}_{2}&\cdots&x^{'}_{N}\\
y^{'}_{1}&x^{'}_{2}&\cdots&x^{'}_{N}\\
1&1&\cdots&1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x_{1}&x_{2}&\cdots&x_{N}\\
y_{1}&x_{2}&\cdots&x_{N}\\
1&1&\cdots&1
\end{array}
\right)

It can be represented by. The purpose is to find this $ a, b, c, d, t_ {x}, t_ {y} $. Here, a set of coordinates before and after conversion

\left(
\begin{array}{c}
x_{n}\\
y_{n}
\end{array}
\right)
,
\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}
\end{array}
\right)

Affine transformation against

\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}\\
1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x_{n}\\
y_{n}\\
1
\end{array}
\right)

When you expand

\begin{align}
x^{'}_{n}&=ax_{n} + by_{n} + t_{x}\\
y^{'}_{n}&=cx_{n} + dy_{n} + t_{y}
\end{align}

Will be.

w_1=
\left(
\begin{array}{c}
a\\
b\\
t_{x}
\end{array}
\right)
,
w_2=
\left(
\begin{array}{c}
c\\
d\\
t_{y}
\end{array}
\right)
,
p_{n}=
\left(
\begin{array}{c}
x_{n} & y_{n} & 1
\end{array}
\right)
,
p^{'}_{n}=
\left(
\begin{array}{c}
x^{'}_{n} & y^{'}_{n} & 1
\end{array}
\right)

If you prepare a vector like

\begin{align}
x^{'}_{n}&=p_{n}w_1\\
y^{'}_{n}&=p_{n}w_2
\end{align}

Can be written. The distance between the coordinates of the conversion destination and the coordinates after the return by the affine transformation is used as an error function, and $ w_1 $ and $ w_2 $ when the error function is the smallest are obtained. Error function $ E $

E=\sum_{n=1}^{N} 
\Bigl(
(x^{'}_{n} - p_{n}w_1)^{2} + (y^{'}_{n} - p_{n}w_2)^{2}
\Bigr)

To set this to represent this in matrix format

X^{'}=
\left(
\begin{array}{c}
x^{'}_{1}\\
\vdots\\
x^{'}_{N}
\end{array}
\right)
,
Y^{'}=
\left(
\begin{array}{c}
y^{'}_{1}\\
\vdots\\
y^{'}_{N}
\end{array}
\right)
,
P=
\left(
\begin{array}{c}
p_{1}\\
\vdots\\
p_{N}
\end{array}
\right)
=
\left(
\begin{array}{c}
x_{1} & y_{2} & 1\\
&\vdots&\\
x_{N} & y_{N} & 1
\end{array}
\right)

given that

E=
(X^{'} - Pw_1)^{T}(X^{'} - Pw_1) + (Y^{'} - Pw_2)^T(Y^{'} - Pw_2)

Can be written. When unfolded

\begin{align}
E&=({X^{'}}^{T} - (Pw_1)^{T})(X^{'} - Pw_1) + ({Y^{'}}^{T} - (Pw_2)^{T})(Y^{'}-Pw_2)\\
&={X^{'}}^{T}X^{'} - {X^{'}}^{T}Pw_1 - (Pw_1)^{T}X^{'} + (Pw_1)^{T}(Pw_1) + {Y^{'}}^{T}Y^{'} - {Y^{'}}^{T}Pw_2 - (Pw_2)^{T}Y^{'} + (Pw_2)^{T}(Pw_2)\\
&={X^{'}}^{T}X^{'} - w^{T}_{1}P^{T}{X^{'}}^{T} - w^{T}_{1}P^{T}{X^{'}}^{T} + w^{T}_{1}P^{T}Pw_{1} + {Y^{'}}^{T}Y^{'} - w^{T}_{2}P^{T}{Y^{'}}^{T} - w^{T}_{2}P^{T}{Y^{'}}^{T} + w^{T}_{2}P^{T}Pw_{2}\\
&={X^{'}}^{T}X^{'} - 2w^{T}_{1}P^{T}{X^{'}}^{T} + w^{T}_{1}P^{T}Pw_{1} + {Y^{'}}^{T}Y^{'} - 2w^{T}_{2}P^{T}{Y^{'}}^{T} + w^{T}_{2}P^{T}Pw_{2}\\
\end{align}

It will be. If you find the time when $ E $ becomes smaller by partial differentiation with $ w_1 $ and $ w_2 $

\begin{align}
\frac{\partial E}{\partial w_{1}}=-2P^{T}X^{'} + 2P^{T}Pw_{1}&=0\\
2P^{T}w_{1}&=2P^{T}X^{'}\\
w_{1}&=(P^{T}P)^{-1}P^{T}X^{'}\\
\frac{\partial E}{\partial w_{2}}=-2P^{T}Y^{'} + 2P^{T}Pw_{2}&=0\\
2P^{T}w_{2}&=2P^{T}Y^{'}\\
w_{2}&=(P^{T}P)^{-1}P^{T}Y^{'}
\end{align}

As a result, $ w_1 $ and $ w_2 $ were obtained, so the affine matrix was obtained.

Implementation

Let's implement it in Python using only numpy.

import numpy as np
import math
from PIL import Image
from matplotlib import pyplot as plt


#A function that refers to the end of the array for those who exceed the range of the reference image
def clip_xy(ref_xy, img_shape):
    #Replace for x coordinate
    ref_x = np.where((0 <= ref_xy[:, 0]) & (ref_xy[:, 0] < img_shape[1]), ref_xy[:, 0], -1)
    #Replace for y coordinate
    ref_y = np.where((0 <= ref_xy[:, 1]) & (ref_xy[:, 1] < img_shape[0]), ref_xy[:, 1], -1)

    #Combine and return
    return np.vstack([ref_x, ref_y]).T


#Affine transformation
def affine(data, affine, draw_area_size):
    # data:Image data to be converted to affine
    # affine:Affine matrix
    #:draw_area_size:It may be the same as or better than the shape of data

    #Inverse matrix of affine matrix
    inv_affine = np.linalg.inv(affine)

    x = np.arange(0, draw_area_size[1], 1)
    y = np.arange(0, draw_area_size[0], 1)
    X, Y = np.meshgrid(x, y)

    XY = np.dstack([X, Y, np.ones_like(X)])
    xy = XY.reshape(-1, 3).T

    #Calculation of reference coordinates
    ref_xy = inv_affine @ xy
    ref_xy = ref_xy.T

    #Coordinates around the reference coordinates
    liner_xy = {}
    liner_xy['downleft'] = ref_xy[:, :2].astype(int)
    liner_xy['upleft'] = liner_xy['downleft'] + [1, 0]
    liner_xy['downright'] = liner_xy['downleft'] + [0, 1]
    liner_xy['upright'] = liner_xy['downleft'] + [1, 1]

    #Weight calculation with linear interpolation
    liner_diff = ref_xy[:, :2] - liner_xy['downleft']

    liner_weight = {}
    liner_weight['downleft'] = (1 - liner_diff[:, 0]) * (1 - liner_diff[:, 1])
    liner_weight['upleft'] = (1 - liner_diff[:, 0]) * liner_diff[:, 1]
    liner_weight['downright'] = liner_diff[:, 0] * (1 - liner_diff[:, 1])
    liner_weight['upright'] = liner_diff[:, 0] * liner_diff[:, 1]

    #Weight and add
    liner_with_weight = {}
    for direction in liner_weight.keys():
        l_xy = liner_xy[direction]
        l_xy = clip_xy(l_xy, data.shape)
        l_xy = np.dstack([l_xy[:, 0].reshape(draw_area_size), l_xy[:, 1].reshape(draw_area_size)])
        l_weight = liner_weight[direction].reshape(draw_area_size)
        liner_with_weight[direction] = data[l_xy[:, :, 1], l_xy[:, :, 0]] * l_weight

    data_linear = sum(liner_with_weight.values())
    return data_linear


#Function to obtain affine matrix from feature points
def registration(P, x_dash, y_dash):
    w1 = np.linalg.inv(P.T @ P) @ P.T @ x_dash
    w2 = np.linalg.inv(P.T @ P) @ P.T @ y_dash
    affine_matrix = np.array([[1.0, 0.0, 0.0],
                              [0.0, 1.0, 0.0],
                              [0.0, 0.0, 1.0]])
    affine_matrix[0, :] = w1
    affine_matrix[1, :] = w2
    print(affine_matrix)
    return affine_matrix


#Clicked feature point Save array
future_points1 = np.array([[1, 1]])
future_points2 = np.array([[1, 1]])
count_fp1 = 0
count_fp2 = 0


#Click to determine feature points
def onclick(event):
    global future_points1
    global future_points2
    global count_fp1
    global count_fp2

    click_axes = event.inaxes
    x = math.floor(event.xdata)
    y = math.floor(event.ydata)
    if click_axes == ax1:
        if count_fp1 == 0:
            future_points1[0, :] = (x, y)
            count_fp1 = 1
        else:
            future_points1 = np.vstack([future_points1, np.array([x, y])])
            count_fp1 += count_fp1
        print(future_points1)
    if click_axes == ax2:
        if count_fp2 == 0:
            future_points2[0, :] = (x, y)
            count_fp2 = 1
        else:
            future_points2 = np.vstack([future_points2, np.array([x, y])])
            count_fp2 += count_fp2
        print(future_points2)
    click_axes.scatter(x, y)
    fig.canvas.draw_idle()


#Enter male and image overlay
def onEnter(event):
    if event.key == 'enter' and future_points1.size == future_points2.size and future_points1.size >= 3:
        # P:Coordinate matrix of conversion source([[x,y,1],[x,y,1],...]
        # x_dash:Destination x coordinate vector
        # y_dash:Destination y coordinate vector
        vec_one = np.ones((future_points2.shape[0], 1))
        P = np.hstack([future_points2, vec_one])
        x_dash = future_points1[:, 0]
        y_dash = future_points1[:, 1]
        affine_matrix = registration(P, x_dash, y_dash)
        #Obtain the image after affine transformation
        affined_image = affine(image2, affine_matrix, image1.shape)
        x = np.arange(0, affined_image.shape[1], 1)
        y = np.arange(0, affined_image.shape[0], 1)
        X_affined, Y_affined = np.meshgrid(x, y)
        ax3.pcolormesh(X_affined, Y_affined, affined_image, cmap='gray', shading='auto', alpha=0.2)
        fig.canvas.draw_idle()


#Image loading
image1 = np.array(Image.open('./source/test1.jpg').convert('L'))
image2 = np.array(Image.open('./source/t_test1.jpg').convert('L'))
#Bg at the end of the image_color addition of color
bg_color = 256
image2 = np.hstack([image2, bg_color * np.ones((image2.shape[0], 1), int)])
image2 = np.vstack([image2, bg_color * np.ones((1, image2.shape[1]), int)])

x_image1 = np.arange(0, image1.shape[1], 1)
y_image1 = np.arange(0, image1.shape[0], 1)

X1, Y1 = np.meshgrid(x_image1, y_image1)

x_image2 = np.arange(0, image2.shape[1], 1)
y_image2 = np.arange(0, image2.shape[0], 1)

X2, Y2 = np.meshgrid(x_image2, y_image2)

fig = plt.figure(figsize=(8, 8))
ax1 = fig.add_subplot(221)
mesh1 = ax1.pcolormesh(X1, Y1, image1, shading='auto', cmap='gray')
ax1.invert_yaxis()
ax2 = fig.add_subplot(223)
mesh2 = ax2.pcolormesh(X2, Y2, image2, shading='auto', cmap='gray')
ax2.invert_yaxis()
ax3 = fig.add_subplot(222)
mesh3 = ax3.pcolormesh(X1, Y1, image1, shading='auto', cmap='gray', alpha=0.2)
ax3.invert_yaxis()

cid = fig.canvas.mpl_connect('button_press_event', onclick)
cid = fig.canvas.mpl_connect('key_press_event', onEnter)
plt.show()

At the end

Looking at the formula, it is similar to the formula for finding the coefficient of linear regression! !! Or rather, there is almost no difference in doing linear regression. .. .. It would be interesting if we could create feature vectors well like linear regression, or deal with image distortion by doing well with Gaussian process methods.

Please point out any mistakes or confusing points.