I tried to find the affine matrix in image alignment (feature point matching) using affine transformation.

Introduction

This is an article to find the affine matrix in feature point matching. In this article, we will decide the feature points manually. image1.png The purpose is to find the affine matrix A that converts image1 to image2 with the feature points of the two images known like this.

affine.gif If you can find the affine matrix from the feature points, you can overlay the images like this.

Feature point matching

Affine matrix

Matrix calculation to transform each coordinate of the image

\left(
\begin{array}{c}
x^{'}\\
y^{'}\\
1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x\\
y\\
1
\end{array}
\right)

of

A=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)

The part of is an affine matrix. An excellent one that can represent rotation, enlargement, reduction, movement, inversion, and shear of an image with this matrix alone! !!

For the affine transformation, I referred to the following article. Completely understand affine transformation Affine transformation by matrix (scaling, rotation, shearing, movement) -Reinventor of Python image processing-

Calculation of affine matrix

When there are $ N (N \ geqq3) $ feature points in two images, the coordinates of the feature points in the image before conversion are calculated.

\left(
\begin{array}{c}
x_{n}\\
y_{n}
\end{array}
\right)

Coordinates after return

\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}
\end{array}
\right)

The determinant that transforms all $ N $ coordinates as affine

\left(
\begin{array}{c}
x^{'}_{1}&x^{'}_{2}&\cdots&x^{'}_{N}\\
y^{'}_{1}&x^{'}_{2}&\cdots&x^{'}_{N}\\
1&1&\cdots&1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x_{1}&x_{2}&\cdots&x_{N}\\
y_{1}&x_{2}&\cdots&x_{N}\\
1&1&\cdots&1
\end{array}
\right)

It can be represented by. The purpose is to find this $ a, b, c, d, t_ {x}, t_ {y} $. Here, a set of coordinates before and after conversion

\left(
\begin{array}{c}
x_{n}\\
y_{n}
\end{array}
\right)
,
\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}
\end{array}
\right)

Affine transformation against

\left(
\begin{array}{c}
x^{'}_{n}\\
y^{'}_{n}\\
1
\end{array}
\right)
=
\left(
\begin{array}{ccc}
a & b & t_{x}\\
c & d & t_{y}\\
0 & 0 & 1
\end{array}
\right)
\left(
\begin{array}{c}
x_{n}\\
y_{n}\\
1
\end{array}
\right)

When you expand

\begin{align}
x^{'}_{n}&=ax_{n} + by_{n} + t_{x}\\
y^{'}_{n}&=cx_{n} + dy_{n} + t_{y}
\end{align}

Will be.

w_1=
\left(
\begin{array}{c}
a\\
b\\
t_{x}
\end{array}
\right)
,
w_2=
\left(
\begin{array}{c}
c\\
d\\
t_{y}
\end{array}
\right)
,
p_{n}=
\left(
\begin{array}{c}
x_{n} & y_{n} & 1
\end{array}
\right)
,
p^{'}_{n}=
\left(
\begin{array}{c}
x^{'}_{n} & y^{'}_{n} & 1
\end{array}
\right)

If you prepare a vector like

\begin{align}
x^{'}_{n}&=p_{n}w_1\\
y^{'}_{n}&=p_{n}w_2
\end{align}

Can be written. The distance between the coordinates of the conversion destination and the coordinates after the return by the affine transformation is used as an error function, and $ w_1 $ and $ w_2 $ when the error function is the smallest are obtained. Error function $ E $

E=\sum_{n=1}^{N} 
\Bigl(
(x^{'}_{n} - p_{n}w_1)^{2} + (y^{'}_{n} - p_{n}w_2)^{2}
\Bigr)

To set this to represent this in matrix format

X^{'}=
\left(
\begin{array}{c}
x^{'}_{1}\\
\vdots\\
x^{'}_{N}
\end{array}
\right)
,
Y^{'}=
\left(
\begin{array}{c}
y^{'}_{1}\\
\vdots\\
y^{'}_{N}
\end{array}
\right)
,
P=
\left(
\begin{array}{c}
p_{1}\\
\vdots\\
p_{N}
\end{array}
\right)
=
\left(
\begin{array}{c}
x_{1} & y_{2} & 1\\
&\vdots&\\
x_{N} & y_{N} & 1
\end{array}
\right)

given that

E=
(X^{'} - Pw_1)^{T}(X^{'} - Pw_1) + (Y^{'} - Pw_2)^T(Y^{'} - Pw_2) 

Can be written. When unfolded

\begin{align}
E&=({X^{'}}^{T} - (Pw_1)^{T})(X^{'} - Pw_1) + ({Y^{'}}^{T} - (Pw_2)^{T})(Y^{'}-Pw_2)\\
&={X^{'}}^{T}X^{'} - {X^{'}}^{T}Pw_1 - (Pw_1)^{T}X^{'} + (Pw_1)^{T}(Pw_1) + {Y^{'}}^{T}Y^{'} - {Y^{'}}^{T}Pw_2 - (Pw_2)^{T}Y^{'} + (Pw_2)^{T}(Pw_2)\\
&={X^{'}}^{T}X^{'} - w^{T}_{1}P^{T}{X^{'}}^{T} - w^{T}_{1}P^{T}{X^{'}}^{T} + w^{T}_{1}P^{T}Pw_{1} + {Y^{'}}^{T}Y^{'} - w^{T}_{2}P^{T}{Y^{'}}^{T} - w^{T}_{2}P^{T}{Y^{'}}^{T} + w^{T}_{2}P^{T}Pw_{2}\\
&={X^{'}}^{T}X^{'} - 2w^{T}_{1}P^{T}{X^{'}}^{T} + w^{T}_{1}P^{T}Pw_{1} + {Y^{'}}^{T}Y^{'} - 2w^{T}_{2}P^{T}{Y^{'}}^{T} + w^{T}_{2}P^{T}Pw_{2}\\
\end{align}

It will be. If you find the time when $ E $ becomes smaller by partial differentiation with $ w_1 $ and $ w_2 $

\begin{align}
\frac{\partial E}{\partial w_{1}}=-2P^{T}X^{'} + 2P^{T}Pw_{1}&=0\\
2P^{T}w_{1}&=2P^{T}X^{'}\\
w_{1}&=(P^{T}P)^{-1}P^{T}X^{'}\\
\frac{\partial E}{\partial w_{2}}=-2P^{T}Y^{'} + 2P^{T}Pw_{2}&=0\\
2P^{T}w_{2}&=2P^{T}Y^{'}\\
w_{2}&=(P^{T}P)^{-1}P^{T}Y^{'}
\end{align}

As a result, $ w_1 $ and $ w_2 $ were obtained, so the affine matrix was obtained.

Implementation

Let's implement it in Python using only numpy.

import numpy as np
import math
from PIL import Image
from matplotlib import pyplot as plt


#A function that refers to the end of the array for those who exceed the range of the reference image
def clip_xy(ref_xy, img_shape):
    #Replace for x coordinate
    ref_x = np.where((0 <= ref_xy[:, 0]) & (ref_xy[:, 0] < img_shape[1]), ref_xy[:, 0], -1)
    #Replace for y coordinate
    ref_y = np.where((0 <= ref_xy[:, 1]) & (ref_xy[:, 1] < img_shape[0]), ref_xy[:, 1], -1)

    #Combine and return
    return np.vstack([ref_x, ref_y]).T


#Affine transformation
def affine(data, affine, draw_area_size):
    # data:Image data to be converted to affine
    # affine:Affine matrix
    #:draw_area_size:It may be the same as or better than the shape of data

    #Inverse matrix of affine matrix
    inv_affine = np.linalg.inv(affine)

    x = np.arange(0, draw_area_size[1], 1)
    y = np.arange(0, draw_area_size[0], 1)
    X, Y = np.meshgrid(x, y)

    XY = np.dstack([X, Y, np.ones_like(X)])
    xy = XY.reshape(-1, 3).T

    #Calculation of reference coordinates
    ref_xy = inv_affine @ xy
    ref_xy = ref_xy.T

    #Coordinates around the reference coordinates
    liner_xy = {}
    liner_xy['downleft'] = ref_xy[:, :2].astype(int)
    liner_xy['upleft'] = liner_xy['downleft'] + [1, 0]
    liner_xy['downright'] = liner_xy['downleft'] + [0, 1]
    liner_xy['upright'] = liner_xy['downleft'] + [1, 1]

    #Weight calculation with linear interpolation
    liner_diff = ref_xy[:, :2] - liner_xy['downleft']

    liner_weight = {}
    liner_weight['downleft'] = (1 - liner_diff[:, 0]) * (1 - liner_diff[:, 1])
    liner_weight['upleft'] = (1 - liner_diff[:, 0]) * liner_diff[:, 1]
    liner_weight['downright'] = liner_diff[:, 0] * (1 - liner_diff[:, 1])
    liner_weight['upright'] = liner_diff[:, 0] * liner_diff[:, 1]

    #Weight and add
    liner_with_weight = {}
    for direction in liner_weight.keys():
        l_xy = liner_xy[direction]
        l_xy = clip_xy(l_xy, data.shape)
        l_xy = np.dstack([l_xy[:, 0].reshape(draw_area_size), l_xy[:, 1].reshape(draw_area_size)])
        l_weight = liner_weight[direction].reshape(draw_area_size)
        liner_with_weight[direction] = data[l_xy[:, :, 1], l_xy[:, :, 0]] * l_weight

    data_linear = sum(liner_with_weight.values())
    return data_linear


#Function to obtain affine matrix from feature points
def registration(P, x_dash, y_dash):
    w1 = np.linalg.inv(P.T @ P) @ P.T @ x_dash
    w2 = np.linalg.inv(P.T @ P) @ P.T @ y_dash
    affine_matrix = np.array([[1.0, 0.0, 0.0],
                              [0.0, 1.0, 0.0],
                              [0.0, 0.0, 1.0]])
    affine_matrix[0, :] = w1
    affine_matrix[1, :] = w2
    print(affine_matrix)
    return affine_matrix


#Clicked feature point Save array
future_points1 = np.array([[1, 1]])
future_points2 = np.array([[1, 1]])
count_fp1 = 0
count_fp2 = 0


#Click to determine feature points
def onclick(event):
    global future_points1
    global future_points2
    global count_fp1
    global count_fp2

    click_axes = event.inaxes
    x = math.floor(event.xdata)
    y = math.floor(event.ydata)
    if click_axes == ax1:
        if count_fp1 == 0:
            future_points1[0, :] = (x, y)
            count_fp1 = 1
        else:
            future_points1 = np.vstack([future_points1, np.array([x, y])])
            count_fp1 += count_fp1
        print(future_points1)
    if click_axes == ax2:
        if count_fp2 == 0:
            future_points2[0, :] = (x, y)
            count_fp2 = 1
        else:
            future_points2 = np.vstack([future_points2, np.array([x, y])])
            count_fp2 += count_fp2
        print(future_points2)
    click_axes.scatter(x, y)
    fig.canvas.draw_idle()


#Enter male and image overlay
def onEnter(event):
    if event.key == 'enter' and future_points1.size == future_points2.size and future_points1.size >= 3:
        # P:Coordinate matrix of conversion source([[x,y,1],[x,y,1],...]
        # x_dash:Destination x coordinate vector
        # y_dash:Destination y coordinate vector
        vec_one = np.ones((future_points2.shape[0], 1))
        P = np.hstack([future_points2, vec_one])
        x_dash = future_points1[:, 0]
        y_dash = future_points1[:, 1]
        affine_matrix = registration(P, x_dash, y_dash)
        #Obtain the image after affine transformation
        affined_image = affine(image2, affine_matrix, image1.shape)
        x = np.arange(0, affined_image.shape[1], 1)
        y = np.arange(0, affined_image.shape[0], 1)
        X_affined, Y_affined = np.meshgrid(x, y)
        ax3.pcolormesh(X_affined, Y_affined, affined_image, cmap='gray', shading='auto', alpha=0.2)
        fig.canvas.draw_idle()


#Image loading
image1 = np.array(Image.open('./source/test1.jpg').convert('L'))
image2 = np.array(Image.open('./source/t_test1.jpg').convert('L'))
#Bg at the end of the image_color addition of color
bg_color = 256
image2 = np.hstack([image2, bg_color * np.ones((image2.shape[0], 1), int)])
image2 = np.vstack([image2, bg_color * np.ones((1, image2.shape[1]), int)])

x_image1 = np.arange(0, image1.shape[1], 1)
y_image1 = np.arange(0, image1.shape[0], 1)

X1, Y1 = np.meshgrid(x_image1, y_image1)

x_image2 = np.arange(0, image2.shape[1], 1)
y_image2 = np.arange(0, image2.shape[0], 1)

X2, Y2 = np.meshgrid(x_image2, y_image2)

fig = plt.figure(figsize=(8, 8))
ax1 = fig.add_subplot(221)
mesh1 = ax1.pcolormesh(X1, Y1, image1, shading='auto', cmap='gray')
ax1.invert_yaxis()
ax2 = fig.add_subplot(223)
mesh2 = ax2.pcolormesh(X2, Y2, image2, shading='auto', cmap='gray')
ax2.invert_yaxis()
ax3 = fig.add_subplot(222)
mesh3 = ax3.pcolormesh(X1, Y1, image1, shading='auto', cmap='gray', alpha=0.2)
ax3.invert_yaxis()

cid = fig.canvas.mpl_connect('button_press_event', onclick)
cid = fig.canvas.mpl_connect('key_press_event', onEnter)
plt.show()

At the end

Looking at the formula, it is similar to the formula for finding the coefficient of linear regression! !! Or rather, there is almost no difference in doing linear regression. .. .. It would be interesting if we could create feature vectors well like linear regression, or deal with image distortion by doing well with Gaussian process methods.

Please point out any mistakes or confusing points.

Recommended Posts

I tried to find the affine matrix in image alignment (feature point matching) using affine transformation.
I tried to compress the image using machine learning
I tried to extract the text in the image file using Tesseract of the OCR engine
I tried to find the entropy of the image with python
I tried to process the image in "sketch style" with OpenCV
I tried to process the image in "pencil style" with OpenCV
I tried to transform the face image using sparse_image_warp of TensorFlow Addons
I tried to sort out the objects from the image of the steak set meal-⑤ Similar image feature point detection
I tried to correct the keystone of the image
I tried using the image filter of OpenCV
Find the position in the original image from the coordinates after affine transformation (Python + OpenCV)
How to save the feature point information of an image in a file and use it for matching
I tried to graph the packages installed in Python
I tried to detect the iris from the camera image
I tried to approximate the sin function using chainer
Find card illustrations from images using feature point matching
I tried to redo the non-negative matrix factorization (NMF)
I tried to identify the language using CNN + Melspectogram
I tried to complement the knowledge graph using OpenKE
Python OpenCV tried to display the image in text.
I tried to find out the outline about Big Gorilla
I tried to find the average of the sequence with TensorFlow
I tried to simulate ad optimization using the bandit algorithm.
I tried to illustrate the time and time in C language
I tried to summarize the commands often used in business
I tried to implement the mail sending function in Python
[TF] I tried to visualize the learning result using Tensorboard
I tried to make a stopwatch using tkinter in python
I tried to approximate the sin function using chainer (re-challenge)
Implementation of recommendation system ~ I tried to find the similarity from the outline of the movie using TF-IDF ~
I tried to find the trend of the number of ships in Tokyo Bay from satellite images.
I tried to output the access log to the server using Node.js
I tried to find out the difference between A + = B and A = A + B in Python, so make a note
I tried to describe the traffic in real time with WebSocket
I tried to get the index of the list using the enumerate function
I tried to digitize the stamp stamped on paper using OpenCV
I tried to cut out a still image from the video
I tried to move the ball
I tried using the checkio API
I tried to estimate the interval.
I tried to understand the learning function in the neural network carefully without using the machine learning library (second half).
Python programming: I tried to get company information (crawling) from Yahoo Finance in the US using BeautifulSoup4
I want to change the color by clicking the scatter point in matplotlib
[Python] I tried to summarize the set type (set) in an easy-to-understand manner.
I tried to verify the best way to find a good marriage partner
I tried to execute SQL from the local environment using Looker SDK
I tried moving the image to the specified folder by right-clicking and left-clicking
I tried to estimate the similarity of the question intent using gensim's Doc2Vec
I tried to control multiple servo motors MG996R using the servo driver PCA9685.
I tried to classify guitar chords in real time using machine learning
Matching karaoke keys ~ I tried to put it on Laravel ~ <on the way>
I tried to summarize various sentences using the automatic summarization API "summpy"
I tried to find the optimal path of the dreamland by (quantum) annealing
I tried to extract and illustrate the stage of the story using COTOHA
I tried to display the altitude value of DTM in a graph
I tried the common story of using Deep Learning to predict the Nikkei 225
Using COTOHA, I tried to follow the emotional course of Run, Melos!
I implemented the VGG16 model in Keras and tried to identify CIFAR10
I tried to analyze the New Year's card by myself using python
I tried to make PyTorch model API in Azure environment using TorchServe
I tried to train the RWA (Recurrent Weighted Average) model in Keras