Motivation

Image processing is a technology that can be used in all sorts of places, even for wedding entertainment. I am in my late twenties and the people around me keep getting married, so it has been on my mind (sweat)

[Wedding / Wedding Table Set](https://www.pakutaso.com/20130406108post-2650.html "Wedding / wedding table set")

And where there is a wedding, there is an entertainment act.

You want to raise the quality even when you have no time ... Image processing is just the thing to meet that need.

This article covers "mapping between images" (corresponding to Chapter 3 of Practical Computer Vision). If you are an image processing expert, please point out any mistakes.

Processing required for mapping between images

When preparing photos for the entertainment, you need to composite them.

If you simply paste the photos together, the result looks unnatural.

The processing required for this is as follows.

・ Image warping
・ Embedding an image neatly inside another image
・ Alignment
・ Alignment for panoramas

Image processing technology required for each process

The elements of image transformation can be roughly divided as follows.

EECS 442 – Computer vision Conversation hour

・ Scaling
・ Scaling (asymmetric)
・ Rotation
・ The transformation in the rightmost figure (it is hard to describe in words, so please see the figure)

An image is transformed by combining these operations.

Here is the formula that matters most.


\begin{pmatrix}
x'  \\
y'  \\
w'
\end{pmatrix}

=

\begin{pmatrix}
h_{11} & h_{21} & h_{31}\\
h_{12} & h_{22} & h_{32}\\
h_{13} & h_{23} & h_{33}
\end{pmatrix}


\begin{pmatrix}
x  \\
y  \\
w
\end{pmatrix}

\begin{matrix}
\boldsymbol{x'} = \boldsymbol{H}\cdot{\boldsymbol{x}}
\end{matrix}

The above equation shows that a point given as a vector is mapped into another space by multiplying it by a transformation matrix. That is hard to picture from words alone, so a figure follows.

Lecture 16: Planar Homographies

The figure shows how an object appears when it is viewed through a pinhole camera, and that the view on the camera's image plane is obtained by multiplying by the matrix H.

The matrix H covers the basic image transformations described at the beginning. Once you have found H, you can express how a plane looks when a pinhole camera views it at an angle, or conversely recover the front-on view of the plane from the camera image. This matrix H is called the homography matrix.

To summarize so far:

Image transformation is required to create a composite photo. It can be performed with a homography matrix. With a homography matrix you can turn a front-on (planar) image into one seen at an angle, or an image seen at an angle into a front-on one.
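As a minimal sketch of what "multiplying by H" does in practice (assuming NumPy; the function name `apply_homography` and the matrix values are made up for illustration), the following maps the four corners of a 100 x 100 square. Note that NumPy indexes `H[row, column]`, so the h_{21} of the formula above (first row, second column) is `H[0, 1]` in the code.

```python
import numpy as np

def apply_homography(H, points):
    """Map an (N, 2) array of (x, y) points through a 3x3 matrix H."""
    points = np.asarray(points, dtype=float)
    # Append w = 1 to get homogeneous coordinates (x, y, 1).
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    # x' = H x for every point (points are rows, hence the transpose).
    mapped = homogeneous @ H.T
    # Divide by w' to return to ordinary 2D coordinates.
    return mapped[:, :2] / mapped[:, 2:3]

# Illustrative matrix (made-up values) with a small perspective term.
H = np.array([[1.0, 0.2, 5.0],
              [0.1, 1.0, -3.0],
              [0.001, 0.0, 1.0]])
corners = np.array([[0, 0], [100, 0], [100, 100], [0, 100]])
warped = apply_homography(H, corners)
print(warped)
```

Because the bottom row of this H is not (0, 0, 1), the divisor w' differs from point to point, which is what lets a square become a general quadrilateral.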

Homography matrix

Here we describe how to find the homography matrix.

We use the Direct Linear Transformation (DLT) algorithm.

First, let's look at the formula again.


\begin{pmatrix}
x'  \\
y'  \\
w'
\end{pmatrix}

=

\begin{pmatrix}
h_{11} & h_{21} & h_{31}\\
h_{12} & h_{22} & h_{32}\\
h_{13} & h_{23} & h_{33}
\end{pmatrix}


\begin{pmatrix}
x  \\
y  \\
w
\end{pmatrix}

Since this is an image, the points are two-dimensional, but they are written in what are called homogeneous coordinates, which carry a w value in addition to the x and y values.

The value of w is usually normalized to 1, and having it makes it possible to represent translation as well.

So the transformation can express not only the four operations above but also translation, where the position shifts. This is an important point.

See the following for an easy-to-understand illustration.

http://d.hatena.ne.jp/Zellij/20120523/p1
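To make the role of w concrete, here is a small sketch (assuming NumPy; the numbers are arbitrary) showing translation written as a 3x3 matrix product on (x, y, 1), something a 2x2 matrix on (x, y) alone cannot do.

```python
import numpy as np

# Translation by (tx, ty) written as a 3x3 matrix acting on (x, y, 1).
tx, ty = 30.0, -10.0
T = np.array([[1.0, 0.0, tx],
              [0.0, 1.0, ty],
              [0.0, 0.0, 1.0]])

point = np.array([2.0, 5.0, 1.0])   # (x, y, w) with w = 1
moved = T @ point                   # (x + tx, y + ty, 1)

# A 2x2 matrix alone always maps the origin to itself,
# so it can scale or rotate but never shift the image.
A = np.array([[2.0, 0.0],
              [0.0, 2.0]])
origin_2d = A @ np.array([0.0, 0.0])
print(moved, origin_2d)
```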

Next, let's look at the normalized form. What we want to find from this formula is the homography matrix that turns a front-on image into an oblique one, or an oblique image into a front-on one.


\begin{pmatrix}
x'  \\
y'  \\
1
\end{pmatrix}

=

\begin{pmatrix}
h_{11} & h_{21} & h_{31}\\
h_{12} & h_{22} & h_{32}\\
h_{13} & h_{23} & h_{33}
\end{pmatrix}


\begin{pmatrix}
x  \\
y  \\
1
\end{pmatrix}

Written out, the relationship between a point in the original image and the mapped point is

  x' = \frac{h_{11}x + h_{21}y + h_{31}}{h_{13}x + h_{23}y + h_{33}}\\

  y' = \frac{h_{12}x + h_{22}y + h_{32}}{h_{13}x + h_{23}y + h_{33}}

To find the matrix H we would have to find nine values, but here we add a constraint: fix the value of h33 to 1. Why is it acceptable to impose this constraint? Consider multiplying every element of H by a constant k.

  x' = \frac{kh_{11}x + kh_{21}y + kh_{31}}{kh_{13}x + kh_{23}y + kh_{33}}\\

  y' = \frac{kh_{12}x + kh_{22}y + kh_{32}}{kh_{13}x + kh_{23}y + kh_{33}}

When h33 is set to 1:

  x' = \frac{h_{11}x + h_{21}y + h_{31}}{h_{13}x + h_{23}y + 1}\\

  y' = \frac{h_{12}x + h_{22}y + h_{32}}{h_{13}x + h_{23}y + 1}

Multiplying every element by k changes nothing in practice, because k appears in both the numerator and the denominator and cancels out. Note also that h33 appears in the denominator of both x' and y'. We are therefore free to choose k so that h33 becomes 1 (this assumes h33 is not zero). By fixing h33 to 1, the problem of finding nine values becomes the problem of finding eight values.
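The cancellation of k can be checked numerically. The sketch below (assuming NumPy; `project` is a hypothetical helper and the matrix values are made up) maps the same point with H and with H rescaled so that its bottom-right element is 1.

```python
import numpy as np

def project(H, x, y):
    """Return (x', y') for the point (x, y) under the 3x3 matrix H."""
    xp, yp, wp = H @ np.array([x, y, 1.0])
    return xp / wp, yp / wp

# Made-up matrix whose bottom-right element (h33) is not 1.
H = np.array([[2.0, 0.5, 10.0],
              [0.3, 1.5, -4.0],
              [0.002, 0.001, 4.0]])

k = 1.0 / H[2, 2]        # the k that makes h33 equal to 1
H_normalized = k * H

p_original = project(H, 50.0, 20.0)
p_normalized = project(H_normalized, 50.0, 20.0)
print(p_original, p_normalized)   # the two results agree
```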

From here on is the Direct Linear Transformation method itself.


  (h_{13}x + h_{23}y + 1)x' = h_{11}x + h_{21}y + h_{31}\\

  ({h_{13}x + h_{23}y + 1})y' = h_{12}x + h_{22}y + h_{32}

Solving for x' and y':


  x' = h_{11}x + h_{21}y + h_{31} - h_{13}xx' - h_{23}yx'\\

  y' = h_{12}x + h_{22}y + h_{32} - h_{13}xy' - h_{23}yy'

So each pair of corresponding points yields two equations. In other words, four point correspondences give the eight simultaneous equations needed to find the eight values. Here is the system written out for four points.


\begin{pmatrix}
x'_{1}  \\
y'_{1}  \\
x'_{2}  \\
y'_{2}  \\
x'_{3}  \\
y'_{3}  \\
x'_{4}  \\
y'_{4}  
\end{pmatrix}

=

\begin{pmatrix}
x_{1} & y_{1} & 1 & 0 & 0 & 0 & -x_{1}x'_{1} & -y_{1}x'_{1}\\
0 & 0 & 0 & x_{1} & y_{1} & 1 & -x_{1}y'_{1} & -y_{1}y'_{1}\\
x_{2} & y_{2} & 1 & 0 & 0 & 0 & -x_{2}x'_{2} & -y_{2}x'_{2}\\
0 & 0 & 0 & x_{2} & y_{2} & 1 & -x_{2}y'_{2} & -y_{2}y'_{2}\\
x_{3} & y_{3} & 1 & 0 & 0 & 0 & -x_{3}x'_{3} & -y_{3}x'_{3}\\
0 & 0 & 0 & x_{3} & y_{3} & 1 & -x_{3}y'_{3} & -y_{3}y'_{3}\\
x_{4} & y_{4} & 1 & 0 & 0 & 0 & -x_{4}x'_{4} & -y_{4}x'_{4}\\
0 & 0 & 0 & x_{4} & y_{4} & 1 & -x_{4}y'_{4} & -y_{4}y'_{4}\\
\end{pmatrix}


\begin{pmatrix}
h_{11}  \\
h_{21}  \\
h_{31}  \\
h_{12}  \\
h_{22}  \\
h_{32}  \\
h_{13}  \\
h_{23}
\end{pmatrix}

These simultaneous equations can also be solved analytically. In practice, the result is generally more accurate when more than the minimum of four points is used.

When there are more points, the system is solved by least squares using SVD. If the equations are written in the homogeneous form Ah = 0, with all nine elements of H as unknowns, the least-squares solution is the last row of the matrix V returned by the SVD, which corresponds to the smallest singular value. (The point coordinates are first normalized to mean 0 and variance 1.)

If you would like to know why this works, see the following.

Singular value decomposition
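The following is a minimal sketch of the DLT estimate described above, assuming NumPy. The function name `homography_dlt` and the test values are illustrative, and the mean/variance normalization of the points is left out to keep the example short. It uses the homogeneous form Ah = 0 with nine unknowns and rescales at the end so that h33 = 1; in the code `H[row, column]` indexing is used.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 H with dst ~ H @ src from N >= 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two equations per correspondence, written as A h = 0.
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp])
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp, -yp])
    A = np.array(rows)
    # Least-squares solution of A h = 0 with |h| = 1: the right singular
    # vector for the smallest singular value, i.e. the last row of Vt.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # remove the free scale so the last element is 1

# Synthetic check: generate points with a known (made-up) H and recover it.
H_true = np.array([[1.2, 0.1, 15.0],
                   [-0.05, 0.9, 8.0],
                   [0.0005, 0.0002, 1.0]])
src = np.array([[0, 0], [200, 0], [200, 150], [0, 150], [100, 75]], dtype=float)
src_h = np.hstack([src, np.ones((len(src), 1))])
dst_h = src_h @ H_true.T
dst = dst_h[:, :2] / dst_h[:, 2:3]

H_est = homography_dlt(src, dst)
print(np.round(H_est, 4))
```

With noisy real-world points, normalizing the coordinates before building A usually improves the numerical conditioning, which is why the text mentions it.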

To summarize:

・ The homography matrix maps a front-on (planar) image to an oblique one and an oblique image to a front-on one.
・ The matrix can be calculated from 4 points.
・ Use SVD when calculating with 4 points or more.

Affine transformation

From here, we introduce the affine transformation. It simplifies the homography matrix, which limits the problems it can solve, but in return it needs only three points.


\begin{pmatrix}
x'  \\
y'  \\
1
\end{pmatrix}

=

\begin{pmatrix}
a_{1} & a_{2} & t_{1}\\
a_{3} & a_{4} & t_{2}\\
0 & 0 & 1
\end{pmatrix}


\begin{pmatrix}
x  \\
y  \\
1
\end{pmatrix}

What the above formula means is that the scale component w' is always 1, because the bottom row of the matrix is fixed to (0, 0, 1). In other words, the range of possible transformations is restricted.

You can map a rectangle to a parallelogram, and you can enlarge, reduce, and translate, but you cannot map a rectangle to a trapezoid.

  x' = \frac{h_{11}x + h_{21}y + h_{31}}{h_{13}x + h_{23}y + 1}\\

  y' = \frac{h_{12}x + h_{22}y + h_{32}}{h_{13}x + h_{23}y + 1}

In the formula above, x' and y' are both divided by a common denominator that depends on x and y. In other words, the overall scale could change depending on the position of the point. With the affine transformation the denominator is always 1, as shown below, so it can only handle simpler transformations and translation. In exchange, only 3 points are needed. The advantage of this is that piecewise affine warping can be used.

  x' = h_{11}x + h_{21}y + h_{31}\\

  y' = h_{12}x + h_{22}y + h_{32}
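Since each point gives two linear equations and there are six unknowns, three non-collinear points determine the affine transformation exactly. A minimal sketch, assuming NumPy (the function name `affine_from_three_points` and the coordinates are illustrative):

```python
import numpy as np

def affine_from_three_points(src, dst):
    """Return the 3x3 affine matrix mapping three src points onto dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Each row (x, y, 1) times the unknown 3x2 block gives (x', y').
    S = np.hstack([src, np.ones((3, 1))])
    params = np.linalg.solve(S, dst)   # 6 equations, 6 unknowns
    A = np.eye(3)
    A[:2, :] = params.T                # bottom row stays (0, 0, 1)
    return A

# Made-up triangle: scale x by 2, y by 3, then shift by (10, 20).
src = [[0, 0], [1, 0], [0, 1]]
dst = [[10, 20], [12, 20], [10, 23]]
A = affine_from_three_points(src, dst)
print(A)
```

`np.linalg.solve` raises an error when the three source points lie on one line, because the transformation is then not uniquely determined.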

Embed an image in another image

Use piecewise affine warping to embed an image in another image.

When you want to match two images as shown below, you would normally take a set of corresponding points and estimate a homography matrix from them, but calculation error and similar effects make it difficult to get the points of the images to match exactly.

However, an affine transformation combined with Delaunay triangulation makes it possible to match the vertices of the images.

Let me explain why that is.

An affine transformation needs only 3 points.

Delaunay triangulation, given a set of points, connects them into triangles in such a way that the minimum angle of the triangles is maximized. See the following for how the triangles are chosen.

Delaunay Triangle Division

An example is shown below.

Programming Computer Vision with Python

In other words, an affine transformation is computed for each triangle from its three vertices, so every vertex is mapped exactly onto its target and the warp matches at the vertices. The affine transformation is calculated with the DLT method, in the same way as the homography.
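The vertex-matching property can be illustrated with a small sketch, assuming NumPy. The point coordinates are made up and the triangle list is written by hand for brevity; in practice it would come from a Delaunay triangulation routine. Only the per-triangle matrices are computed here, not the pixel warping itself.

```python
import numpy as np

def affine_from_three_points(src, dst):
    """3x3 affine matrix mapping three src points exactly onto dst."""
    S = np.hstack([np.asarray(src, dtype=float), np.ones((3, 1))])
    A = np.eye(3)
    A[:2, :] = np.linalg.solve(S, np.asarray(dst, dtype=float)).T
    return A

# Four corners plus one interior control point (made-up coordinates).
src_pts = np.array([[0, 0], [100, 0], [100, 100], [0, 100], [50, 50]], dtype=float)
dst_pts = np.array([[10, 5], [90, 0], [110, 95], [0, 100], [60, 40]], dtype=float)

# Hand-written triangle list (indices into the point arrays).
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]

# One affine matrix per triangle.
warps = [affine_from_three_points(src_pts[list(t)], dst_pts[list(t)])
         for t in triangles]

# Every vertex lands on its target in every triangle that uses it.
max_vertex_error = 0.0
for tri, A in zip(triangles, warps):
    for i in tri:
        mapped = A @ np.append(src_pts[i], 1.0)
        error = np.abs(mapped[:2] - dst_pts[i]).max()
        max_vertex_error = max(max_vertex_error, error)
print(max_vertex_error)   # zero up to floating-point rounding
```

A single homography fitted to all five points by least squares would generally leave a small residual at each point, whereas the per-triangle affine maps hit every vertex exactly.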

Summary

・ The affine transformation can express fewer variations than the homography transformation.
・ In exchange, it can be computed from only 3 points.
・ Because 3 points are sufficient, Delaunay triangulation can be used, so a transformation that matches the vertices is possible.

Image alignment

For image alignment, a similarity transformation is used, because the images being aligned are similar to one another.


\begin{pmatrix}
x'  \\
y'  \\
1
\end{pmatrix}

=

\begin{pmatrix}
s\cos(\theta) & -s\sin(\theta) & t_{x}\\
s\sin(\theta) & s\cos(\theta) & t_{y}\\
0 & 0 & 1
\end{pmatrix}


\begin{pmatrix}
x  \\
y  \\
1
\end{pmatrix}

s is the scale factor and θ is the rotation angle. Because scaling and rotation are applied uniformly over all coordinates, this transformation is well suited to alignment. Compare the average of the aligned images below with the simple average. You can confirm that aligning first reproduces the image more accurately.

Programming Computer Vision with Python

Programming Computer Vision with Python
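As a minimal sketch of the similarity transformation above (assuming NumPy; both function names are hypothetical), the first function builds the matrix from s, θ, t_x, and t_y, and the second recovers it from two point correspondences by treating 2D points as complex numbers.

```python
import numpy as np

def similarity_matrix(s, theta, tx, ty):
    """3x3 similarity matrix: scale s, rotation theta, translation (tx, ty)."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx],
                     [s * sn, s * c, ty],
                     [0.0, 0.0, 1.0]])

def similarity_from_two_points(p1, p2, q1, q2):
    """Similarity transformation mapping p1 -> q1 and p2 -> q2."""
    p1, p2, q1, q2 = (complex(*p) for p in (p1, p2, q1, q2))
    z = (q2 - q1) / (p2 - p1)   # scale and rotation as one complex number
    t = q1 - z * p1             # translation
    return similarity_matrix(abs(z), np.angle(z), t.real, t.imag)

# Scale by 2, rotate by 90 degrees, shift by (1, 0).
S = similarity_matrix(s=2.0, theta=np.pi / 2, tx=1.0, ty=0.0)
p = S @ np.array([1.0, 0.0, 1.0])

# Recover a similarity from two correspondences (made-up points).
S2 = similarity_from_two_points((0, 0), (1, 0), (1, 1), (1, 3))
q = S2 @ np.array([1.0, 0.0, 1.0])
print(p, q)
```

Two correspondences are enough here because a similarity has only four unknowns (s, θ, t_x, t_y) and each point provides two equations.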

RANSAC

So far only the DLT method has been introduced, but DLT is not robust to noise, so another method, RANSAC, has been proposed. RANSAC is robust against noise, but in exchange it is a time-consuming algorithm.

Please refer to the following material for the specific method.

Preemptive RANSAC by David Nister.
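The following is a sketch of plain RANSAC wrapped around the DLT estimate, not the preemptive variant in the slides above. It assumes NumPy; the function names, iteration count, threshold, and synthetic data are all illustrative choices.

```python
import numpy as np

def homography_dlt(src, dst):
    """Unnormalized DLT estimate of H (up to scale) from N >= 4 pairs."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp])
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp, -yp])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def transfer_error(H, src, dst):
    """Distance between H-mapped src points and dst, one value per pair."""
    src_h = np.hstack([src, np.ones((len(src), 1))])
    proj = src_h @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        proj = proj[:, :2] / proj[:, 2:3]
    err = np.linalg.norm(proj - dst, axis=1)
    return np.where(np.isfinite(err), err, np.inf)

def ransac_homography(src, dst, n_iter=500, threshold=2.0, seed=0):
    """Keep the 4-point model with the most inliers, then refit on them."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(src), size=4, replace=False)
        H = homography_dlt(src[sample], dst[sample])
        inliers = transfer_error(H, src, dst) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    H = homography_dlt(src[best_inliers], dst[best_inliers])
    return H / H[2, 2], best_inliers

# Synthetic data: 40 correspondences, the first 10 corrupted on purpose.
rng = np.random.default_rng(1)
H_true = np.array([[1.1, 0.05, 20.0],
                   [-0.02, 0.95, -10.0],
                   [1e-4, 2e-4, 1.0]])
src = rng.uniform(0, 300, size=(40, 2))
dst = np.hstack([src, np.ones((40, 1))]) @ H_true.T
dst = dst[:, :2] / dst[:, 2:3]
dst[:10] += rng.uniform(50, 100, size=(10, 2))

H_est, inliers = ransac_homography(src, dst)
print(inliers.sum(), np.round(H_est, 3))
```

The repeated sampling is where the extra computation time goes: each iteration fits a model to four random pairs and scores it against all points.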

Image stitching

Suppose there are several images taken from the same place, and you want to join them together as shown in the second figure.

Programming Computer Vision with Python

Programming Computer Vision with Python

The procedure is as follows.

1: Extract the feature points of the images with SIFT (see the following for details)
Gold needle for when it becomes a stone by looking at the formula of image processing
2: Calculate the homography matrix between the images using RANSAC
3: Select the center image
4: Pad the left and right of the center image with zeros so that the warped images can be joined to it
5: Whether an image is joined on the left or on the right is determined by the translation component
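Steps 4 and 5 can be sketched as follows, assuming NumPy. The function name `pad_center_image` is hypothetical, and the code assumes that the homography maps coordinates of the side image into coordinates of the center image, so that a negative x translation means the side image lies to the left. The warping and blending themselves are not shown.

```python
import numpy as np

def pad_center_image(center, H_side_to_center, pad_width):
    """Pad the center image with zeros on the side where the other image goes."""
    # Translation component in x, after removing the free scale of H.
    tx = H_side_to_center[0, 2] / H_side_to_center[2, 2]
    # Works for grayscale (h, w) and color (h, w, channels) arrays.
    zeros = np.zeros((center.shape[0], pad_width) + center.shape[2:],
                     dtype=center.dtype)
    if tx < 0:
        return np.hstack([zeros, center]), "left"
    return np.hstack([center, zeros]), "right"

# Tiny made-up example: a 4x6 white image and a homography shifting left.
center = np.full((4, 6), 255, dtype=np.uint8)
H = np.array([[1.0, 0.0, -5.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
padded, side = pad_center_image(center, H, pad_width=3)
print(side, padded.shape)
```

If the homography was estimated in the opposite direction (center to side), the sign test has to be flipped.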

Because the differences between the images are noticeable at the seams, commercially available software also normalizes and smooths the brightness.

ipython notebook

Only the affine transformation is covered, but it is written up as code here.

https://github.com/SnowMasaya/Image_Processing_for_Wedding

References

Practical Computer Vision

The basis for this article.

Pennsylvania State University Computer Science

Easy to understand because the lecture covers image transformation starting from 3D and is not limited to 2D as this article is.

University of Michigan

University of Michigan Summary

The material starts from the basics of image transformation, so it is good for getting a firm grasp of the fundamentals.

Preemptive RANSAC by David Nister.

An excellent slide to quickly understand the RANSAC algorithm

Re-learning after becoming an adult: Affine transformation

Easy-to-understand explanation of affine transformation