Affine transformations using only matrix operations, without relying on an image-processing library (also works in Pythonista)
**Click here for the basics**
Instead of relying on OpenCV or Pillow, we write various image-processing operations ourselves using only NumPy and matplotlib. This combination also works in Pythonista, the iOS app.
```python
import numpy as np
import matplotlib.pyplot as plt
```
The following helper function is also convenient for displaying images (see the basics article for details).
```python
def img_show(img: np.ndarray, cmap='gray', vmin=0, vmax=255, interpolation='none') -> None:
    '''Display an image given as an np.ndarray.'''
    # Clip to [vmin, vmax] and convert the dtype to uint8
    img = np.clip(img, vmin, vmax).astype(np.uint8)
    # Display the image
    plt.imshow(img, cmap=cmap, vmin=vmin, vmax=vmax, interpolation=interpolation)
    plt.show()
    plt.close()
```
There are many ways to deform an image. A transformation that combines a linear transformation (scaling, rotation, shearing) with a translation is called an affine transformation. Many clear explanations can be found by searching online.
An affine transformation can be expressed as a matrix multiplication. If an affine transformation $A$ maps the point $(x_0, y_0)$ to $(x_1, y_1)$, this can be written as

$$
\left(\begin{matrix} x_1\\ y_1\\ 1 \end{matrix}\right)
=A
\left(\begin{matrix} x_0\\ y_0\\ 1 \end{matrix}\right)
$$

The entries of $A$ have the form

$$
A=\left(\begin{matrix} a & b & t_x\\ c & d & t_y\\ 0 & 0 & 1 \end{matrix}\right)
$$

Here $a, b, c, d$ handle the linear part of the transformation and $t_x, t_y$ handle the translation. Multiplying it out gives
$$
\left(\begin{matrix} x_1\\ y_1\\ 1 \end{matrix}\right)
=\left(\begin{matrix} a & b & t_x\\ c & d & t_y\\ 0 & 0 & 1 \end{matrix}\right)
\left(\begin{matrix} x_0\\ y_0\\ 1 \end{matrix}\right)
=\left(\begin{matrix} ax_0 + by_0 + t_x\\ cx_0 + dy_0 + t_y\\ 0 + 0 + 1 \end{matrix}\right)
$$
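As a quick numerical check of the formula above, here is a minimal sketch that applies a 3x3 matrix to one point in homogeneous coordinates. The matrix reuses the example matrix from the end of this article; `apply_affine` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def apply_affine(A: np.ndarray, x0: float, y0: float) -> tuple:
    """Map (x0, y0) to (x1, y1) with a 3x3 affine matrix A (homogeneous coordinates)."""
    v0 = np.array([x0, y0, 1.0])  # append 1, as in the vector above
    v1 = A @ v0                   # matrix-vector product
    return float(v1[0]), float(v1[1])  # the third component is always 1

A = np.array([[2, 0.5, 15],
              [1, -3, 200],
              [0, 0, 1]])
x1, y1 = apply_affine(A, 1, 2)
# a*x0 + b*y0 + t_x = 2*1 + 0.5*2 + 15 = 18
# c*x0 + d*y0 + t_y = 1*1 - 3*2 + 200 = 195
print(x1, y1)  # 18.0 195.0
```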
Now let's apply this to an image. I cropped a small region out of 'tiger.jpeg' and used it.

```python
# Crop a 50x50 region from the image
img = plt.imread('tiger.jpeg')[1390:1440, 375:425]
img_show(img)
```
The procedure goes step by step. First, create an array holding the coordinates of each pixel, appending 1 at the end to mimic the vector above.
```python
# Make an image with height 3 and width 4
height, width = 3, 4
# Create the x-coordinate and y-coordinate matrices with mgrid
x, y = np.mgrid[:height, :width]
# Combine x-coordinate, y-coordinate and 1 with dstack
xy_after = np.dstack((x, y, np.ones((height, width))))
xy_after
#array([
#[[ 0., 0., 1.], [ 0., 1., 1.], [ 0., 2., 1.], [ 0., 3., 1.]],
#[[ 1., 0., 1.], [ 1., 1., 1.], [ 1., 2., 1.], [ 1., 3., 1.]],
#[[ 2., 0., 1.], [ 2., 1., 1.], [ 2., 2., 1.], [ 2., 3., 1.]]])
```
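The result has shape (height, width, 3): the last axis holds (x, y, 1) for each pixel, with x as the row index and y as the column index. The sketch below checks this and shows an equivalent construction with `np.indices`; that is an alternative offered for comparison, not what the article uses.

```python
import numpy as np

height, width = 3, 4

# Same construction as above: row index, column index, and a trailing 1
x, y = np.mgrid[:height, :width]
xy_mgrid = np.dstack((x, y, np.ones((height, width))))

# Alternative: np.indices returns shape (2, height, width); move that axis to the end
idx = np.indices((height, width))
xy_indices = np.concatenate(
    [np.moveaxis(idx, 0, -1), np.ones((height, width, 1))], axis=-1
)

print(xy_mgrid.shape)                        # (3, 4, 3)
print(np.array_equal(xy_mgrid, xy_indices))  # True
```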
In image processing, the affine matrix is usually not applied directly; its inverse is used instead (think of this as the standard approach). The reason is that, for each pixel of the output image, we want to determine which coordinates of the input image to read from.
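To see why, consider the opposite approach, forward mapping: transform each source pixel's coordinates with the affine matrix itself and write its value there. A minimal sketch with a made-up 3x3 image scaled by 2 shows that most output pixels are never written and are left as holes; looking up every output pixel through the inverse avoids this.

```python
import numpy as np

# Hypothetical 3x3 source image (values 1..9) and a 2x scaling matrix
src = np.arange(1, 10).reshape(3, 3)
affin = np.array([[2, 0, 0],
                  [0, 2, 0],
                  [0, 0, 1]])

# Forward mapping: push every source pixel to its transformed position
out = np.zeros((6, 6), dtype=int)
for i in range(3):
    for j in range(3):
        x1, y1, _ = affin @ np.array([i, j, 1])
        out[x1, y1] = src[i, j]

# Only pixels with even coordinates were written; the other 27 stay 0 (holes)
print((out == 0).sum())  # 27
```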
```python
# Affine transformation that scales by 2 vertically and horizontally
affin = np.array([[2, 0, 0],
                  [0, 2, 0],
                  [0, 0, 1]])
# Inverse matrix
inv_affin = np.linalg.inv(affin)
# Apply the inverse to every pixel's (x, y, 1) vector with Einstein summation,
# then keep only the x and y components
ref_xy = np.einsum('ijk,lk->ijl', xy_after, inv_affin)[..., :2]
ref_xy
#array([
#[[0. , 0. ], [0. , 0.5], [0. , 1. ], [0. , 1.5]],
#[[0.5, 0. ], [0.5, 0.5], [0.5, 1. ], [0.5, 1.5]],
#[[1. , 0. ], [1. , 0.5], [1. , 1. ], [1. , 1.5]]])
```
In this way, the inverse matrix tells us, for example, that $(1,1)$ after the transformation corresponds to $(0.5,0.5)$ before it.
Looking at `ref_xy` above, the pixel at [2, 2] should take the value of the pixel at [1., 1.]. However, a pixel such as [1, 2] has to refer to the nonexistent pixel [0.5, 1.]. What should we do with such non-integer coordinates?
Below are two ways to handle them: nearest-neighbor interpolation and bilinear interpolation.
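Nearest-neighbor interpolation simply rounds the reference coordinates to the nearest integer. To round, add 0.5 and convert to int (this works because the coordinates here are non-negative).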
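The add-0.5 trick is not quite the same as `np.round`, which rounds exact halves to the nearest even integer, and it relies on the coordinates being non-negative because `astype(int)` truncates toward zero. A small sketch with made-up coordinate values:

```python
import numpy as np

coords = np.array([0.0, 0.4, 0.5, 1.5, 2.5, 2.6])

# The article's trick: add 0.5, then truncate with astype(int) (rounds halves up)
half_up = (coords + 0.5).astype(int)
# np.round rounds exact halves to the nearest even integer instead
banker = np.round(coords).astype(int)
print(half_up)  # [0 0 1 2 3 3]
print(banker)   # [0 0 0 2 2 3]

# Caveat: astype(int) truncates toward zero, so the trick fails for negatives
neg = np.array([-0.7])
trunc_neg = (neg + 0.5).astype(int)
floor_neg = np.floor(neg + 0.5).astype(int)
print(trunc_neg)  # [0]  (the nearest integer is -1)
print(floor_neg)  # [-1]
```

For pixel lookup either rounding rule works; the difference only matters at exact halves.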
The code below actually magnifies the image.
```python
# Because the reference coordinates are rounded, 100, 150 here would cause an IndexError
height, width = 99, 149
x, y = np.mgrid[:height, :width]
xy_after = np.dstack((x, y, np.ones((height, width))))
# Prepare the affine transformation matrix:
# double vertically, triple horizontally
affin = np.array([[2, 0, 0],
                  [0, 3, 0],
                  [0, 0, 1]])
inv_affin = np.linalg.inv(affin)
# Compute the coordinates to refer to
ref_xy = np.einsum('ijk,lk->ijl', xy_after, inv_affin)[..., :2]
# Round to the nearest pixel (add 0.5, then truncate)
ref_nearmost_xy = (ref_xy + 0.5).astype(int)
img_nearmost = img[ref_nearmost_xy[..., 0], ref_nearmost_xy[..., 1]]
img_show(img_nearmost)
```
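The nearest-neighbor steps above can be bundled into one function. This is a minimal sketch: the name `nearest_neighbor_affine` is hypothetical, it uses a plain `np.array` for the matrix, and like the code above it assumes every reference coordinate lands inside the image (no bounds handling).

```python
import numpy as np

def nearest_neighbor_affine(img, affin, out_shape):
    """Warp img with a 3x3 affine matrix using nearest-neighbor lookup.

    Assumes every reference coordinate lands inside img (no bounds handling).
    """
    height, width = out_shape
    x, y = np.mgrid[:height, :width]
    xy_after = np.dstack((x, y, np.ones((height, width))))
    inv_affin = np.linalg.inv(affin)
    # Reference coordinates in the source image for every output pixel
    ref_xy = np.einsum('ijk,lk->ijl', xy_after, inv_affin)[..., :2]
    # Round half up (valid because the coordinates are non-negative here)
    ref = (ref_xy + 0.5).astype(int)
    return img[ref[..., 0], ref[..., 1]]

# Tiny grayscale example: enlarge a 2x2 image by 3 in both directions
img = np.array([[0, 1],
                [2, 3]])
affin = np.array([[3, 0, 0],
                  [0, 3, 0],
                  [0, 0, 1]], dtype=float)
result = nearest_neighbor_affine(img, affin, (4, 4))
print(result)
# [[0 0 1 1]
#  [0 0 1 1]
#  [2 2 3 3]
#  [2 2 3 3]]
```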
The second method is bilinear interpolation. It weights the four surrounding pixels according to how close they are to the reference point.
First, find the four neighboring pixels.
```python
# Truncate to int to get the upper-left neighbor, then offset it to get the other three
linear_xy = {}
linear_xy['upleft'] = ref_xy.astype(int)
linear_xy['downleft'] = linear_xy['upleft'] + [1, 0]
linear_xy['upright'] = linear_xy['upleft'] + [0, 1]
linear_xy['downright'] = linear_xy['upleft'] + [1, 1]
```
Next, compute the weights from each reference point's offset relative to its upper-left pixel.
```python
# Offset of each reference point from its upper-left pixel
upleft_diff = ref_xy - linear_xy['upleft']
# Each weight is (1 - x offset) or (x offset) times (1 - y offset) or (y offset)
linear_weight = {}
linear_weight['upleft'] = (1 - upleft_diff[..., 0]) * (1 - upleft_diff[..., 1])
linear_weight['downleft'] = upleft_diff[..., 0] * (1 - upleft_diff[..., 1])
linear_weight['upright'] = (1 - upleft_diff[..., 0]) * upleft_diff[..., 1]
linear_weight['downright'] = upleft_diff[..., 0] * upleft_diff[..., 1]
```
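As a sanity check of these weights, here is a sketch for a single hypothetical reference point. The four weights always sum to 1, and the neighbors nearer to the point get the larger weights (here the point is closer to the upper row).

```python
import numpy as np

# A hypothetical reference point lying between four pixels
ref = np.array([2.25, 5.5])
upleft = ref.astype(int)              # [2, 5]
dx, dy = (ref - upleft).tolist()      # offsets from the upper-left pixel: 0.25, 0.5

weights = {
    'upleft':    (1 - dx) * (1 - dy),
    'downleft':  dx * (1 - dy),
    'upright':   (1 - dx) * dy,
    'downright': dx * dy,
}
print(weights)
# {'upleft': 0.375, 'downleft': 0.125, 'upright': 0.375, 'downright': 0.125}
print(sum(weights.values()))  # 1.0
```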
All that is left is to multiply each neighbor's pixel value by its weight and add them up.
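```python
# Note: ref_xy (and linear_xy, linear_weight) must be computed with
#   height, width = 98, 147
#   affin = double vertically, triple horizontally (as above)
# so that the +1 neighbors stay inside the 50x50 image
linear_with_weight = {}
for direction in linear_xy.keys():
    xy = linear_xy[direction]
    weight = linear_weight[direction]
    # Multiply each pixel's color values by its scalar weight
    linear_with_weight[direction] = np.einsum('ij,ijk->ijk', weight, img[xy[..., 0], xy[..., 1]])
img_linear = sum(linear_with_weight.values())
img_show(img_linear)
```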
The difference is subtle, but this result is smoother than the nearest-neighbor one.
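The bilinear steps above can also be condensed into one function. This is a sketch under several assumptions: the name `bilinear_affine` is hypothetical; it handles 2-D (grayscale) arrays only (for an (H, W, 3) color image the weights would need an extra axis, e.g. `weight[..., None]`); it uses `np.floor` instead of `astype(int)` for the upper-left neighbor; and it clamps the +1 neighbors to the image edge instead of shrinking the output size as the article does.

```python
import numpy as np

def bilinear_affine(img, affin, out_shape):
    """Warp a 2-D (grayscale) img with bilinear interpolation.

    Neighbor indices are clamped to the image, so the +1 neighbors at the
    bottom/right edge never go out of range (an assumption of this sketch).
    """
    height, width = out_shape
    x, y = np.mgrid[:height, :width]
    xy_after = np.dstack((x, y, np.ones((height, width))))
    ref_xy = np.einsum('ijk,lk->ijl', xy_after, np.linalg.inv(affin))[..., :2]

    # Upper-left neighbor and the offsets from it
    upleft = np.floor(ref_xy).astype(int)
    dx = ref_xy[..., 0] - upleft[..., 0]
    dy = ref_xy[..., 1] - upleft[..., 1]

    # Clamp the four neighbor indices to valid rows/columns
    x0 = np.clip(upleft[..., 0], 0, img.shape[0] - 1)
    x1 = np.clip(upleft[..., 0] + 1, 0, img.shape[0] - 1)
    y0 = np.clip(upleft[..., 1], 0, img.shape[1] - 1)
    y1 = np.clip(upleft[..., 1] + 1, 0, img.shape[1] - 1)

    # Weighted sum of the four neighbors
    return ((1 - dx) * (1 - dy) * img[x0, y0]
            + dx * (1 - dy) * img[x1, y0]
            + (1 - dx) * dy * img[x0, y1]
            + dx * dy * img[x1, y1])

img = np.array([[0, 10],
                [20, 30]], dtype=float)
affin = np.array([[2, 0, 0],
                  [0, 2, 0],
                  [0, 0, 1]], dtype=float)
result = bilinear_affine(img, affin, (3, 3))
# approximately [[0, 5, 10], [10, 15, 20], [20, 25, 30]]
print(result)
```

The in-between pixels come out as averages of their neighbors, which is where the smoother look comes from.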
An IndexError may occur depending on the transformation and the output shape, because the code then refers to pixels that do not exist. As a workaround, define a function that replaces every coordinate below 0 or at or above the image size with -1.
```python
def clip_xy(ref_xy, img_shape):
    # Replace out-of-range x coordinates with -1
    ref_x = np.where((0 <= ref_xy[..., 0]) & (ref_xy[..., 0] < img_shape[0]), ref_xy[..., 0], -1)
    # Replace out-of-range y coordinates with -1
    ref_y = np.where((0 <= ref_xy[..., 1]) & (ref_xy[..., 1] < img_shape[1]), ref_xy[..., 1], -1)
    # Stack them back into (x, y) pairs and return
    return np.dstack([ref_x, ref_y])
```
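A small sketch of what `clip_xy` does, using hypothetical reference coordinates for a 1x3 output against a 2x2 image. The function is repeated here so the block runs on its own; each coordinate is checked independently, so only the offending component becomes -1.

```python
import numpy as np

def clip_xy(ref_xy, img_shape):
    # Same as above: out-of-range coordinates become -1
    ref_x = np.where((0 <= ref_xy[..., 0]) & (ref_xy[..., 0] < img_shape[0]), ref_xy[..., 0], -1)
    ref_y = np.where((0 <= ref_xy[..., 1]) & (ref_xy[..., 1] < img_shape[1]), ref_xy[..., 1], -1)
    return np.dstack([ref_x, ref_y])

# Hypothetical reference coordinates: one valid, x too large, y negative
ref = np.array([[[0, 1], [2, 1], [1, -1]]])
clipped = clip_xy(ref, (2, 2))
print(clipped)
# [[[ 0  1]
#   [-1  1]
#   [ 1 -1]]]
```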
Replacing them with -1 means that every pixel that used to refer to an out-of-range pixel now refers to the last row or last column. (Using `img_shape[0]` for x and `img_shape[1]` for y instead of -1 works just as well with the padded image below.) All that remains is to fill that last row and last column with the background color.
```python
# Set the background color
bg_color = [0, 0, 0]
# Create an image one row and one column larger, filled with the background color
img_bg = np.empty((img.shape[0] + 1, img.shape[1] + 1, img.shape[2]))
img_bg[:, :] = bg_color
# Paste the image; the extra last row and column keep the background color
img_bg[:-1, :-1] = img
# Create a transformed image 150 high and 500 wide
height, width = 150, 500
x, y = np.mgrid[:height, :width]
xy_after = np.dstack((x, y, np.ones((height, width))))
# Prepare the affine transformation matrix:
# double vertically, triple horizontally
affin = np.array([[2, 0, 0],
                  [0, 3, 0],
                  [0, 0, 1]])
inv_affin = np.linalg.inv(affin)
# Transform the image with the nearest-neighbor method
ref_xy = np.einsum('ijk,lk->ijl', xy_after, inv_affin)[..., :2]
ref_nearmost_xy = (ref_xy + 0.5).astype(int)
ref_nearmost_xy = clip_xy(ref_nearmost_xy, img.shape)
# Pixels redirected by clip_xy now read from the background-colored last row/column
img_nearmost_bg = img_bg[ref_nearmost_xy[..., 0], ref_nearmost_xy[..., 1]]
img_show(img_nearmost_bg)
```
In this way, a black background is added.
After that, try changing the affine transformation matrix and experiment freely, for example:

```python
affin = np.array([[2, 0.5, 15],
                  [1, -3, 200],
                  [0, 0, 1]])
```
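As a starting point for such experiments, here is a sketch that combines the nearest-neighbor lookup and the background handling into one function. It swaps in a different technique from the article's: instead of padding the image and redirecting to -1, it uses a boolean mask of in-range pixels, and it rounds with `np.floor(v + 0.5)` so negative coordinates (common with rotations and translations) are handled. The function name `affine_nearest_bg` is hypothetical.

```python
import numpy as np

def affine_nearest_bg(img, affin, out_shape, bg_color=0):
    """Nearest-neighbor affine warp. Output pixels whose reference falls
    outside img get bg_color. Works for 2-D or color (H, W, C) images."""
    height, width = out_shape
    x, y = np.mgrid[:height, :width]
    xy_after = np.dstack((x, y, np.ones((height, width))))
    ref_xy = np.einsum('ijk,lk->ijl', xy_after, np.linalg.inv(affin))[..., :2]
    # floor(v + 0.5) rounds half up and, unlike astype(int), handles negatives
    ref = np.floor(ref_xy + 0.5).astype(int)
    # Mask of output pixels whose reference lies inside the source image
    inside = ((0 <= ref[..., 0]) & (ref[..., 0] < img.shape[0])
              & (0 <= ref[..., 1]) & (ref[..., 1] < img.shape[1]))
    out = np.empty((height, width) + img.shape[2:], dtype=img.dtype)
    out[...] = bg_color
    out[inside] = img[ref[..., 0][inside], ref[..., 1][inside]]
    return out

# Usage: rotate a small 2x3 image by 90 degrees
img = np.array([[1, 2, 3],
                [4, 5, 6]])
rot = np.array([[0, -1, 2],   # x1 = -y0 + (width - 1)
                [1, 0, 0],    # y1 = x0
                [0, 0, 1]], dtype=float)
rotated = affine_nearest_bg(img, rot, (3, 2))
print(rotated)  # expected to match np.rot90(img)
```

With the tiger crop, a call like `affine_nearest_bg(img, affin, (150, 500), bg_color=[0, 0, 0])` should behave like the padded-image approach, under the assumptions above.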