Applying affine transformations with tensors: from the basics to object detection

This article tries to simplify preprocessing by applying affine transformations as tensors. Affine transformations are usually applied to images, but the same transformations can also be applied to annotations such as Bounding Boxes. By extending the transformation matrix to a tensor, you can even apply several different transformations to an object at the same time.

Basics of affine transformation

An affine transformation expresses the movement of a point with the following formula.

\begin{bmatrix}x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix}a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix}x \\ y \\ 1 \end{bmatrix} \tag{1}

If a loose, intuitive understanding of affine transformations is enough for you, please also read this article.

We take the product of a 3x3 matrix and a 3x1 matrix: the 3x3 matrix defines the transformation, and the 3x1 matrix is the point before it is moved.

Specific example 1: Translation

When the point (10, 20) is moved 100 in the $ x $ direction and 50 in the $ y $ direction, the coordinates after the movement are (110, 70), which are expressed as follows.

\begin{bmatrix}110 \\ 70 \\ 1 \end{bmatrix} = \begin{bmatrix}1 & 0 & 100 \\ 0 & 1 & 50 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix}10 \\ 20 \\ 1 \end{bmatrix} \tag{2}

You may feel that a matrix is overkill here, but the point is precisely that the movement can be represented by a matrix. In code it looks like this.

import numpy as np

affine = np.array([[1, 0, 100], [0, 1, 50], [0, 0, 1]])
source = np.array([10, 20, 1])[:, None]
dest = np.dot(affine, source)
print(dest)
#[[110]
# [ 70]
# [  1]]

Assuming that the movement in the $ x $ direction is $ t_x $ and the movement in the $ y $ direction is $ t_y $, the transformation matrix for translation is as follows.

\begin{bmatrix}1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \tag{3}

The third row of each matrix (and the third element of each point) has no geometric meaning; the 1 is there only for the convenience of the matrix product calculation (homogeneous coordinates).

Specific example 2: Scaling

When the point (50, 100) is doubled in the $ x $ direction and 0.8 times in the $ y $ direction, the coordinates after movement are (100, 80), which are expressed as follows.

\begin{bmatrix}100 \\ 80 \\ 1 \end{bmatrix} = \begin{bmatrix}2 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix}50 \\ 100 \\ 1 \end{bmatrix} \tag{4}

affine = np.array([[2, 0, 0], [0, 0.8, 0], [0, 0, 1]])
source = np.array([50, 100, 1])[:, None]
dest = np.dot(affine, source)
print(dest)
# [[100.]
#  [ 80.]
#  [  1.]]

Assuming that the scale factor in the $ x $ direction is $ s_x $ and the scale factor in the $ y $ direction is $ s_y $, the transformation matrix for scaling is as follows.

\begin{bmatrix}s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{5}
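
As a small sketch (the helper names translation_matrix and scaling_matrix are mine, not from the article), formulas (3) and (5) can be wrapped in functions and checked against Examples 1 and 2:

import numpy as np

def translation_matrix(tx, ty):
    # affine matrix for a translation by (tx, ty), formula (3)
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=np.float64)

def scaling_matrix(sx, sy):
    # affine matrix for scaling by (sx, sy) about the origin, formula (5)
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=np.float64)

print(np.dot(translation_matrix(100, 50), [10, 20, 1]))  # [110.  70.   1.]
print(np.dot(scaling_matrix(2, 0.8), [50, 100, 1]))      # [100.  80.   1.]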

Affine transformation for multiple points

Since the affine transformation is a matrix calculation, the number of points can be expanded arbitrarily. When finding the affine transformation for $ N $ points, take the product of the 3x3 matrix and the 3xN matrix. The result is a 3xN matrix.

\begin{bmatrix} x_1' & \cdots & x_N' \\ y_1' & \cdots & y_N' \\ 1 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 & \cdots & x_N \\ y_1 & \cdots & y_N \\ 1 & \cdots & 1 \end{bmatrix} \tag{6}

Specific example 3: Rectangle affine transformation

With $ N = 4 $ points, this becomes an affine transformation of a quadrilateral. Applying an affine transformation that scales by $ (2, 3) $ in the $ (x, y) $ directions and translates by $ (5, -1) $ to the vertices of a rectangle gives:

import numpy as np
import matplotlib.pyplot as plt

w, h = 2, 1
points = np.array([[0, w, w, 0], [0, 0, h, h], [1, 1, 1, 1]], np.float32) # (3, 4)
affine = np.array([[2, 0, 5], [0, 3, -1], [0, 0, 1]])  # (3, 3)
dest = np.dot(affine, points)

plt.scatter(points[0,:], points[1,:], color="cyan")
plt.scatter(dest[0,:], dest[1,:], color="magenta")
plt.show()

affine_01.png

Specific example 4: Rotating rectangle and Bounding Box

This is an example that is directly useful in preprocessing for object detection. When you rotate an image during Data Augmentation for object detection, you also need to rotate the Bounding Box. A Bounding Box can be defined by two points, upper left and lower right, but by working with the four vertices instead, the Bounding Box after rotation is easy to compute: take the minimum and maximum of x and y over the rotated vertices, and you get the upper-left and lower-right coordinates of the rotated Bounding Box.

Rotation is also an affine transformation, and the transformation matrix for rotating by $ \theta $ counterclockwise around the origin is

\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{7}

When the vertices of the Bounding Box (rectangle) before rotation move to $ (x_1', y_1'), \cdots, (x_4', y_4') $ after rotation, the coordinates of the new Bounding Box that circumscribes the tilted quadrilateral are obtained as

\left( \min_i x_i',\ \min_i y_i',\ \max_i x_i',\ \max_i y_i' \right)

The reason this calculation is needed is that the rotated vertices themselves are not suitable as a Bounding Box (the quadrilateral is no longer parallel to the $ xy $ axes), so it has to be adjusted. Please see the video below for details.

affine_02.gif

The plot looks like this. For plotting purposes, 5 points are moved (the first vertex, the origin, is repeated so the polygon closes), but for the calculation alone, moving 4 points is enough.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def rotate_box():
    w, h = 2, 1
    max_wh = max(w, h)
    points = np.array([[0, w, w, 0, 0], [0, 0, h, h, 0], [1, 1, 1, 1, 0]], np.float32) #Original Bounding Box
    for theta in range(0, 360, 10):
        rad = np.radians(theta)
        rotate_matrix = np.array([
            [np.cos(rad), -np.sin(rad), 0],
            [np.sin(rad), np.cos(rad), 0],
            [0, 0, 1]], np.float32)
        dest_points = np.dot(rotate_matrix, points)[:2, :] #Rectangle after rotation
        rectangle = np.concatenate([np.min(dest_points, axis=-1),
                                    np.max(dest_points, axis=-1)]) #New Bounding Box

        plt.clf()
        plt.plot(dest_points[0,:], dest_points[1,:], linewidth=2, marker="o")

        ax = plt.gca()
        rect = patches.Rectangle(rectangle[:2], *(rectangle[2:] - rectangle[:2]),
                                 linewidth=1, edgecolor="magenta", fill=False)
        ax.add_patch(rect)
        plt.ylim(-max_wh*2, max_wh*2)
        plt.xlim(-max_wh * 2, max_wh * 2)
        plt.title("degree = " + str(theta))
        plt.show()

Composition of affine transformations

Affine transformations can be composed by taking the matrix product. To apply $ A_1 $ and then $ A_2 $ ($ A_1 \to A_2 $), we take the product $ A_2 A_1 P $ ($ P $ is the matrix of points). Note that the order is reversed. Also, **the commutative law does not hold**: if you change the order, the result changes.

For example, suppose $ A_1 $ doubles $ x, y $ and $ A_2 $ translates by 50 in the $ x $ direction and 100 in the $ y $ direction. Then $ A_2 A_1 $ ($ A_1 \to A_2 $) is

A_2A_1 = \begin{bmatrix}1 & 0 & 50 \\ 0 & 1 & 100 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix}2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix}2 & 0 & 50 \\ 0 & 2 & 100 \\ 0 & 0 & 1 \end{bmatrix} \tag{8}

However, $ A_1 A_2 $ ($ A_2 \to A_1 $) is

A_1A_2 = \begin{bmatrix}2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix}1 & 0 & 50 \\ 0 & 1 & 100 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix}2 & 0 & 100 \\ 0 & 2 & 200 \\ 0 & 0 & 1 \end{bmatrix} \tag{9}

and the size of the translation changes. This is exactly the difference between "enlarging and then translating" (8) and "translating and then enlarging" (9). If you are unsure about the order, it is a good idea to try a simple example like this.
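
A minimal check of (8) and (9) in code (a sketch; the variable names are mine):

A1 = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 1]])     # scale x and y by 2
A2 = np.array([[1, 0, 50], [0, 1, 100], [0, 0, 1]])  # translate by (50, 100)
p = np.array([10, 10, 1])[:, None]

print(np.dot(np.dot(A2, A1), p).ravel())  # enlarge then translate: [ 70 120   1]
print(np.dot(np.dot(A1, A2), p).ravel())  # translate then enlarge: [120 220   1]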

Specific example 5: A foolish person approaching while rotating

As shown earlier, an affine transformation can be applied to any number of points, so thousands of points are no problem. Here, the "foolish person" illustration (from a free image collection)

atamanowaruihito.png

is converted to point-cloud data and plotted while composing enlargement and rotation affine transformations. This is almost the same as Example 4.

from PIL import Image

def atamanowaruihito():
    with Image.open("atamanowaruihito.png") as img:
        img = img.resize((img.width // 2, img.height // 2))
        img = img.convert("L").point(lambda x: 255 if x >= 128 else 0)  # grayscale, then binarize
        points = np.stack(np.where(np.array(img) == 0)[::-1], axis=0)  # yx -> xy, stack the point cloud into a matrix
        points[1,:] = img.height - points[1,:]  # flip the y-axis from downward-positive to upward-positive  (2, 5912)
        points = np.concatenate([points, np.ones_like(points[0:1, :])], axis=0)  # append a row of 1s as the 3rd row  (3, 5912)

    for theta in range(0, 360, 1):
        rad = np.radians(theta)
        rotate_matrix = np.array([
            [np.cos(rad), -np.sin(rad), 0],
            [np.sin(rad), np.cos(rad), 0],
            [0, 0, 1]], np.float32) #Rotation matrix
        scale_matrix = np.eye(3, dtype=np.float32)
        scale_matrix[:2, :2] *= 1 + theta / 180  # scale only the 2x2 block so the last row stays [0, 0, 1]
        # Composing affine transformations = taking the matrix product of the transformation matrices
        # For A1 -> A2, take the product in the order A2 A1 (note the order)
        affine = np.dot(scale_matrix, rotate_matrix)  # rotate, then enlarge
                
        dest_points = np.dot(affine, points)[:2, :] #Point cloud after rotation
        rectangle = np.concatenate([np.min(dest_points, axis=-1),
                                    np.max(dest_points, axis=-1)]) #Bounding Box for point cloud

        plt.clf()
        plt.scatter(dest_points[0,:], dest_points[1,:], s=1)

        ax = plt.gca()
        rect = patches.Rectangle(rectangle[:2], *(rectangle[2:] - rectangle[:2]),
                                 linewidth=1, edgecolor="magenta", fill=False)
        ax.add_patch(rect)
        plt.ylim(-750, 750)
        plt.xlim(-750, 750)
        plt.show()

affine_03.gif

The Bounding Box can be calculated in the same way as the rectangle example.

Apply multiple affine transformations at the same time

From here on we move to tensor calculations. Instead of composing affine transformations, consider applying multiple affine transformations to the same points at the same time. This is hard to express as a formula, so let's think in code.

If there were only one affine transformation, the transformation matrix would have shape `(3, 3)`. But what if there are two transformations, that is, a transformation tensor of shape `(2, 3, 3)`? In other words, we want to do the following calculation:

# points (4 points): (3, 4), affines: (2, 3, 3)
output = np.zeros((2, 3, 4))
for i in range(affines.shape[0]):
    output[i] = np.dot(affines[i], points)

Actually, this can be written in one line, without the for loop:

output = np.matmul(affines, points)

np.matmul treats the leading axes as a batch of matrices, so this single line computes the same result as the loop.
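
A minimal sketch to convince yourself (the array contents here are arbitrary):

affines = np.random.randn(2, 3, 3)  # two transformation matrices
points = np.random.randn(3, 4)      # 4 points in homogeneous form

looped = np.zeros((2, 3, 4))
for i in range(affines.shape[0]):
    looped[i] = np.dot(affines[i], points)

print(np.allclose(looped, np.matmul(affines, points)))  # True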

Specific example 6: Multiple affine transformations of a quadrangle

Let's apply three affine transformations to the rectangle of Example 3 at once.

  1. The original quadrilateral as it is (identity transformation)
  2. Scale by $ (2, 3) $ in the $ (x, y) $ directions and translate by $ (5, -1) $
  3. Scale by $ (3, 1) $ in the $ (x, y) $ directions and translate by $ (10, 2) $

from matplotlib import cm

def rectangle_multi():
    w, h = 2, 1
    points = np.array([[0, w, w, 0], [0, 0, h, h], [1, 1, 1, 1]], np.float32)  # (3, 4)
    a1 = np.eye(3) #Identity conversion
    a2 = np.array([[2, 0, 5], [0, 3, -1], [0, 0, 1]]) # (3, 3)
    a3 = np.array([[3, 0, 10], [0, 1, 2], [0, 0, 1]])  # (3, 3)
    affine = np.stack([a1, a2, a3], axis=0) # (3, 3, 3)
    dest = np.matmul(affine, points)  # (3, 3, 4)

    cmap = cm.get_cmap("tab10")
    for i in range(3):
        plt.scatter(dest[i,0,:], dest[i, 1,:], color=cmap(i))
    plt.show()

affine_04.png

Three different transformations were performed in a single calculation. In this way, **one-to-many transformations** are also possible with tensor affine transformations.

Specific example 7: Anchor Box for object detection

In object detection, a Bounding Box is predicted from each point (Anchor) of the neural network's output. The Bounding Box ultimately corresponds to coordinates in the raw image, but several coordinate systems appear along the way: for example, the coordinates of the original image, the coordinates of the input image after preprocessing, and the coordinates seen from each Anchor.

In other words, we need flexible point conversion "from one coordinate system to another". This coordinate transformation can be handled in a unified way with tensor affine transformations.

Now consider the following setup. As an approach, we use two affine transformations:

  1. The preprocessing affine transformation (original image → input image): 1.5x in the $ x $ direction and 0.8x in the $ y $ direction. Shape `(3, 3)`.
  2. The input image → anchor affine transformation. Shape `(4, 4, 3, 3)`: a `(3, 3)` affine transformation for each cell of the vertical-by-horizontal 4x4 grid. The affine transformation at position `(i, j, :, :)` $ (0 \leq i, j \leq 3) $ is as follows. Since the translation is the input-image origin as seen from the Anchor Box, its sign is negative.

\begin{bmatrix}1/16 & 0 & -(0.5+i) \\ 0 & 1/16 & -(0.5+j) \\ 0 & 0 & 1 \end{bmatrix} \tag{10}

All that remains is to compose the two transformations, 1 and 2. This pays off when you want the inverse transformation (anchor → original image): take the inverse matrix of the composed tensor of 1 and 2, that is,

inv_transform = np.linalg.inv(combined_affine)

and you get the inverse-transformation tensor. When the input to np.linalg.inv is a tensor, the inverses are computed over the last two axes and stacked along the leading axes.
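
For example, a minimal sketch checking this batched behavior (the matrices here are arbitrary):

stack = np.stack([np.array([[2.0, 0, 5], [0, 3, -1], [0, 0, 1]]),
                  np.array([[1.0, 0, 50], [0, 1, 100], [0, 0, 1]])])  # (2, 3, 3)
inv_stack = np.linalg.inv(stack)  # also (2, 3, 3): each 3x3 matrix is inverted
print(np.allclose(np.matmul(stack, inv_stack), np.eye(3)))  # True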

For object detection, $ (y, x) $ notation is more convenient than $ (x, y) $ notation (because an image tensor is $ (B, H, W, C) $), so an affine transformation with translation $ t_x, t_y $ and scaling $ s_x, s_y $ is represented as

\begin{bmatrix}s_y & 0 & t_y \\ 0 & s_x & t_x \\ 0 & 0 & 1 \end{bmatrix} \tag{11}

Even with the axes exchanged, it still functions as an affine transformation.
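
As a minimal sketch of this axis swap (my own example, not from the article): the same scaling and translation applied in $ (x, y) $ order and in $ (y, x) $ order give the same point, just read in the opposite order.

sx, sy, tx, ty = 1.5, 0.8, 10, -5
xy_affine = np.array([[sx, 0, tx], [0, sy, ty], [0, 0, 1]])  # (x, y) order
yx_affine = np.array([[sy, 0, ty], [0, sx, tx], [0, 0, 1]])  # (y, x) order, as in (11)

x, y = 20, 40
print(np.dot(xy_affine, [x, y, 1]))  # [40. 27.  1.] -> (x', y') = (40, 27)
print(np.dot(yx_affine, [y, x, 1]))  # [27. 40.  1.] -> (y', x') = (27, 40)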

def anchor_box():
    bounding_boxes = np.array([[10, 20, 30, 40], [30, 30, 50, 50]])  # (2, 4)
    points = bounding_boxes.reshape(-1, 2).T  # (4, 2) -> (2, 4)
    points = np.concatenate([points, np.ones_like(points[0:1,:])], axis=0)  # (3, 4)
    # think in yx coordinates
    a1 = np.array([[0.8, 0, 0], [0, 1.5, 0], [0, 0, 1]])  # 0.8x in the y direction, 1.5x in the x direction
    #anchor
    offset_x, offset_y = np.meshgrid(-(np.arange(4) + 0.5), -(np.arange(4) + 0.5))  # the input-image origin seen from each anchor, hence the negative translation
    a2 = np.zeros((4, 4, 3, 3)) + np.eye(3).reshape(1, 1, 3, 3) / 16.0  # broadcast to (4, 4, 3, 3)
    a2[:,:,0,2] = offset_y
    a2[:,:, 1, 2] = offset_x
    a2[:,:, 2, 2] = 1.0
    #Affine synthesis
    affine = np.matmul(a2, a1)
    #Affine transformation
    raw_dest = np.matmul(affine, points)  # (4, 4, 3, 4)
    dest = raw_dest.swapaxes(-1, -2)[:,:,:,:2]  # transpose the last two axes to (4, 4, 4, 3), then keep (y, x) -> (4, 4, 4, 2)
    dest = dest.reshape(4, 4, 2, 4)
    print("Coordinates after affine transformation when the Bounding Box of the original image is viewed at each anchor")
    print(dest)

    #Reverse conversion and check
    raw_inv = np.matmul(np.linalg.inv(affine), raw_dest)
    inv = raw_inv.swapaxes(-1, -2)[:,:,:,:2]
    inv = inv.reshape(4, 4, 2, 4)  #Matches bounding boxes
    print("Reverse conversion of the converted coordinates returns to the original value")
    print(inv)

    #Inverse conversion affine (confirmation)
    inv_transform = np.linalg.inv(affine)
    print("Inverse conversion affine (for debugging)")
    print(inv_transform)

Click to view output
Coordinates after affine transformation when the Bounding Box of the original image is viewed at each anchor
[[[[ 0.      1.375   1.      3.25  ]
   [ 1.      2.3125  2.      4.1875]]

  [[ 0.      0.375   1.      2.25  ]
   [ 1.      1.3125  2.      3.1875]]

  [[ 0.     -0.625   1.      1.25  ]
   [ 1.      0.3125  2.      2.1875]]

  [[ 0.     -1.625   1.      0.25  ]
   [ 1.     -0.6875  2.      1.1875]]]


 [[[-1.      1.375   0.      3.25  ]
   [ 0.      2.3125  1.      4.1875]]

  [[-1.      0.375   0.      2.25  ]
   [ 0.      1.3125  1.      3.1875]]

  [[-1.     -0.625   0.      1.25  ]
   [ 0.      0.3125  1.      2.1875]]

  [[-1.     -1.625   0.      0.25  ]
   [ 0.     -0.6875  1.      1.1875]]]


 [[[-2.      1.375  -1.      3.25  ]
   [-1.      2.3125  0.      4.1875]]

  [[-2.      0.375  -1.      2.25  ]
   [-1.      1.3125  0.      3.1875]]

  [[-2.     -0.625  -1.      1.25  ]
   [-1.      0.3125  0.      2.1875]]

  [[-2.     -1.625  -1.      0.25  ]
   [-1.     -0.6875  0.      1.1875]]]


 [[[-3.      1.375  -2.      3.25  ]
   [-2.      2.3125 -1.      4.1875]]

  [[-3.      0.375  -2.      2.25  ]
   [-2.      1.3125 -1.      3.1875]]

  [[-3.     -0.625  -2.      1.25  ]
   [-2.      0.3125 -1.      2.1875]]

  [[-3.     -1.625  -2.      0.25  ]
   [-2.     -0.6875 -1.      1.1875]]]]
Reverse conversion of the converted coordinates returns to the original value
[[[[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]]


 [[[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]]


 [[[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]]


 [[[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]

  [[10. 20. 30. 40.]
   [30. 30. 50. 50.]]]]
Inverse conversion affine (for debugging)
[[[[20.          0.         10.        ]
   [ 0.         10.66666667  5.33333333]
   [ 0.          0.          1.        ]]

  [[20.          0.         10.        ]
   [ 0.         10.66666667 16.        ]
   [ 0.          0.          1.        ]]

  [[20.          0.         10.        ]
   [ 0.         10.66666667 26.66666667]
   [ 0.          0.          1.        ]]

  [[20.          0.         10.        ]
   [ 0.         10.66666667 37.33333333]
   [ 0.          0.          1.        ]]]


 [[[20.          0.         30.        ]
   [ 0.         10.66666667  5.33333333]
   [ 0.          0.          1.        ]]

  [[20.          0.         30.        ]
   [ 0.         10.66666667 16.        ]
   [ 0.          0.          1.        ]]

  [[20.          0.         30.        ]
   [ 0.         10.66666667 26.66666667]
   [ 0.          0.          1.        ]]

  [[20.          0.         30.        ]
   [ 0.         10.66666667 37.33333333]
   [ 0.          0.          1.        ]]]


 [[[20.          0.         50.        ]
   [ 0.         10.66666667  5.33333333]
   [ 0.          0.          1.        ]]

  [[20.          0.         50.        ]
   [ 0.         10.66666667 16.        ]
   [ 0.          0.          1.        ]]

  [[20.          0.         50.        ]
   [ 0.         10.66666667 26.66666667]
   [ 0.          0.          1.        ]]

  [[20.          0.         50.        ]
   [ 0.         10.66666667 37.33333333]
   [ 0.          0.          1.        ]]]


 [[[20.          0.         70.        ]
   [ 0.         10.66666667  5.33333333]
   [ 0.          0.          1.        ]]

  [[20.          0.         70.        ]
   [ 0.         10.66666667 16.        ]
   [ 0.          0.          1.        ]]

  [[20.          0.         70.        ]
   [ 0.         10.66666667 26.66666667]
   [ 0.          0.          1.        ]]

  [[20.          0.         70.        ]
   [ 0.         10.66666667 37.33333333]
   [ 0.          0.          1.        ]]]]

You can see that converting original image → anchor and then anchor → original image restores the coordinates of the original Bounding Box. Having the inverse transformation available as a single matrix calculation is a real strength. When you want to check whether the anchor transformation is behaving correctly, inspecting the inverse-transformation affine (the inverse matrices) is an easy way to do so.

Specific example 8: Anyway, a "foolish person" who looks quite unwell

This, too, can be done using tensor affine transformations. Please watch the video below.

affine_05.gif

I will omit the explanation, but if you are interested, please take a look at the code.

Click to view code
def atamanowaruihito2():
    with Image.open("atamanowaruihito.png") as img:
        img = img.resize((img.width // 2, img.height // 2))
        img = img.convert("L").point(lambda x: 255 if x >= 128 else 0)  # grayscale, then binarize
        points = np.stack(np.where(np.array(img) == 0)[::-1], axis=0)  # yx -> xy, stack the point cloud into a matrix
        points[1,:] = img.height - points[1,:]  # flip the y-axis from downward-positive to upward-positive  (2, 5912)
        points = np.concatenate([points, np.ones_like(points[0:1, :])], axis=0)  # append a row of 1s as the 3rd row  (3, 5912)

    # random numbers for the rotation matrices
    a = np.random.uniform(1.0, 5.0, size=(4, 4))
    b = np.random.uniform(-180, 180, size=a.shape)
    # random numbers for the scaling matrices
    c = np.random.uniform(0.5, 1.5, size=a.shape)
    d = np.random.uniform(1.0, 5.0, size=a.shape)
    e = np.random.uniform(-180, 180, size=a.shape)
    f = np.random.uniform(0.5, 1.0, size=a.shape) + c
    # random numbers for the translations (note that e is overwritten here)
    e = np.random.uniform(0, 200, size=a.shape)
    g = np.random.uniform(1.0, 5.0, size=a.shape)
    h = np.random.uniform(-180, 180, size=a.shape)
    
    for theta in range(0, 360, 10):
        #Rotation conversion
        rad = np.radians(a * theta + b)
        rotate_tensor = np.broadcast_to(np.eye(3)[None, None,:], (4, 4, 3, 3)).copy()
        rotate_tensor[:,:, 0, 0] = np.cos(rad)
        rotate_tensor[:,:, 0, 1] = -np.sin(rad)
        rotate_tensor[:,:, 1, 0] = np.sin(rad)
        rotate_tensor[:,:, 1, 1] = np.cos(rad)
        #Dilation
        rad = np.radians(d * theta + e)
        scale_tensor = np.broadcast_to(np.eye(3)[None, None,:], (4, 4, 3, 3)).copy()
        scale_tensor[:,:, 0, 0] = c * np.sin(rad) + f
        scale_tensor[:,:, 1, 1] = c * np.sin(rad) + f
        #Individual translation
        rad = np.radians(g * theta + h)
        transform_tensor = np.broadcast_to(np.eye(3)[None, None,:], (4, 4, 3, 3)).copy()
        transform_tensor[:,:, 0, 2] = e * np.cos(rad)        
        transform_tensor[:,:, 1, 2] = e * np.sin(rad)
        #Overall translation
        shift_x, shift_y = np.meshgrid(np.arange(4), np.arange(4))
        anchor_tensor = np.broadcast_to(np.eye(3)[None, None,:], (4, 4, 3, 3)).copy()
        anchor_tensor[:,:, 0, 2] = shift_x * 500
        anchor_tensor[:,:, 1, 2] = shift_y * 500

        #Rotate → Enlarge → Translate → Translate the whole
        affine = np.matmul(anchor_tensor, transform_tensor)
        affine = np.matmul(affine, scale_tensor)
        affine = np.matmul(affine, rotate_tensor)
        dest_points = np.matmul(affine, points)[:, :, :2, :] #Point cloud after rotation
        rectangle = np.concatenate([np.min(dest_points, axis=-1),
                                    np.max(dest_points, axis=-1)], axis=-1)  #Bounding Box for point cloud
                                    
        dest_points = dest_points.swapaxes(-1, -2).reshape(-1, 2)
        rectangle = rectangle.reshape(-1, 4)

        plt.clf()
        plt.scatter(dest_points[:, 0], dest_points[:, 1], s=1)

        ax = plt.gca()
        for i in range(rectangle.shape[0]):
            rect = patches.Rectangle(rectangle[i, :2], *(rectangle[i, 2:] - rectangle[i, :2]),
                                    linewidth=1, edgecolor="magenta", fill=False)
            ax.add_patch(rect)
        plt.ylim(-750, 2500)
        plt.xlim(-750, 2500)
        plt.title("theta = " + str(theta))
        plt.show()
