The goal is to find, in the **real coordinate system**, the position of an arbitrary object placed on a plane (a desk) together with reference markers, as shown below. It is implemented in Python using OpenCV's **ArUco** module.
- [Measure object position with OpenCV / ArUco library](https://inody1991.tumblr.com/post/172449790255/opencvaruco%E3%83%A9%E3%82%A4%E3%83%95-%E3%83%A9%E3%83%AA%E3%81%A6-%E7%89%A9%E4%BD%93%E4%BD%8D%E7%BD%AE%E8%A8%88%E6%B8%AC) by @inoblog was used as a reference.
Google Colab is used (Python 3.6.9). The versions of the modules used are as follows.
Various module versions

```
%pip list | grep -e opencv -e numpy
numpy                    1.17.5
opencv-contrib-python    4.1.2.30
opencv-python            4.1.2.30
```
First, load the modules required for processing. In the Google Colab environment, `cv2.imshow(...)` is not available, but with the following import, an image (`np.ndarray`) can be displayed in the output cell using `cv2_imshow(...)`.
Module import

```python
import cv2
import numpy as np
from google.colab.patches import cv2_imshow
```
ArUco provides **a set of predefined markers**, which we will use here. `aruco.getPredefinedDictionary(...)` gets a dictionary containing the predefined markers. The argument `aruco.DICT_4X4_50` selects a dictionary of up to $50$ markers, each with a $4 \times 4$ fill pattern inside the square.
Save the predefined markers (0-3) as image files

```python
aruco = cv2.aruco
p_dict = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
marker = [0] * 4  # initialization
for i in range(len(marker)):
    marker[i] = aruco.drawMarker(p_dict, i, 75)  # 75x75 px
    cv2.imwrite(f'marker{i}.png', marker[i])
```
Executing the above outputs $4$ files ($75 \times 75$ pixels each), from "marker0.png" to "marker3.png".
In the Google Colab environment, a created marker can be displayed and checked in the output cell with `cv2_imshow(...)`.
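For example, displaying the first of the markers created above:

```python
cv2_imshow(marker[0])  # show marker ID 0 in the output cell
```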
You can see that a marker with a $4 \times 4$ fill pattern has been created inside the square.
If you pass `aruco.DICT_5X5_100` as the argument to `aruco.getPredefinedDictionary(...)`, you get a dictionary with up to $100$ markers defined with a $5 \times 5$ pattern.
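For example (the variable name here is arbitrary):

```python
# Dictionary of up to 100 markers with a 5x5 inner pattern
p_dict_5x5 = aruco.getPredefinedDictionary(aruco.DICT_5X5_100)
cv2_imshow(aruco.drawMarker(p_dict_5x5, 0, 75))  # draw its marker ID 0 at 75x75 px
```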
Print "marker0.png ", that is, the output of the 0th marker of ʻaruco.DICT_4X4_50 on paper. Then, let's call it "m0-photo.jpg " taken with a camera. Markers are detected for this target, and the image ʻimg_marked
that overlays the detection results is output.
Detection from a photo containing marker 0

```python
img = cv2.imread('m0-photo.jpg')
corners, ids, rejectedImgPoints = aruco.detectMarkers(img, p_dict)  # detection
img_marked = aruco.drawDetectedMarkers(img.copy(), corners, ids)    # overlay detection results
cv2_imshow(img_marked)  # display
```
The outline of each marker is drawn as a **green square**, the top-left corner of the marker as a **red square**, and the marker ID (number) in **blue text**, all overlaid on the original image. From this you can see that the marker is detected properly.
Let's take a closer look at the **return values** of `aruco.detectMarkers(...)`. `corners` stores the **image coordinates of the corners (four corners)** of each marker found in the image, as a list of `np.ndarray`.
Return value corners of aruco.detectMarkers(...)

```python
# ■ Return value corners
# print(type(corners))     # -> <class 'list'>
# print(len(corners))      # -> 1
# print(type(corners[0]))  # -> <class 'numpy.ndarray'>
# print(corners[0].shape)  # -> (1, 4, 2)
print(corners)
```
Execution result

```
[array([[[265., 258.],
        [348., 231.],
        [383., 308.],
        [297., 339.]]], dtype=float32)]
```
The corner coordinates are stored clockwise, in the order top-left, top-right, bottom-right, bottom-left of the marker.
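As a quick check of this order, the following sketch (the variable names here are chosen for illustration, not from the original code) draws each corner index of the first detected marker onto the photo:

```python
# Sketch: label the four corners of the first detected marker with their indices
# (expected order: 0 = top-left, 1 = top-right, 2 = bottom-right, 3 = bottom-left)
img_corners = img.copy()
for idx, (x, y) in enumerate(corners[0][0]):
    cv2.circle(img_corners, (int(x), int(y)), 4, (0, 0, 255), -1)
    cv2.putText(img_corners, str(idx), (int(x) + 5, int(y) - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 0), 2)
cv2_imshow(img_corners)
```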
`ids` stores the **marker IDs** found in the image as a `numpy.ndarray`.
Return value ids of aruco.detectMarkers(...)

```python
# ■ Return value ids
# print(type(ids))  # -> <class 'numpy.ndarray'>
# print(ids.shape)  # -> (1, 1)
print(ids)
```
The result is `[[0]]` (note that it is not `[0]`).
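Since the later examples flatten this array with `ids.ravel()`, a quick illustration:

```python
print(ids.ravel())  # -> [0] (flattened to a 1-D array)
```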
`rejectedImgPoints` stores the coordinates of square-like regions found in the image whose interior does not contain a proper marker pattern. The [reference](https://docs.opencv.org/4.1.2/d9/d6a/group__aruco.html#gab9159aa69250d8d3642593e508cb6baa) describes it as containing "the imgPoints of those squares whose inner code has not a correct codification. Useful for debugging purposes." This is easier to understand with a concrete example: `rejectedImgPoints` stores the coordinates of squares such as the one indicated by the arrow in the figure.
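For debugging, the rejected candidates can be overlaid in the same way as detected markers; a minimal sketch (the red border color is an arbitrary choice):

```python
# Sketch: draw the rejected square candidates in red
img_rejected = aruco.drawDetectedMarkers(img.copy(), rejectedImgPoints,
                                         borderColor=(0, 0, 255))
cv2_imshow(img_rejected)
```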
Let's check the result when there are multiple markers in the image (including two or more copies of the same marker). The detection code is the same as before. Overlaying the detection results looks like this; having two or more of the same marker is not a problem.
The return values `ids` and `corners` of `aruco.detectMarkers(...)` are as follows. Note that they are not sorted by ID (see the sketch after the output below).
Result of print(ids)

```
[[1]
 [1]
 [0]
 [2]]
```
Result of print(corners)

```
[array([[[443., 288.],
        [520., 270.],
        [542., 345.],
        [464., 363.]]], dtype=float32), array([[[237., 272.],
        [313., 255.],
        [331., 328.],
        [254., 346.]]], dtype=float32), array([[[ 64., 235.],
        [140., 218.],
        [154., 290.],
        [ 75., 307.]]], dtype=float32), array([[[333., 113.],
        [404.,  98.],
        [424., 163.],
        [351., 180.]]], dtype=float32)]
```
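If the results are needed in ID order, one way (a sketch, not from the original article) is to sort the ID/corner pairs together:

```python
# Sketch: sort detected markers by their ID
pairs = sorted(zip(ids.ravel(), corners), key=lambda p: p[0])
for marker_id, marker_corners in pairs:
    print(marker_id, marker_corners[0][0])  # each marker's ID and top-left corner
```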
Prepare a sheet of paper on which the four markers "marker0.png" to "marker3.png" created earlier are arranged clockwise as shown below, with a distance of $150\,\mathrm{mm}$ between the markers (if this sheet is not made precisely, the measurement will be inaccurate).
Place the object whose position you want to detect (here, a "dog" figurine) on this paper, and shoot from **a suitable position above** (the shooting position and angle at this time need not be exact). The goal is to find the position of this "dog" in the **real coordinate system** defined by the markers.
The camera image taken from a suitable position and angle above is converted to the **image seen from directly above**. Here, the converted image is $500 \times 500\,\mathrm{px}$.
Convert to an image seen from directly above

```python
aruco = cv2.aruco
p_dict = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
img = cv2.imread('inu.jpg')
corners, ids, rejectedImgPoints = aruco.detectMarkers(img, p_dict)  # detection

# Store the "center coordinates" of each marker in m, clockwise from the top left
m = np.empty((4, 2))
for i, c in zip(ids.ravel(), corners):
    m[i] = c[0].mean(axis=0)

width, height = (500, 500)  # image size after transformation
marker_coordinates = np.float32(m)
true_coordinates = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
trans_mat = cv2.getPerspectiveTransform(marker_coordinates, true_coordinates)
img_trans = cv2.warpPerspective(img, trans_mat, (width, height))
cv2_imshow(img_trans)
```
The execution result is as follows. You can see that the image is transformed so that the **center** of each marker becomes a corner of the image. Even with an image taken from a more oblique position, it can be converted to a view from directly above, as shown below.

For clarity, the **center** of each marker was mapped to the four corners of the converted image. However, this is inconvenient for calculating the position of the "dog" (fragments of the markers appear at the four corners ...). Therefore, modify the program so that the corner of each marker touching the $150 \times 150\,\mathrm{mm}$ square on the paper becomes a corner of the converted image.
Convert to an image seen from directly above (improved version)

```python
aruco = cv2.aruco
p_dict = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
img = cv2.imread('inu.jpg')
corners, ids, rejectedImgPoints = aruco.detectMarkers(img, p_dict)  # detection

# Changed part: use the inner corner of each marker instead of its center
corners2 = [np.empty((1, 4, 2))] * 4
for i, c in zip(ids.ravel(), corners):
    corners2[i] = c.copy()
m = np.empty((4, 2))
m[0] = corners2[0][0][2]  # marker 0: bottom-right corner
m[1] = corners2[1][0][3]  # marker 1: bottom-left corner
m[2] = corners2[2][0][0]  # marker 2: top-left corner
m[3] = corners2[3][0][1]  # marker 3: top-right corner

width, height = (500, 500)  # image size after transformation
marker_coordinates = np.float32(m)
true_coordinates = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
trans_mat = cv2.getPerspectiveTransform(marker_coordinates, true_coordinates)
img_trans = cv2.warpPerspective(img, trans_mat, (width, height))
cv2_imshow(img_trans)
```
The execution result is as follows. The size of this image is $500 \times 500\,\mathrm{px}$, and the corresponding size on paper is $150 \times 150\,\mathrm{mm}$. Therefore, if a point in the image has coordinates $(160\,\mathrm{px},\,200\,\mathrm{px})$, the corresponding real coordinates can be calculated as $(160 \times \frac{150}{500} = 48\,\mathrm{mm},\,200 \times \frac{150}{500} = 60\,\mathrm{mm})$.
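As a small helper (hypothetical, not part of the original code), this conversion can be written as a function:

```python
# px -> mm conversion for the 500 px <-> 150 mm mapping
def px_to_mm(v_px, px=500, mm=150):
    return v_px * mm / px

print(px_to_mm(160))  # -> 48.0
print(px_to_mm(200))  # -> 60.0
```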
Use the various functions provided by OpenCV to find the position of the dog in the above image. Particularly important is `cv2.connectedComponentsWithStats(...)`, which returns the number of connected components, a label image, per-component statistics (bounding box and area), and the centroids.
Object detection, conversion to real coordinates, and position output

```python
tmp = img_trans.copy()

# (1) Grayscale conversion
tmp = cv2.cvtColor(tmp, cv2.COLOR_BGR2GRAY)
# cv2_imshow(tmp)

# (2) Blur
tmp = cv2.GaussianBlur(tmp, (11, 11), 0)
# cv2_imshow(tmp)

# (3) Binarization
th = 130  # binarization threshold (needs adjustment)
_, tmp = cv2.threshold(tmp, th, 255, cv2.THRESH_BINARY_INV)
# cv2_imshow(tmp)

# (4) Blob detection
n, img_label, data, center = cv2.connectedComponentsWithStats(tmp)

# (5) Organize the detection results
detected_obj = list()           # storage for detection results
tr_x = lambda x: x * 150 / 500  # X-axis: image coordinates -> real coordinates
tr_y = lambda y: y * 150 / 500  # Y-axis: same
img_trans_marked = img_trans.copy()
for i in range(1, n):           # label 0 is the background
    x, y, w, h, size = data[i]
    if size < 300:              # ignore regions smaller than 300 px
        continue
    detected_obj.append(dict(x=tr_x(x),
                             y=tr_y(y),
                             w=tr_x(w),
                             h=tr_y(h),
                             cx=tr_x(center[i][0]),
                             cy=tr_y(center[i][1])))
    # visual check
    cv2.rectangle(img_trans_marked, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.circle(img_trans_marked, (int(center[i][0]), int(center[i][1])), 5, (0, 0, 255), -1)

# (6) Display the results
cv2_imshow(img_trans_marked)
for i, obj in enumerate(detected_obj, 1):
    print(f'■ Detected object {i}: center position X={obj["cx"]:>3.0f}mm Y={obj["cy"]:>3.0f}mm')
```
The execution result is as follows.

Execution result

```
■ Detected object 1: center position X=121mm Y= 34mm
```
Place a ruler on the paper and check whether the above result is correct. The position appears to be measured properly.
Let's place more objects and check. The binarization threshold needed adjusting, but it works. Detected object 1 is "dog (brown)", detected object 2 is "dog (gray)", detected object 3 is "rabbit", and detected object 4 is "bear".
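One way to avoid hand-tuning the threshold `th` would be Otsu's method, which chooses the threshold automatically; a sketch (an alternative, not what was used above):

```python
# Sketch: binarize with an automatically chosen threshold (Otsu's method)
_, tmp = cv2.threshold(tmp, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
```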
- Try combining this with machine learning (image classification) to classify each object in the image as "dog", "rabbit", or "bear".
- Try creating a demo in combination with a robot arm.
- Challenge image classification by TensorFlow2 + Keras
- General object detection using Google's pre-trained model (via TensorFlow Hub)