dHash is one of several algorithms for similar-image search. For the details of the algorithm itself, see "Calculate the similarity of images using Perceptual Hash", which explains it clearly. In short, it is a perceptual hash: visually similar images produce hash values that are close to each other.
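As a rough sketch of the idea (my own toy illustration, not code from the referenced article; the actual hashing below uses the ImageHash package, whose bit ordering may differ): dHash shrinks the image to a tiny grayscale grid, compares each pixel with its right-hand neighbor, and packs the comparison results into a 64-bit hash, so visually similar images yield hashes that differ in only a few bits.

from PIL import Image

def dhash_sketch(image, hash_size=8):
    # Shrink to a (hash_size+1) x hash_size grayscale grid so that each row
    # yields hash_size left/right comparisons.
    small = image.convert('L').resize((hash_size + 1, hash_size))
    pixels = list(small.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            # One bit per comparison: does brightness increase to the right?
            bits = (bits << 1) | (1 if right > left else 0)
    # Return as a 16-digit hex string (64 bits / 4 bits per hex digit)
    return format(bits, '016x')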
So, using dHash, I tried to see whether I could identify the position on the course from a single scene of an onboard play video of the racing game Assetto Corsa.
Specifically, the flow is as follows.
① First, extract all frames from a play video (onboard video) of a full lap of the course and save them as PNG images.
② Calculate the dHash value for every saved frame image. Since the telemetry data recorded while shooting the play video makes it possible to identify the vehicle's position at any given time, save each hash value together with its position information in a search CSV file.
③ Separately, pick one scene from another play video whose position on the course you want to identify.
④ Calculate the dHash value for that one-scene image.
⑤ Search the CSV file for the image whose hash value is closest to the one calculated in ④. Since the image found by the search is linked to position information in ②, treat that position as the position on the course of the selected scene.
Even when two scenes of a racing game are taken at nearly the same point on the course, there will be slight misalignment because the driving line differs from play to play. The point of this experiment is whether dHash can absorb such differences and still find the similar image.
I implemented the above process in Python.
To extract the images of all frames from the play video file (an mp4 file), I used OpenCV, referring to an existing article for the approach.
Each frame image is saved with the file name "(frame number).png".
01_extract_frames.py
import cv2
import sys

def extract_frame(video_file, save_dir):
    capture = cv2.VideoCapture(video_file)
    frame_no = 0
    while True:
        # read() returns (success flag, frame); the flag becomes False at end of video
        retval, frame = capture.read()
        if retval:
            # Save each frame as a zero-padded PNG, e.g. 000000.png
            cv2.imwrite(r'{}\{:06d}.png'.format(save_dir, frame_no), frame)
            frame_no = frame_no + 1
        else:
            break
    capture.release()

if __name__ == '__main__':
    video_file = sys.argv[1]
    save_dir = sys.argv[2]
    extract_frame(video_file, save_dir)
This script is also used in ③.
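For reference, a hypothetical invocation might look like this (the video file name and output directory are my own examples):

> python.exe 01_extract_frames.py play_video.mp4 frames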
For the extracted frame images corresponding to a certain period (one specific lap), the dHash value is calculated using the dhash function of the ImageHash package. Each hash is then linked to the telemetry data recorded by an in-game app while shooting (with only the relevant part extracted in advance) and written to the search CSV file.
02_calc_dhash.py
from PIL import Image
import imagehash
import csv
import sys

frame_width = 1280
frame_height = 720
trim_lr = 140   # pixels trimmed from the left and right edges
trim_tb = 100   # pixels trimmed from the top and bottom edges
dhash_size = 8  # 8x8 comparisons -> 64-bit hash

def calc_dhash(frame_dir, frame_no_from, frame_no_to, telemetry_file, output_file):
    # Read the telemetry data file (tab-separated)
    position_data = [row for row in csv.reader(open(telemetry_file), delimiter='\t')]
    writer = csv.writer(open(output_file, mode='w', newline=''))
    for i in range(frame_no_from, frame_no_to + 1):
        # Read the extracted frame image and crop it
        # (to remove the time display etc. at the edges of the image)
        frame = Image.open(r'{}\{:06d}.png'.format(frame_dir, i))
        trimmed_frame = frame.crop((
            trim_lr,
            trim_tb,
            frame_width - trim_lr,
            frame_height - trim_tb))
        # Calculate the dHash value
        dhash_value = str(imagehash.dhash(trimmed_frame, hash_size=dhash_size))
        # Link with the telemetry data: both frames and telemetry rows are
        # recorded at regular intervals, so map the frame index to a telemetry
        # row proportionally.
        position_no = round((len(position_data) - 1) * (i - frame_no_from) / (frame_no_to - frame_no_from))
        writer.writerow([
            i,
            dhash_value,
            position_data[position_no][9],   # position coordinate [m]
            position_data[position_no][10]   # position coordinate [m]
        ])

if __name__ == '__main__':
    frame_dir = sys.argv[1]
    frame_no_from = int(sys.argv[2])
    frame_no_to = int(sys.argv[3])
    telemetry_file = sys.argv[4]
    output_file = sys.argv[5]
    calc_dhash(frame_dir, frame_no_from, frame_no_to, telemetry_file, output_file)
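For reference, a hypothetical invocation (the directory, frame range, and file names here are my own examples) might look like this:

> python.exe 02_calc_dhash.py frames 731 14000 telemetry.tsv dhash_TOYOTA_86.csv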
This script outputs lines of (frame number), (hash value), and (2D position coordinates in meters) to the CSV file, like this:
731,070b126ee741c080,-520.11,139.89
732,070b126ee7c1c080,-520.47,139.90
733,070b126ee7c1c480,-520.84,139.92
This script is also used in ④.
This script searches the search CSV file produced by 02_calc_dhash.py for the image whose hash value is closest to a specified hash value.
The Hamming distance is used as the measure of closeness between hash values. I use popcount from the gmpy2 package to compute it, since it is reportedly very fast.
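If gmpy2 is not available, the same Hamming distance can be computed in pure Python (a fallback of my own, slower than gmpy2.popcount but giving identical results):

def hamming_distance(hash_a, hash_b):
    # XOR leaves a 1 bit at every position where the two hashes differ;
    # counting those bits gives the Hamming distance.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count('1')

# e.g. hamming_distance('070b126ee741c080', '070b126ee7c1c080') -> 1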
03_match_frame.py
import csv
import gmpy2
import sys

def match_frame(base_file, search_hash):
    base_data = [row for row in csv.reader(open(base_file))]
    min_distance = 64  # maximum possible distance for a 64-bit hash
    results = []
    for base_line in base_data:
        # Hamming distance = number of 1 bits in the XOR of the two hashes
        distance = gmpy2.popcount(
            int(base_line[1], 16) ^
            int(search_hash, 16)
        )
        if distance < min_distance:
            min_distance = distance
            results = [base_line]
        elif distance == min_distance:
            # Collect ties at the current minimum distance
            results.append(base_line)
    print("Distance = {}".format(min_distance))
    for result_line in results:
        print(result_line)

if __name__ == '__main__':
    base_file = sys.argv[1]
    search_hash = sys.argv[2]
    match_frame(base_file, search_hash)
As shown below, the position information linked to the image with the closest hash value is printed. (If several images tie at the same distance, all of them are printed.)
> python.exe 03_match_frame.py dhash_TOYOTA_86.csv cdc9cebc688f3f47
Distance = 8
['13330', 'c9cb4cb8688f3f7f', '-1415.73', '-58.39']
['13331', 'c9eb4cbc688f3f7f', '-1415.39', '-58.44']
For the experiment, I extracted all frames from a play video of a lap of the Nürburgring Nordschleife (total length 20.81 km) driven in a TOYOTA 86 GT and calculated their hash values, then searched them for the images closest to several scenes from another play video driven in a BMW Z4.
First, let's check whether images of three famous corners can be found correctly.
| | Image used for search | Hit image |
| --- | --- | --- |
| Image | (image) | (image) |
| Hash value | ced06061edcf9f2d | 0c90e064ed8f1f3d |
| Location information | (-2388.29, 69.74) | (-2416.50, 66.67) |
Hash value distance = 10, location deviation = 28.4 m
With dHash, a hash distance of 10 or less is commonly said to indicate the same image, so this result only just qualifies. Looking at the images, the shape of the corner is similar, but the positions of the surrounding trees differ slightly, making it hard to judge whether they should count as similar.
| | Image used for search | Hit image |
| --- | --- | --- |
| Image | (image) | (image) |
| Hash value | 7c5450640c73198c | 7c7c50642d361b0a |
| Location information | (317.58, -121.52) | (316.18, -121.45) |
Hash value distance = 11, location deviation = 1.4 m
Visually these are very close. However, the hash value distance, 11, is larger than in the previous case.
| | Image used for search | Hit image |
| --- | --- | --- |
| Image | (image) | (image) |
| Hash value | 665d1d056078cde6 | 665c1d856050da8d |
| Location information | (2071.48, 77.01) | (2071.23, 77.12) |
Hash value distance = 13, location deviation = 0.27 m
The vehicle positions are very close and the images look quite similar, but on close inspection the orientation differs slightly. The hash value distance of 13 is fairly large.
So, across the three famous corners, the search was able to identify the closest position in each case.
I also tried it on 10 randomly selected scenes.
The one case where only a wrong position was hit is shown below.
| | Image used for search | Hit image |
| --- | --- | --- |
| Image | (image) | (image) |
| Hash value | b7b630b24c1e1f1e | b7b43839481e3f1f |
| Location information | (1439.61, -18.69) | (2059.41, 37.44) |
Hash value distance = 9, location deviation = 622.34 m
The way the trees grow on the left and right and the course ahead (a left curve versus a straight) look quite different, yet the hash value distance is a relatively close 9.
For reference, the image I wanted the search to hit is the following.
| Image used for search | Image I wanted to hit |
| --- | --- |
| (image) | (image) |
Hash value distance = 13
At first glance the images look similar, but on closer inspection they are slightly misaligned, which I think is why the distance ends up relatively large.
In this article, I tried similar-image search with dHash on racing game scenes.
The accuracy was relatively good for distinctive scenes such as famous corners, but for randomly selected scenes the hit rate was 70% (please forgive the small sample of only 10 cases).
I don't have a rigorous interpretation, only personal impressions.
There are several measures one could consider to improve the accuracy.