The finished implementation is at the bottom of the article.
HOG (Histogram of Oriented Gradients) describes the distribution of brightness gradients in an image. Because it captures where the brightness changes sharply, it roughly gives you the edge distribution of the image. The explanation here was easy to understand.
Expressed as an image, it looks something like this. (The source of the image is the Kotoba sisters standing-picture material (30 types each).)
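For reference, here is a minimal sketch of how such a visualization can be produced with scikit-image's `hog` (the file name is a placeholder; recent versions spell the option `visualize`, older ones `visualise`):

```python
import cv2
from skimage.feature import hog

# Load a standing picture as grayscale (placeholder file name).
image = cv2.imread("kotoba_sister.png", cv2.IMREAD_GRAYSCALE)

# visualize=True additionally returns an image showing the dominant
# gradient orientation per cell, like the figure above.
fd, hog_image = hog(image, orientations=9, pixels_per_cell=(5, 5),
                    cells_per_block=(1, 1), visualize=True)

# Rescale to 0-255 before saving, since hog_image is a float array.
cv2.imwrite("hog_visualization.png",
            (hog_image / hog_image.max() * 255).astype("uint8"))
```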
What matters especially this time is that the feature is not easily affected by these two points.
HOG extraction is implemented in various libraries. The ones I tried are:

- scikit-image's `hog`
- OpenCV's `HOGDescriptor` (for Python)
- OpenCV's `HOGDescriptor` (for Java)

Frighteningly, **these implementations return different results (only the number of dimensions matches)**. The outline of the algorithm is the same, but the details of the calculation procedure seem to differ.
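A quick way to see this discrepancy yourself is a sketch like the one below; the parameters are chosen so that both descriptors come out 324-dimensional for a 30x30 grayscale image, and the file path is a placeholder:

```python
import cv2
import numpy as np
from skimage.feature import hog

gray = cv2.imread("icon.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
gray = cv2.resize(gray, (30, 30))

# scikit-image: 6x6 cells of 5x5 px, one 6x6-cell block, 9 bins -> 324 dims
fd_skimage = hog(gray, orientations=9,
                 pixels_per_cell=(5, 5), cells_per_block=(6, 6))

# OpenCV: same geometry via (winSize, blockSize, blockStride, cellSize, nbins)
descriptor = cv2.HOGDescriptor((30, 30), (30, 30), (5, 5), (5, 5), 9)
fd_opencv = descriptor.compute(gray).ravel()

print(len(fd_skimage), len(fd_opencv))     # same dimension count
print(np.allclose(fd_skimage, fd_opencv))  # False: the values differ
```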
Just grab the edges with a Canny filter and hollow out the inside. That is almost all there is to it.
(Images: the original image, the edge-extraction image, and the cropped image)
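The edge-extraction step might look something like this (a sketch; the threshold values and file names are assumptions of mine, not the ones actually used):

```python
import cv2

image = cv2.imread("original.png")  # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Canny keeps only the strong brightness transitions, i.e. the outline;
# everything inside the silhouette is "hollowed out" to black.
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite("edges.png", edges)
```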
Since the location of the icon within the frame is fixed, all you have to do is determine the position once and cut it out. I tuned the coordinates by trial and error while experimenting. You end up with an image like this.
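Cropping a fixed region is just NumPy array slicing; the coordinates below are made-up examples, not the actual values used:

```python
import cv2

frame = cv2.imread("frame.png")  # placeholder path

# Placeholder rectangle found by experimenting: top-left (x, y), size w x h.
x, y, w, h = 100, 50, 30, 30
icon = frame[y:y + h, x:x + w]  # NumPy slicing: rows (y) first, then columns (x)
cv2.imwrite("icon.png", icon)
```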
I tried several combinations of cell size and block size, and this combination worked well. To make the combination work, the size of the judgment image is set to 30x30. I will paste the Python code (the estimation part). The version I first pasted was old, so I have replaced only the hyperparameters used for the HOG calculation (2017/3/4).
estimate.py
import numpy as np
import cv2
from skimage.feature import hog


# A method for estimating the Pokemon. The argument data holds the
# reference HOG features, one row per Pokemon; the index of the
# nearest row is returned.
def estimate_poke_index(image, data):
    hog_feature = calculateHOG(image)
    distance_list = calculate_manhattan_distance(hog_feature, data)
    return np.argmin(np.array(distance_list))


# A method that takes an image, calculates and returns its HOG features
def calculateHOG(image, orient=9, cell_size=5, block_size=6):
    # If the image has an alpha channel, paint transparent pixels white
    number_color_channels = np.shape(image)[2]
    if number_color_channels > 3:
        mask = image[:, :, 3]
        image = image[:, :, :3]
        image[mask == 0] = 255
    # Resize the image to 30x30
    resized_image = cv2.resize(image, (30, 30))
    # Compute HOG per color channel and concatenate the three results
    images = cv2.split(resized_image)
    fd = []
    for monocolor_image in images:
        blur_image = cv2.GaussianBlur(monocolor_image, (3, 3), 0)
        fd.extend(hog(blur_image, orientations=orient,
                      pixels_per_cell=(cell_size, cell_size),
                      cells_per_block=(block_size, block_size)))
    return fd


# Euclidean distance between a HOG feature and each row of data
def calculate_distance_HOG(target, data):
    distance_list = []
    rows, columns = np.shape(data)
    for i in range(rows):
        distance_list.append(np.linalg.norm(data[i, :] - target))
    return distance_list


# Manhattan distance between a HOG feature and each row of data
def calculate_manhattan_distance(target, data):
    distance_list = []
    rows, columns = np.shape(data)
    for i in range(rows):
        distance_list.append(np.sum(np.abs(data[i, :] - target)))
    return distance_list
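For reference, calling it might look like the sketch below; the way the reference data is built here (the `names` list and the `icons/` paths) is my assumption, not something from the article:

```python
import numpy as np
import cv2

# Hypothetical reference set: one icon image per Pokemon (placeholder paths).
names = ["Greninja", "Garula", "Swablu"]
reference = np.array([calculateHOG(cv2.imread("icons/%s.png" % n,
                                              cv2.IMREAD_UNCHANGED))
                      for n in names])

# Identify an unknown icon cropped from a frame.
unknown = cv2.imread("unknown_icon.png", cv2.IMREAD_UNCHANGED)
index = estimate_poke_index(unknown, reference)
print(names[index])
```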
Rough as the algorithm is, in my experience it gets about 70 to 80% right. It would be nice if I could report proper test results, but I'm afraid I haven't run a consistent test. (Postscript, 2017/3/4) When I collected data from 19 games and checked the misrecognition rate, it got 20 out of 114 wrong. So it seems to guess about 80% correctly, but I want to do something about misrecognitions of Pokemon with high KP (i.e., frequently encountered), such as "Greninja → Fukuslow" and "Garula → Swablu".
These two are probably the main misrecognitions. The second one might be fixed with a small improvement to the algorithm.
It's aimed at tool developers, but here is the implementation, packaged in Java. Feel free to use it, though I'd be happy if you let me know when you do. (Because I want to go see the finished product.)