Using TensorFlow Hub, we load a **pre-trained general object recognition model (made by Google)** and use it to perform general object detection on arbitrary images.
This is largely based on https://github.com/tensorflow/hub/blob/master/examples/colab/object_detection.ipynb.
The execution environment is **Google Colab**, and the code targets TensorFlow **2.x**.
First, switch the runtime to use a GPU for computation. Select "Runtime" → "Change runtime type" from the menu at the top and set the hardware accelerator to "**GPU**".
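To confirm that a GPU has actually been assigned, you can run the following in a cell (a quick, optional check):

```
!nvidia-smi
```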
Next, add an environment variable that specifies the directory where the modules (trained models) fetched from TensorFlow Hub are temporarily stored. This step is only needed if you want to check what kind of module gets downloaded, and can be omitted.
Add environment variable

```python
import os
os.environ['TFHUB_CACHE_DIR'] = '/content/tfhub'
```
Check environment variables

```
!printenv TFHUB_CACHE_DIR
```
Switch to using TensorFlow 2.x.

Switch to TensorFlow 2.x

```python
%tensorflow_version 2.x
```
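As an optional sanity check, you can verify the version and that TensorFlow sees the GPU:

```python
import tensorflow as tf
print(tf.__version__)                          # should print a 2.x version
print(tf.config.list_physical_devices('GPU'))  # should list one GPU device
```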
Also, upload the image you want to run object detection on to Google Colab (expand the sidebar, open the file tab, and drag and drop the file). A jpg file is used here, but png works as well.
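Alternatively, you can upload from code using Colab's `google.colab.files` module; a minimal sketch (the filenames are whatever you pick in the dialog):

```python
from google.colab import files

uploaded = files.upload()     # opens a file-picker; files land in /content
print(list(uploaded.keys()))  # names of the uploaded files
```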
Get the general object detection module (trained model) from TensorFlow Hub.
Detector loading

```python
import tensorflow as tf
import tensorflow_hub as hub

module_handle = 'https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1'
# module_handle = 'https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1'
detector = hub.load(module_handle).signatures['default']
```
The above module is an "*SSD-based object detection model trained on Open Images V4 with ImageNet pre-trained MobileNet V2 as image feature extractor*". Besides this one, various modules for image object detection are available on the hub.
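If you are curious what inputs and outputs the loaded signature provides, a quick optional inspection might look like this:

```python
# Assumes `detector` was loaded as above
print(detector.structured_input_signature)  # the expected input spec (a float32 image tensor)
print(detector.structured_outputs)          # outputs such as detection_boxes, detection_scores, ...
```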
If the environment variable TFHUB_CACHE_DIR is set, the downloaded module is stored there (if it is not set, it seems to end up somewhere under /tmp).
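With TFHUB_CACHE_DIR set as above, you can peek at what was actually downloaded:

```
!ls -R /content/tfhub
```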
Write a function `run_detector(...)` that takes the detector loaded above and an image file path as arguments, runs object detection, and prints the results as text. The function `showImage(...)` called in its last line will be created later, so it is commented out for now.
The main points are as follows.
- The input image is $227 \times 227$ pixels, with channels in RGB order.
- The choice of reduction algorithm (when generating the input data) also affects detection. `Image.LANCZOS` is used here; changing it to `Image.NEAREST` will change the detection scores (see the preprocessing sketch below).
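As a minimal, self-contained sketch of this preprocessing (the helper name `preprocess` and the sample path are illustrative, not from the original notebook):

```python
import numpy as np
import PIL.Image as Image
import tensorflow as tf

def preprocess(path, size=227, resample=Image.LANCZOS):
    # Load, force RGB, resize, scale to 0.0-1.0, and add a batch axis
    img = Image.open(path).convert('RGB')
    img = img.resize((size, size), resample)
    x = np.array(img, dtype=np.float32) / 255.
    return tf.constant(x.reshape([1, size, size, 3]))

# Compare the two resampling filters mentioned above:
# x_lanczos = preprocess('/content/sample1.jpg', resample=Image.LANCZOS)
# x_nearest = preprocess('/content/sample1.jpg', resample=Image.NEAREST)
```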
Function definition to perform object detection

```python
import time
import numpy as np
import PIL.Image as Image

def run_detector(detector, path):
    # Load the image and convert it into a format the detector accepts
    img = Image.open(path)  # Pillow (PIL)
    if img.mode == 'RGBA':
        img = img.convert('RGB')
    converted_img = img.copy()
    converted_img = converted_img.resize((227, 227), Image.LANCZOS)  # reduce to input size
    converted_img = np.array(converted_img, dtype=np.float32)        # convert to np.array
    converted_img = converted_img / 255.                             # normalize to 0.0-1.0
    converted_img = converted_img.reshape([1, 227, 227, 3])
    converted_img = tf.constant(converted_img)

    t1 = time.time()
    result = detector(converted_img)  # general object detection (the main step)
    t2 = time.time()
    print(f'Detection time: {t2-t1:.3f} seconds')

    # Prepare to output the results as text
    r = {key: value.numpy() for key, value in result.items()}
    boxes = r['detection_boxes']
    scores = r['detection_scores']
    decode = np.frompyfunc(lambda p: p.decode('ascii'), 1, 1)
    class_names = decode(r['detection_class_entities'])

    # Text output for the n results with a score of 0.25 or higher
    print('Detected objects')
    n = np.count_nonzero(scores >= 0.25)
    for i in range(n):
        y1, x1, y2, x2 = tuple(boxes[i])
        x1, x2 = int(x1 * img.width), int(x2 * img.width)
        y1, y2 = int(y1 * img.height), int(y2 * img.height)
        t = f'{class_names[i]:10} {100*scores[i]:3.0f}% '
        t += f'({x1:>4},{y1:>4}) - ({x2:>4},{y2:>4})'
        print(t)

    # showImage(np.array(img), r, min_score=0.25)  # overlay the detection results on the image
```
The above `run_detector(...)` is called as follows:

Execute by specifying the image path

```python
img_path = '/content/sample1.jpg'
run_detector(detector, img_path)
```
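If you have uploaded several images, a simple loop works too (a sketch; it assumes the jpg files sit directly under /content):

```python
import glob

for path in sorted(glob.glob('/content/*.jpg')):
    print(path)
    run_detector(detector, path)
```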
Here, the following photo (free stock material) was used as sample1.jpg.
The execution result is as follows. The numbers in parentheses are the coordinates of the upper-left and lower-right corners of the rectangle enclosing each object.
```
Detection time: 0.251 seconds
Detected objects
Human face 57% ( 522, 156) - ( 636, 276)
Clothing 57% ( 403, 203) - ( 757, 577)
Clothing 57% ( 144, 211) - ( 481, 583)
Girl 41% ( 393, 104) - ( 763, 595)
Girl 34% ( 214, 81) - ( 619, 614)
```
The text output alone is hard to interpret, so let's overlay the results on the image. Write the function `showImage(...)` that overlays the detection results on the image.
Overlay display of detection results on images

```python
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe

def showImage(img, r, min_score=0.1):
    fig = plt.figure(dpi=150, figsize=(8, 8))
    ax = plt.gca()
    ax.tick_params(axis='both', which='both', left=False,
                   labelleft=False, bottom=False, labelbottom=False)
    ax.imshow(img)

    decode = np.frompyfunc(lambda p: p.decode('ascii'), 1, 1)
    boxes = r['detection_boxes']
    scores = r['detection_scores']
    class_names = decode(r['detection_class_entities'])
    n = np.count_nonzero(scores >= min_score)

    # Prepare a color for each entry in class_names
    class_set = np.unique(class_names[:n])
    colors = dict()
    cmap = plt.get_cmap('tab10')
    for i, v in enumerate(class_set):
        colors[v] = cmap(i)

    # Draw the rectangles, starting from the one with the lowest score
    img_w = img.shape[1]
    img_h = img.shape[0]
    for i in reversed(range(n)):
        text = f'{class_names[i]} {100*scores[i]:.0f}%'
        color = colors[class_names[i]]
        y1, x1, y2, x2 = tuple(boxes[i])
        y1, y2 = y1 * img_h, y2 * img_h
        x1, x2 = x1 * img_w, x2 * img_w

        # Frame
        rect = plt.Rectangle(xy=(x1, y1), width=(x2 - x1), height=(y2 - y1),
                             fill=False, edgecolor=color, joinstyle='round',
                             clip_on=False, zorder=8 + (n - i))
        ax.add_patch(rect)

        # Label: text
        t = ax.text(x1 + img_w / 200, y1 - img_h / 300, text, va='bottom',
                    fontsize=6, color=color, zorder=8 + (n - i))
        t.set_path_effects([pe.Stroke(linewidth=1.5, foreground='white'),
                            pe.Normal()])
        fig.canvas.draw()
        renderer = fig.canvas.get_renderer()
        coords = ax.transData.inverted().transform(t.get_window_extent(renderer=renderer))
        tag_w = abs(coords[0, 0] - coords[1, 0]) + img_w / 100
        tag_h = abs(coords[0, 1] - coords[1, 1]) + img_h / 120

        # Label: background
        rect = plt.Rectangle(xy=(x1, y1 - tag_h), width=tag_w, height=tag_h,
                             edgecolor=color, facecolor=color,
                             joinstyle='round', clip_on=False, zorder=8 + (n - i))
        ax.add_patch(rect)
```
Then, uncomment the `showImage(...)` call in the last line of `run_detector(...)` and run `run_detector(detector, img_path)` again. The following result (image) is obtained.
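If you also want to keep the overlaid image as a file, one option is to save the current matplotlib figure right after the call (a sketch; it assumes `showImage(...)` leaves its figure as the current one, which it does as written above):

```python
run_detector(detector, img_path)
plt.savefig('/content/result.png', bbox_inches='tight')  # save the overlay to a file
```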
Next, switch to another module (detector) and try object detection on the same image (loading this one takes considerably longer).
```python
import tensorflow as tf
import tensorflow_hub as hub

# module_handle = 'https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1'
module_handle = 'https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1'
detector = hub.load(module_handle).signatures['default']
```
The execution result is as follows. Detection takes several times longer than before, but more objects are detected.
```
Detection time: 1.379 seconds
Detected objects
Human face 94% ( 524, 147) - ( 625, 272)
Human face 86% ( 266, 149) - ( 351, 270)
Clothing 75% ( 383, 234) - ( 750, 565)
Footwear 70% ( 154, 511) - ( 306, 598)
Boy 65% ( 351, 93) - ( 759, 606)
Footwear 59% ( 311, 521) - ( 477, 600)
Clothing 53% ( 152, 225) - ( 438, 565)
Girl 53% ( 144, 88) - ( 481, 598)
Boy 49% ( 225, 88) - ( 618, 592)
Boy 45% ( 145, 90) - ( 464, 603)
Girl 37% ( 324, 85) - ( 771, 587)
Sun hat 29% ( 479, 78) - ( 701, 279)
```