Using TensorFlow Hub, we load a **pre-trained general object recognition model (made by Google)** and use it to perform general object detection on arbitrary images.
This is largely based on https://github.com/tensorflow/hub/blob/master/examples/colab/object_detection.ipynb.
The execution environment is **Google Colab**, and the code targets TensorFlow **2.x**.
First, switch the runtime to use a GPU for computation. Select "Runtime" → "Change runtime type" from the menu at the top and set the hardware accelerator to "**GPU**".
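To confirm that a GPU has actually been assigned, you can run the following in a cell (a quick, optional check):

```
!nvidia-smi
```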
Next, add an environment variable that specifies the directory where the modules (trained models) fetched from TensorFlow Hub are temporarily stored. This step is only needed if you want to check what kind of module gets downloaded, and can be omitted.
Add environment variable

```python
import os
os.environ['TFHUB_CACHE_DIR'] = '/content/tfhub'
```
Check environment variables

```
!printenv TFHUB_CACHE_DIR
```
Switch to using TensorFlow 2.x.

Switch to TensorFlow 2.x

```python
%tensorflow_version 2.x
```
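As an optional sanity check, you can verify the version and that TensorFlow sees the GPU:

```python
import tensorflow as tf
print(tf.__version__)                          # should print a 2.x version
print(tf.config.list_physical_devices('GPU'))  # should list one GPU device
```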
Also, upload the image you want to run object detection on to Google Colab (expand the sidebar, open the file tab, and drag and drop the file). A jpg file is used here, but png works as well.
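Alternatively, you can upload from code using Colab's `google.colab.files` module; a minimal sketch (the filenames are whatever you pick in the dialog):

```python
from google.colab import files

uploaded = files.upload()     # opens a file-picker; files land in /content
print(list(uploaded.keys()))  # names of the uploaded files
```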
Get the general object detection module (trained model) from TensorFlow Hub.
Detector loading

```python
import tensorflow as tf
import tensorflow_hub as hub

module_handle = 'https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1'
# module_handle = 'https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1'
detector = hub.load(module_handle).signatures['default']
```
The above module is an "*SSD-based object detection model trained on Open Images V4 with ImageNet pre-trained MobileNet V2 as image feature extractor*". Besides this one, various modules for image object detection are available on the hub.
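If you are curious what inputs and outputs the loaded signature provides, a quick optional inspection might look like this:

```python
# Assumes `detector` was loaded as above
print(detector.structured_input_signature)  # the expected input spec (a float32 image tensor)
print(detector.structured_outputs)          # outputs such as detection_boxes, detection_scores, ...
```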
If the environment variable TFHUB_CACHE_DIR is set, the downloaded module is stored there (if it is not set, it seems to end up somewhere under /tmp).
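With TFHUB_CACHE_DIR set as above, you can peek at what was actually downloaded:

```
!ls -R /content/tfhub
```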
Write a function `run_detector(...)` that takes the detector loaded above and an image file path as arguments, runs object detection, and prints the results as text. The function `showImage(...)` called in its last line will be created later, so it is commented out for now.
The main points are as follows.
- The input image is $227 \times 227$ pixels, with channels in RGB order.
- The choice of reduction algorithm (when generating the input data) also affects detection. `Image.LANCZOS` is used here; changing it to `Image.NEAREST` will change the detection scores (see the preprocessing sketch below).
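As a minimal, self-contained sketch of this preprocessing (the helper name `preprocess` and the sample path are illustrative, not from the original notebook):

```python
import numpy as np
import PIL.Image as Image
import tensorflow as tf

def preprocess(path, size=227, resample=Image.LANCZOS):
    # Load, force RGB, resize, scale to 0.0-1.0, and add a batch axis
    img = Image.open(path).convert('RGB')
    img = img.resize((size, size), resample)
    x = np.array(img, dtype=np.float32) / 255.
    return tf.constant(x.reshape([1, size, size, 3]))

# Compare the two resampling filters mentioned above:
# x_lanczos = preprocess('/content/sample1.jpg', resample=Image.LANCZOS)
# x_nearest = preprocess('/content/sample1.jpg', resample=Image.NEAREST)
```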
Function definition to perform object detection

```python
import time
import numpy as np
import PIL.Image as Image

def run_detector(detector, path):
    # Load the image and convert it into a format the detector accepts
    img = Image.open(path)  # Pillow (PIL)
    if img.mode == 'RGBA':
        img = img.convert('RGB')
    converted_img = img.copy()
    converted_img = converted_img.resize((227, 227), Image.LANCZOS)  # reduce to input size
    converted_img = np.array(converted_img, dtype=np.float32)        # convert to np.array
    converted_img = converted_img / 255.                             # normalize to 0.0-1.0
    converted_img = converted_img.reshape([1, 227, 227, 3])
    converted_img = tf.constant(converted_img)

    t1 = time.time()
    result = detector(converted_img)  # general object detection (the main step)
    t2 = time.time()
    print(f'Detection time: {t2-t1:.3f} seconds')

    # Prepare to output the results as text
    r = {key: value.numpy() for key, value in result.items()}
    boxes = r['detection_boxes']
    scores = r['detection_scores']
    decode = np.frompyfunc(lambda p: p.decode('ascii'), 1, 1)
    class_names = decode(r['detection_class_entities'])

    # Text output for the n results with a score of 0.25 or higher
    print('Detected objects')
    n = np.count_nonzero(scores >= 0.25)
    for i in range(n):
        y1, x1, y2, x2 = tuple(boxes[i])
        x1, x2 = int(x1 * img.width), int(x2 * img.width)
        y1, y2 = int(y1 * img.height), int(y2 * img.height)
        t = f'{class_names[i]:10} {100*scores[i]:3.0f}% '
        t += f'({x1:>4},{y1:>4}) - ({x2:>4},{y2:>4})'
        print(t)

    # showImage(np.array(img), r, min_score=0.25)  # overlay the detection results on the image
```
The above `run_detector(...)` is called as follows:

Execute by specifying the image path

```python
img_path = '/content/sample1.jpg'
run_detector(detector, img_path)
```
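If you have uploaded several images, a simple loop works too (a sketch; it assumes the jpg files sit directly under /content):

```python
import glob

for path in sorted(glob.glob('/content/*.jpg')):
    print(path)
    run_detector(detector, path)
```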
Here, the following photo (free stock material) was used as sample1.jpg.
The execution result is as follows. The numbers in parentheses are the coordinates of the upper-left and lower-right corners of the rectangle enclosing each object.
```
Detection time: 0.251 seconds
Detected objects
Human face 57% ( 522, 156) - ( 636, 276)
Clothing 57% ( 403, 203) - ( 757, 577)
Clothing 57% ( 144, 211) - ( 481, 583)
Girl 41% ( 393, 104) - ( 763, 595)
Girl 34% ( 214, 81) - ( 619, 614)
```
The text output alone is hard to interpret, so let's overlay the results on the image. Write the function `showImage(...)` that overlays the detection results on the image.
Overlay display of detection results on images

```python
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe

def showImage(img, r, min_score=0.1):
    fig = plt.figure(dpi=150, figsize=(8, 8))
    ax = plt.gca()
    ax.tick_params(axis='both', which='both', left=False,
                   labelleft=False, bottom=False, labelbottom=False)
    ax.imshow(img)

    decode = np.frompyfunc(lambda p: p.decode('ascii'), 1, 1)
    boxes = r['detection_boxes']
    scores = r['detection_scores']
    class_names = decode(r['detection_class_entities'])
    n = np.count_nonzero(scores >= min_score)

    # Prepare a color for each entry in class_names
    class_set = np.unique(class_names[:n])
    colors = dict()
    cmap = plt.get_cmap('tab10')
    for i, v in enumerate(class_set):
        colors[v] = cmap(i)

    # Draw the rectangles, starting from the one with the lowest score
    img_w = img.shape[1]
    img_h = img.shape[0]
    for i in reversed(range(n)):
        text = f'{class_names[i]} {100*scores[i]:.0f}%'
        color = colors[class_names[i]]
        y1, x1, y2, x2 = tuple(boxes[i])
        y1, y2 = y1 * img_h, y2 * img_h
        x1, x2 = x1 * img_w, x2 * img_w

        # Frame
        rect = plt.Rectangle(xy=(x1, y1), width=(x2 - x1), height=(y2 - y1),
                             fill=False, edgecolor=color, joinstyle='round',
                             clip_on=False, zorder=8 + (n - i))
        ax.add_patch(rect)

        # Label: text
        t = ax.text(x1 + img_w / 200, y1 - img_h / 300, text, va='bottom',
                    fontsize=6, color=color, zorder=8 + (n - i))
        t.set_path_effects([pe.Stroke(linewidth=1.5, foreground='white'),
                            pe.Normal()])
        fig.canvas.draw()
        renderer = fig.canvas.get_renderer()
        coords = ax.transData.inverted().transform(t.get_window_extent(renderer=renderer))
        tag_w = abs(coords[0, 0] - coords[1, 0]) + img_w / 100
        tag_h = abs(coords[0, 1] - coords[1, 1]) + img_h / 120

        # Label: background
        rect = plt.Rectangle(xy=(x1, y1 - tag_h), width=tag_w, height=tag_h,
                             edgecolor=color, facecolor=color,
                             joinstyle='round', clip_on=False, zorder=8 + (n - i))
        ax.add_patch(rect)
```
Then, uncomment the `showImage(...)` call in the last line of `run_detector(...)` and run `run_detector(detector, img_path)` again. The following result (image) is obtained.
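If you also want to keep the overlaid image as a file, one option is to save the current matplotlib figure right after the call (a sketch; it assumes `showImage(...)` leaves its figure as the current one, which it does as written above):

```python
run_detector(detector, img_path)
plt.savefig('/content/result.png', bbox_inches='tight')  # save the overlay to a file
```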
Next, switch to another module (detector) and try object detection on the same image (loading this one takes considerably longer).
```python
import tensorflow as tf
import tensorflow_hub as hub

# module_handle = 'https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1'
module_handle = 'https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1'
detector = hub.load(module_handle).signatures['default']
```
The execution result is as follows. Detection takes several times longer than before, but more objects are detected.
```
Detection time: 1.379 seconds
Detected objects
Human face 94% ( 524, 147) - ( 625, 272)
Human face 86% ( 266, 149) - ( 351, 270)
Clothing 75% ( 383, 234) - ( 750, 565)
Footwear 70% ( 154, 511) - ( 306, 598)
Boy 65% ( 351, 93) - ( 759, 606)
Footwear 59% ( 311, 521) - ( 477, 600)
Clothing 53% ( 152, 225) - ( 438, 565)
Girl 53% ( 144, 88) - ( 481, 598)
Boy 49% ( 225, 88) - ( 618, 592)
Boy 45% ( 145, 90) - ( 464, 603)
Girl 37% ( 324, 85) - ( 771, 587)
Sun hat 29% ( 479, 78) - ( 701, 279)
```