Let's set up logo detection using the TensorFlow Object Detection API announced in July 2017. When I tried object detection before, I used already-trained data, but this time I will start from creating the training data myself.
We will proceed in the following order: collect images, create annotation data, convert it to TFRecord, train on Google Cloud ML Engine, and finally run detection.
This time I will try to detect just one type of logo: the logo of giftee, the company I work for.
I used Google Image Search to collect images containing the logo. This time, I collected about 40 images.
Because the TensorFlow Object Detection API detects objects within an image, we need annotation data that teaches it the rectangle containing each object and the object's label. This time, we will use labelImg to create that data.
labelImg
At first I tried to run labelImg on my Mac, but it didn't work well even when I followed the official procedure, so I ran it on Ubuntu (in a Vagrant VM) instead. It's a somewhat unorthodox approach, but here is the procedure I took.
X11 forwarding becomes available if you add the following to the Vagrantfile:
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
...
config.ssh.forward_x11 = true
...
end
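Inside the VM, labelImg can then be installed and launched. As a rough sketch following the labelImg README of the time (exact package names may differ depending on your environment):
$ sudo apt-get install pyqt4-dev-tools
$ sudo pip install lxml
$ git clone https://github.com/tzutalin/labelImg.git
$ cd labelImg
$ make qt4py2
$ python labelImg.py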
In labelImg, specify the directory where the images are stored, select each logo with a rectangle, and add a label.
Entering the default label name under "Use Default label" makes the work much easier.
After selecting the rectangle and setting the label, press Save to write the annotation out as XML data.
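For reference, labelImg saves each annotation as Pascal VOC format XML, roughly like this (the file name and coordinates below are made-up examples):
<annotation>
  <folder>images</folder>
  <filename>image1.jpg</filename>
  <size>
    <width>800</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <object>
    <name>giftee</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>120</xmin>
      <ymin>80</ymin>
      <xmax>360</xmax>
      <ymax>200</ymax>
    </bndbox>
  </object>
</annotation>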
Next, convert the images and XML into the TFRecord format that TensorFlow reads. This time, I created the conversion script by modifying create_pet_tf_record.py from the tutorial (/blob/master/object_detection/create_pet_tf_record.py).
The main fix is that the original script derived the label name from the file name, whereas this version always returns the 'giftee' label:
#class_name = get_class_name_from_filename(data['filename'])
class_name = 'giftee'
Other than that, I use the script almost unmodified.
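For reference, what the conversion ultimately produces is one tf.train.Example per image, with the keys the Object Detection API expects. The helper below is a simplified, hypothetical sketch of that format, not the actual script:
import tensorflow as tf

def make_tf_example(encoded_jpg, width, height, xmins, ymins, xmaxs, ymaxs):
    # Box coordinates are floats normalized to [0, 1].
    n = len(xmins)
    feature = {
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpg])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpeg'])),
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
        # Every box gets the fixed 'giftee' label (class id 1 in the label map below).
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'giftee'] * n)),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[1] * n)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))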
Next, create the label map. Since there is only one class this time, it contains just one item:
item {
id: 1
name: 'giftee'
}
Save it as giftee_label_map.pbtxt.
Now run the conversion script; most of the steps from here on follow the tutorial.
$ python object_detection/create_giftee_tf_record.py --label_map_path=object_detection/data/giftee_label_map.pbtxt --data_dir=`pwd` --output_dir=`pwd`
The training data "giftee_train.record" and the evaluation data "giftee_val.record" are created in the directory where you ran the command.
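As a quick sanity check, you can count the records in the generated files; a minimal sketch assuming TensorFlow 1.x:
import tensorflow as tf

# Count the examples in each generated TFRecord file.
for path in ['giftee_train.record', 'giftee_val.record']:
    count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
    print(path, count)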
Next, create a config file that sets the model parameters, the learning rate schedule, the TFRecord file paths, the label map path, and so on.
This time, I use the sample config from the tutorial with some modifications.
# Faster R-CNN with Resnet-101 (v1) configured for the Oxford-IIIT Pet Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 1
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 600
max_dimension: 1024
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.7
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0003
schedule {
step: 0
learning_rate: .0003
}
schedule {
step: 900000
learning_rate: .00003
}
schedule {
step: 1200000
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
#fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
from_detection_checkpoint: true
data_augmentation_options {
random_horizontal_flip {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/giftee_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/giftee_label_map.pbtxt"
}
eval_config: {
num_examples: 2000
}
eval_input_reader: {
tf_record_input_reader {
input_path: "PATH_TO_BE_CONFIGURED/giftee_train.record"
}
label_map_path: "PATH_TO_BE_CONFIGURED/giftee_label_map.pbtxt"
shuffle: false
num_readers: 1
}
The main changes are setting num_classes to 1, changing the file names, and commenting out fine_tune_checkpoint; other than that, the sample config is used as-is.
fine_tune_checkpoint seems to be for starting from an already-trained checkpoint, but this time I am training from scratch, so I commented it out.
Since I am using Google Cloud Storage this time, I replaced PATH_TO_BE_CONFIGURED with the corresponding GCS path, as in the tutorial:
$ sed -i "s|PATH_TO_BE_CONFIGURED|"gs://${YOUR_GCS_BUCKET}"/data|g" \
object_detection/samples/configs/faster_rcnn_resnet101_giftee.config
Google Cloud Storage
Upload the files created so far to Google Cloud Storage. After uploading, the paths look like this:
+ ${YOUR_GCS_BUCKET}/
+ data/
- faster_rcnn_resnet101_giftee.config
- giftee_label_map.pbtxt
- giftee_train.record
- giftee_val.record
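The upload itself can be done with gsutil, for example (assuming the files are in the current directory):
$ gsutil cp faster_rcnn_resnet101_giftee.config gs://${YOUR_GCS_BUCKET}/data/
$ gsutil cp giftee_label_map.pbtxt gs://${YOUR_GCS_BUCKET}/data/
$ gsutil cp giftee_train.record gs://${YOUR_GCS_BUCKET}/data/
$ gsutil cp giftee_val.record gs://${YOUR_GCS_BUCKET}/data/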
Google Cloud ML Engine
Prepare the packages needed to run on Google Cloud ML Engine. As in the tutorial, run the following in the models directory:
$ python setup.py sdist
$ cd slim && python setup.py sdist
This generates the necessary package files, dist/object_detection-0.1.tar.gz and slim/dist/slim-0.1.tar.gz.
Now run the training on Google Cloud ML Engine.
At first I tried training locally on my Mac, but it took too long and I gave up; switching to ML Engine made it about 30 times faster.
Following the tutorial, the command is:
# From tensorflow/models/
$ gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://${YOUR_GCS_BUCKET}/train \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_giftee.config
The `whoami`_object_detection_`date +%s` part is the job name. Running
$ echo `whoami`_object_detection_`date +%s`
shows what the name will expand to.
For --job-dir and --train_dir, specify the directory where training progress is stored; the data there is updated at regular intervals.
For --packages, specify the tar.gz file you created earlier.
It seems you can change the number of workers and their machine specs with --config. This time I use the default.
Running this command submits the job to ML Engine. You can check the job status and logs in the GCP ML Engine console.
This time I ran it for about 2 hours and stopped at 30,000 steps. The GCP fee came to about 4,000 yen.
eval
Using the learned parameters, evaluation is performed on images that were not used for training. Again following the tutorial, execute the following command to run the job on ML Engine:
# From tensorflow/models/
$ gcloud ml-engine jobs submit training `whoami`_object_detection_eval_`date +%s` \
--job-dir=gs://${YOUR_GCS_BUCKET}/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.eval \
--region us-central1 \
--scale-tier BASIC_GPU \
-- \
--checkpoint_dir=gs://${YOUR_GCS_BUCKET}/train \
--eval_dir=gs://${YOUR_GCS_BUCKET}/eval \
--pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/faster_rcnn_resnet101_giftee.config
TensorBoard
With TensorBoard you can graphically check how the loss changes over time. On the IMAGES tab you can also see detection results rendered with the parameters learned so far.
Start TensorBoard with the Google Cloud Storage path:
$ tensorboard --logdir=gs://${YOUR_GCS_BUCKET}
then access localhost:6006 in a browser to see the progress.
To detect the logo with the trained data, you first need to export it as a TensorFlow graph.
First, download the checkpoint data locally from Google Cloud Storage:
$ gsutil cp gs://${YOUR_GCS_BUCKET}/train/model.ckpt-${CHECKPOINT_NUMBER}.* .
then run the export script:
python object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path object_detection/samples/configs/faster_rcnn_resnet101_giftee.config \
--checkpoint_path model.ckpt-${CHECKPOINT_NUMBER} \
--inference_graph_path output_inference_graph.pb
It seems CHECKPOINT_NUMBER should be the largest checkpoint number found in the training directory; you can list the saved checkpoints to find it, as shown below.
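For example (assuming the checkpoints are under the train directory):
$ gsutil ls gs://${YOUR_GCS_BUCKET}/train/model.ckpt-*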
Running the export script then outputs the converted graph, output_inference_graph.pb.
Now let's finally run logo detection.
The Python detection script runs on Jupyter, based on the tutorial notebook.
You can proceed almost exactly as in the tutorial, but I changed the paths to the TensorFlow graph and the label data:
# What model to download.
#MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
#MODEL_FILE = MODEL_NAME + '.tar.gz'
#DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = 'data/output_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = 'data/giftee_label_map.pbtxt'
NUM_CLASSES = 1
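For reference, these paths feed into the notebook's loading code, which looks roughly like this (the import paths below are my assumption; in the notebook itself, label_map_util and the graph-loading cell are already provided):
import tensorflow as tf
from object_detection.utils import label_map_util

# Load the frozen detection graph exported earlier.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

# Map class ids to display names using the label map.
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)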
Also, since I am not downloading a model this time, I commented out the download section:
#opener = urllib.request.URLopener()
#opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
#tar_file = tarfile.open(MODEL_FILE)
#for file in tar_file.getmembers():
# file_name = os.path.basename(file.name)
# if 'frozen_inference_graph.pb' in file_name:
# tar_file.extract(file, os.getcwd())
Finally, modify the image path and run it. The files used for detection should be named image1.jpg and image2.jpg.
# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'gifteeimages'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
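The detection itself then runs in the notebook's inference loop, which looks roughly like the following (vis_util and the helper below come from the tutorial; the import paths are my assumption):
import numpy as np
from PIL import Image
from matplotlib import pyplot as plt
from object_detection.utils import visualization_utils as vis_util

def load_image_into_numpy_array(image):
    # Convert a PIL image to a (height, width, 3) uint8 array.
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        for image_path in TEST_IMAGE_PATHS:
            image = Image.open(image_path)
            image_np = load_image_into_numpy_array(image)
            # The model expects a batch, so add a leading dimension.
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Draw the detected boxes, labels, and scores on the image.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            plt.figure(figsize=IMAGE_SIZE)
            plt.imshow(image_np)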
In some of the images, the logo was detected correctly.
With only about 2 hours of training, detection was more accurate than I expected. The training data this time was as small as about 40 images, so I think accuracy would improve further with more training data and a longer training time (the GCP usage fee is the bottleneck, though...).