I converted YOLO v3 to Core ML, so here is the procedure.
YOLO v3 is an object detection algorithm. The easiest way to get YOLO working on iOS is to download and use Apple's official Core ML models, but this time I tried converting it by hand.
The original YOLO v3 is built on Darknet, but this time I will use a version that has been ported to Keras:
qqwweee/keras-yolo3 https://github.com/qqwweee/keras-yolo3
Here are the steps in Google Colaboratory.
** 1. Install & import the required libraries **
Notebook
!pip install tensorflow-gpu==1.14.0
!pip install -U coremltools
!pip install keras==2.2.4
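To make sure the pinned versions are actually in use (Colab may need a runtime restart after downgrading preinstalled packages), here is a quick version check:
Notebook
# Confirm the installed versions are active in this runtime
import tensorflow as tf
import keras
import coremltools
print(tf.__version__, keras.__version__, coremltools.__version__)  # expect 1.14.0 and 2.2.4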
** 2. Clone keras-yolo3 **
First, clone the repository.
Notebook
!git clone https://github.com/qqwweee/keras-yolo3
** 3. Try running keras-yolo3 **
First, let's run keras-yolo3 as-is in Python. These steps are the same as the procedure described in the README.md on GitHub.
First, download the weights file.
Notebook
%cd keras-yolo3
!wget https://pjreddie.com/media/files/yolov3.weights
Convert the weights to the Keras version of the YOLO model.
Notebook
!python convert.py yolov3.cfg yolov3.weights model_data/yolo.h5
Upload a suitable image and run inference on it. This time I uploaded a file called neco.jpg.
!python yolo_video.py --image --input neco.jpg
#Output result
# (416, 416, 3)
# Found 2 boxes for img
# bed 0.66 (11, 481) (656, 660)
# cat 1.00 (98, 17) (624, 637)
# 6.3229040969999915
The cat is detected properly.
** 4. Convert to Core ML **
Convert using Core ML Tools. The input image is 416 (width) × 416 (height) × 3 (RGB). Also, set image_scale to 1/255.0 to normalize the pixel values.
Notebook
from keras.models import load_model
from coremltools.converters import keras as converter

# Load the Keras model converted in step 3
keras_model = load_model('model_data/yolo.h5')

mlmodel = converter.convert(keras_model,
                            output_names=['grid1', 'grid2', 'grid3'],
                            input_name_shape_dict={'input1': [None, 416, 416, 3]},
                            image_input_names='input1',
                            image_scale=1/255.0)
#output
# 0 : input_1, <keras.engine.input_layer.InputLayer object at 0x7f7e4058fa58>
# 1 : conv2d_1, <keras.layers.convolutional.Conv2D object at 0x7f7e41cffb38>
# 2 : batch_normalization_1, <keras.layers.normalization.BatchNormalization object at 0x7f7e41cc6438>
# ~ (layers omitted) ~
# For large sized arrays, multiarrays of type float32 are more efficient.
# In future, float input/output multiarrays will be produced by default by the converter.
# Please use, either the flag 'use_float_arraytype' during the call to convert or
# the utility 'coremltools.utils.convert_double_to_float_multiarray_type(spec)', post-conversion.
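If you want to follow the converter's advice, coremltools provides the utility named in the warning. I skipped this step myself, so treat it as an optional sketch:
Notebook
import coremltools

# Optional: switch the multiarrays to float32, as the warning above suggests
spec = mlmodel.get_spec()
coremltools.utils.convert_double_to_float_multiarray_type(spec)
mlmodel = coremltools.models.MLModel(spec)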
Save the converted Core ML model.
Notebook
coreml_model_path = 'yolo.mlmodel'
mlmodel.save(coreml_model_path)
** 5. Check the display of inference results **
The conversion to Core ML succeeded, but when I copied the model into my Xcode project and tried to run inference through the Vision framework, it failed.
This is because YOLO v3 has three outputs, with shapes 1x1x255x13x13, 1x1x255x26x26, and 1x1x255x52x52, which the Vision framework cannot interpret as they are. The outputs need to be decoded.
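You can confirm these shapes by printing the output descriptions from the converted model's spec (a quick check that reuses the mlmodel variable from step 4):
Notebook
# Print the name and type of each output recorded in the model spec
spec = mlmodel.get_spec()
for output in spec.description.output:
    print(output.name, output.type)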
This blog post (in Japanese) explains the YOLO v3 output format well: Model structure of general object recognition YOLO v3.
Decoding it yourself looks difficult, so this time I will use this project:
Ma-Dan/YOLOv3-CoreML https://github.com/Ma-Dan/YOLOv3-CoreML
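For reference, here is a rough Python sketch of what the decoding involves, assuming the standard COCO setup (3 anchors per grid cell × (4 box coordinates + 1 objectness score + 80 class scores) = 255 channels). The function name and threshold are mine; the project's actual Swift implementation also combines class scores with objectness and applies non-maximum suppression:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_grid(grid, anchors, stride, conf_threshold=0.5):
    """Decode one 255 x S x S YOLO v3 output grid (S = 13, 26, or 52)."""
    s = grid.shape[-1]
    g = grid.reshape(3, 85, s, s)  # 3 anchors x (4 box + objectness + 80 classes)
    boxes = []
    for a in range(3):
        for row in range(s):
            for col in range(s):
                tx, ty, tw, th, to = g[a, 0:5, row, col]
                confidence = sigmoid(to)
                if confidence < conf_threshold:
                    continue
                # Box center in input-image pixels (stride = 416 / S)
                x = (col + sigmoid(tx)) * stride
                y = (row + sigmoid(ty)) * stride
                # Box size in pixels, scaled from the anchor priors
                w = anchors[a][0] * np.exp(tw)
                h = anchors[a][1] * np.exp(th)
                cls = int(np.argmax(g[a, 5:, row, col]))
                boxes.append((x, y, w, h, confidence, cls))
    return boxes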
This project assumes that the Core ML outputs have shapes 255x13x13, 255x26x26, and 255x52x52, so the model's outputs need to be reshaped to match.
** 6. Reshape the output **
Reshape the outputs of the model as follows:
1x1x255x13x13 → 255x13x13
1x1x255x26x26 → 255x26x26
1x1x255x52x52 → 255x52x52
To do this, add reshape layers with Core ML Tools. You can read more about how to edit the layers of a Core ML model with Core ML Tools here:
How to edit layers in Core ML Tools https://qiita.com/TokyoYoshida/items/7aa67dcea059a767b4f2
For the reshape layer, I first tried [add_squeeze](https://apple.github.io/coremltools/generated/coremltools.models.neural_network.builder.html#coremltools.models.neural_network.builder.NeuralNetworkBuilder.add_squeeze), which removes dimensions, but for some reason it didn't work.
There is also add_reshape, but with that the leading 1x1 dimensions remained unreduced.
After digging around some more, I found add_reshape_static, which reshaped the outputs correctly.
Add it as follows.
Notebook
import coremltools
from coremltools.models.neural_network import NeuralNetworkBuilder

# Edit the converted model's spec through a NeuralNetworkBuilder
spec = mlmodel.get_spec()
builder = NeuralNetworkBuilder(spec=spec)
builder.add_reshape_static(name='Reshape1', input_name='grid1', output_name='output1', output_shape=(255, 13, 13))
builder.add_reshape_static(name='Reshape2', input_name='grid2', output_name='output2', output_shape=(255, 26, 26))
builder.add_reshape_static(name='Reshape3', input_name='grid3', output_name='output3', output_shape=(255, 52, 52))
Then update the shape of each output in the model description.
Notebook
builder.spec.description.output[0].name = "output1"
builder.spec.description.output[0].type.multiArrayType.shape[0] = 255
builder.spec.description.output[0].type.multiArrayType.shape.append(13)
builder.spec.description.output[0].type.multiArrayType.shape.append(13)
builder.spec.description.output[1].name = "output2"
builder.spec.description.output[1].type.multiArrayType.shape[0] = 255
builder.spec.description.output[1].type.multiArrayType.shape.append(26)
builder.spec.description.output[1].type.multiArrayType.shape.append(26)
builder.spec.description.output[2].name = "output3"
builder.spec.description.output[2].type.multiArrayType.shape[0] = 255
builder.spec.description.output[2].type.multiArrayType.shape.append(52)
builder.spec.description.output[2].type.multiArrayType.shape.append(52)
Finally save the model.
Notebook
mlmodel_modified = coremltools.models.MLModel(spec)
mlmodel_modified.save('Yolov3.mlmodel')
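As a final sanity check, you can run a prediction on the saved model from Python. Note that coremltools can only execute predictions on macOS, so this part won't run in Colab; neco.jpg is the test image from step 3:

from PIL import Image
import coremltools

# Load the reshaped model and run it on the test image (macOS only)
model = coremltools.models.MLModel('Yolov3.mlmodel')
image = Image.open('neco.jpg').resize((416, 416))
prediction = model.predict({'input1': image})
for name, array in prediction.items():
    print(name, array.shape)  # expect (255, 13, 13), (255, 26, 26), (255, 52, 52)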
** 7. Display on the app **
All that's left is to drag and drop the Core ML model into the YOLOv3-CoreML project and run it.
Opening the model in Xcode shows that its inputs and outputs are recognized correctly.
Here is the execution result: the cat is detected properly.
I regularly publish articles about iOS development on note, so please follow me. https://note.com/tokyoyoshida
I also post on Twitter. https://twitter.com/jugemjugemjugem