[SWIFT] [Core ML] How to convert YOLO v3 to Core ML

I converted YOLO v3 to Core ML, so here is the procedure.

YOLO v3 is an object detection algorithm. The easiest way to run YOLO on iOS is to download one of Apple's official Core ML models. This time, however, I converted it manually.

The original YOLO v3 is implemented in Darknet, but this time I will use the following Keras port.

qqwweee/keras-yolo3 https://github.com/qqwweee/keras-yolo3


The following steps are run in Google Colaboratory.

** 1. Install & import the required libraries **


!pip install tensorflow-gpu==1.14.0
!pip install -U coremltools
!pip install keras==2.2.4

** 2. Clone keras-yolo3 **

First, clone the repository.


!git clone https://github.com/qqwweee/keras-yolo3

** 3. Try running keras-yolo3 **

First, let's run keras-yolo3 as is in Python. This part follows the procedure described in README.md on GitHub.

Download the weights file.


%cd keras-yolo3
!wget https://pjreddie.com/media/files/yolov3.weights

Convert the Darknet weights to a Keras model.


!python convert.py yolov3.cfg yolov3.weights model_data/yolo.h5

Upload a suitable image and run inference on it. This time I uploaded a file called neco.jpg.

! python yolo_video.py --image --input neco.jpg

#Output result
# (416, 416, 3)
# Found 2 boxes for img
# bed 0.66 (11, 481) (656, 660)
# cat 1.00 (98, 17) (624, 637)
# 6.3229040969999915
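
If you want to reuse these detections elsewhere in the notebook, the printed lines can be parsed back into structured values. The following is a minimal sketch, not part of keras-yolo3: the helper name `parse_detection` is hypothetical, and it assumes that the two tuples on each line are (left, top) and (right, bottom) in pixels.

```python
import re

# Pattern for lines such as "cat 1.00 (98, 17) (624, 637)".
# Assumption: the tuples are (left, top) and (right, bottom) in pixels.
DETECTION = re.compile(
    r"^(?P<label>.+?) (?P<score>\d+\.\d+) "
    r"\((?P<left>-?\d+), (?P<top>-?\d+)\) "
    r"\((?P<right>-?\d+), (?P<bottom>-?\d+)\)$"
)

def parse_detection(line):
    """Return a dict for one detection line, or None if the line does not match."""
    match = DETECTION.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "label": fields["label"],
        "score": float(fields["score"]),
        "box": tuple(int(fields[k]) for k in ("left", "top", "right", "bottom")),
    }

# The two detection lines from the output above
log = """bed 0.66 (11, 481) (656, 660)
cat 1.00 (98, 17) (624, 637)"""
detections = [parse_detection(line) for line in log.splitlines()]
for detection in detections:
    print(detection)
```

Lines that are not detections, such as the shape line or the final timing value, return None and can be filtered out.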

The cat is detected properly.

** 4. Convert to CoreML **

Convert the model using Core ML Tools. The input image is 416 (width) x 416 (height) x 3 (RGB). Also set image_scale to 1 / 255.0 so that pixel values are normalized.


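from keras.models import load_model
from coremltools.converters import keras as converter
keras_model = load_model('model_data/yolo.h5')  # model produced by convert.py
mlmodel = converter.convert(keras_model,
  input_names='input1', image_input_names='input1',
  output_names=['grid1', 'grid2', 'grid3'],  # names used in the reshape step
  input_name_shape_dict = {'input1' : [None, 416, 416, 3]},
  image_scale=1/255.)  # normalize pixel values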

# 0 : input_1, <keras.engine.input_layer.InputLayer object at 0x7f7e4058fa58>
# 1 : conv2d_1, <keras.layers.convolutional.Conv2D object at 0x7f7e41cffb38>
# 2 : batch_normalization_1, <keras.layers.normalization.BatchNormalization object at 0x7f7e41cc6438>
#~ Abbreviation ~
# For large sized arrays, multiarrays of type float32 are more efficient.
# In future, float input/output multiarrays will be produced by default by the converter.
# Please use, either the flag 'use_float_arraytype' during the call to convert or
# the utility 'coremltools.utils.convert_double_to_float_multiarray_type(spec)', post-conversion.
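
As a rough illustration of what the image_scale setting means, Core ML multiplies each pixel channel value by the scale (and adds a bias, which is 0 here) before passing it to the network. The function below is only a sketch of that arithmetic with a hypothetical name, not a coremltools API.

```python
def preprocess_pixel(value, image_scale=1 / 255.0, bias=0.0):
    """Apply the scale-and-bias step to one 8-bit channel value."""
    return value * image_scale + bias

# 8-bit values 0..255 end up in the range 0..1
for raw in (0, 128, 255):
    print(raw, "->", round(preprocess_pixel(raw), 4))
```

With image_scale set to 1 / 255.0, the model receives values between 0 and 1, which is the range the Keras model was run with.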

Save the converted Core ML model.


coreml_model_path = 'yolo.mlmodel'
mlmodel.save(coreml_model_path)

** 5. Check the display of inference results **

The conversion to Core ML succeeded, but when I copied the model into my Xcode project and ran inference with the Vision framework, it failed.

This is because YOLO v3 has three outputs, with shapes 1x1x255x13x13, 1x1x255x26x26, and 1x1x255x52x52, and the Vision framework cannot interpret them as they are. The outputs need to be decoded.
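
The numbers in these shapes can be derived from the network configuration. The sketch below assumes the COCO-trained weights (80 classes) and the usual YOLO v3 layout of 3 anchors per scale with strides 32, 16, and 8; the constant names are only for illustration.

```python
INPUT_SIZE = 416          # input width and height used in this article
STRIDES = (32, 16, 8)     # downsampling factor of each YOLO v3 output scale
ANCHORS_PER_SCALE = 3
NUM_CLASSES = 80          # assumption: COCO-trained yolov3.weights

# Each anchor predicts 4 box values, 1 objectness score, and the class scores
channels = ANCHORS_PER_SCALE * (4 + 1 + NUM_CLASSES)
shapes = [(channels, INPUT_SIZE // s, INPUT_SIZE // s) for s in STRIDES]
for shape in shapes:
    print(shape)
```

This gives 255 channels on grids of 13x13, 26x26, and 52x52, which matches the three outputs apart from the leading 1x1 dimensions.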

This blog post explains the output of YOLO v3 in an easy-to-understand way: "Model structure of general object recognition YOLO v3".

Writing the decoder yourself looks difficult, so this time I will use the following project.

Ma-Dan/YOLOv3-CoreML https://github.com/Ma-Dan/YOLOv3-CoreML

This project assumes that the Core ML outputs have shapes 255x13x13, 255x26x26, and 255x52x52, so the outputs need to be reshaped accordingly.
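
To give a feel for what decoding involves, here is a minimal sketch of the commonly described YOLO v3 box decoding for a single anchor at a single grid cell. It is not the code of the Ma-Dan/YOLOv3-CoreML project; the function name, argument names, and toy input values are hypothetical, and the anchor size is assumed to be given in input pixels.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_cell(raw, col, row, grid_size, anchor_wh, input_size=416):
    """Decode one anchor's raw values at one grid cell.

    raw: 5 + num_classes floats (tx, ty, tw, th, objectness, class scores...)
    Returns (center_x, center_y, width, height, class_index, confidence),
    with coordinates in input-image pixels.
    """
    tx, ty, tw, th, tobj = raw[:5]
    stride = input_size / grid_size
    # The box center is an offset inside the cell, squashed to 0..1
    center_x = (col + sigmoid(tx)) * stride
    center_y = (row + sigmoid(ty)) * stride
    # The box size scales the anchor size exponentially
    width = anchor_wh[0] * math.exp(tw)
    height = anchor_wh[1] * math.exp(th)
    class_scores = [sigmoid(v) for v in raw[5:]]
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    confidence = sigmoid(tobj) * class_scores[best]
    return center_x, center_y, width, height, best, confidence

# Toy values for a hypothetical 2-class model (not real network output)
raw = [0.0, 0.0, 0.0, 0.0, 4.0, -3.0, 5.0]
result = decode_cell(raw, col=6, row=6, grid_size=13, anchor_wh=(116, 90))
print(result)
```

A full decoder repeats this for every cell and anchor of all three outputs, discards low-confidence boxes, and then applies non-maximum suppression, which is why reusing an existing project is convenient.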

** 6. Reshape the output **

Reshape the output of the model as follows:

1x1x255x13x13 → 255x13x13
1x1x255x26x26 → 255x26x26
1x1x255x52x52 → 255x52x52

To do this, add reshape layers with Core ML Tools. The following article describes how to edit the layers of a Core ML model with Core ML Tools.

How to edit layers in Core ML Tools https://qiita.com/TokyoYoshida/items/7aa67dcea059a767b4f2

For the reshape layer, I first tried add_squeeze, which removes dimensions (https://apple.github.io/coremltools/generated/coremltools.models.neural_network.builder.html#coremltools.models.neural_network.builder.NeuralNetworkBuilder.add_squeeze), but for some reason it did not work.

I also tried add_reshape, but the leading 1x1 dimensions were not removed.

After some more investigation I found add_reshape_static, and with it I was able to reshape the outputs properly.

Add it as follows.


import coremltools

# Wrap the converted model's spec in a builder so that layers can be added
spec = mlmodel.get_spec()
builder = coremltools.models.neural_network.NeuralNetworkBuilder(spec=spec)

builder.add_reshape_static(name='Reshape1', input_name='grid1', output_name='output1', output_shape=(255,13,13))
builder.add_reshape_static(name='Reshape2', input_name='grid2', output_name='output2', output_shape=(255,26,26))
builder.add_reshape_static(name='Reshape3', input_name='grid3', output_name='output3', output_shape=(255,52,52))

Then specify the shape of the output for the entire model.


builder.spec.description.output[0].name = "output1"
builder.spec.description.output[0].type.multiArrayType.shape[0] = 255

builder.spec.description.output[1].name = "output2"
builder.spec.description.output[1].type.multiArrayType.shape[0] = 255

builder.spec.description.output[2].name = "output3"
builder.spec.description.output[2].type.multiArrayType.shape[0] = 255
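
Before saving, it can help to confirm that the output descriptions really have the names and shapes the app expects. The helpers below are a sketch with hypothetical names; they only read the `description.output` fields used above, and the expected shapes are the ones listed in step 6.

```python
# Shapes expected by the app, taken from step 6
EXPECTED = {
    "output1": [255, 13, 13],
    "output2": [255, 26, 26],
    "output3": [255, 52, 52],
}

def describe_outputs(spec):
    """Return [(name, shape), ...] for every output in a Core ML model spec."""
    return [
        (out.name, list(out.type.multiArrayType.shape))
        for out in spec.description.output
    ]

def check_outputs(spec, expected):
    """Return (name, actual_shape, expected_shape) for each output that differs."""
    actual = dict(describe_outputs(spec))
    return [
        (name, actual.get(name), shape)
        for name, shape in expected.items()
        if actual.get(name) != shape
    ]
```

In the notebook, `print(check_outputs(builder.spec, EXPECTED))` should print an empty list when every output matches. Any entry that is printed shows the shape currently stored in the spec next to the expected one.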

Finally save the model.


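mlmodel_modified = coremltools.models.MLModel(spec)
mlmodel_modified.save('yolo_reshaped.mlmodel')  # example file name; use the name your project expects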

** 7. Display on the app **

All that remains is to drag and drop the Core ML model into the YOLOv3-CoreML project and run it.

If you open the model in Xcode, you can see that it is recognized correctly.

This is the execution result.

The objects are recognized properly.


I regularly post about iOS development on note, so please follow me. https://note.com/tokyoyoshida

I also post on Twitter. https://twitter.com/jugemjugemjugem
