This article is a continuation of "Speeding up Deep Learning with CPU of Raspberry Pi 4".
To speed up deep learning inference, we first need to know which operations consume the most time, so that the heaviest ones can be targeted. We therefore start by profiling with the built-in profiler of ONNX Runtime.
How to enable the profiling feature is described in the ONNX Runtime official tutorial (https://microsoft.github.io/onnxruntime/python/auto_examples/plot_profiling.html).
sample.py

```python
import onnxruntime

options = onnxruntime.SessionOptions()
options.enable_profiling = True  # <- enable the profiling feature

session = onnxruntime.InferenceSession(path_to_model, options)

# [Profile target: run inference here]

prof_file = session.end_profiling()
print(prof_file)
```
The profile results are saved in JSON format. They can also be visualized with the tracing tool built into Chrome (enter **chrome://tracing/** in Chrome's address bar to launch it).
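The JSON profile can also be aggregated directly in Python. The sketch below assumes the Chrome-tracing event format that ONNX Runtime emits (node events with `cat == "Node"`, a `dur` field in microseconds, and the operator type in `args["op_name"]`); the helper name `summarize_profile` is my own.

```python
import json
from collections import defaultdict


def summarize_profile(prof_file):
    """Aggregate per-operator time (in ms) from an ONNX Runtime profile JSON.

    Assumes the Chrome-tracing event format that ONNX Runtime emits:
    per-node events have cat == "Node", a duration "dur" in microseconds,
    and the operator type in args["op_name"].
    """
    with open(prof_file) as f:
        events = json.load(f)

    totals = defaultdict(float)
    for ev in events:
        if ev.get("cat") == "Node":
            op = ev.get("args", {}).get("op_name", "unknown")
            totals[op] += ev["dur"] / 1000.0  # microseconds -> milliseconds

    # Return operators sorted by time, heaviest first.
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))
```

Passing the filename returned by `end_profiling()` to this helper yields a per-operator breakdown like the table later in this article.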
This time, let's profile the case of performing image classification with **MobileNetV1 depth 1.0 224x224**. The model has ONNX Runtime graph optimization applied.
Processing proceeds in the order of model loading, session initialization, and model execution. Expanding the model-execution part shows that a Convolution operation fused with Batch Normalization is executed multiple times. (Batch Normalization would normally run as an independent operator, but ONNX Runtime graph optimization folds it into the Convolution.)
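For reference, graph optimization in ONNX Runtime is controlled through `SessionOptions`; a minimal configuration sketch (`path_to_model` is a placeholder, as above):

```python
import onnxruntime

options = onnxruntime.SessionOptions()
# ORT_ENABLE_ALL applies every available graph optimization, including
# the Conv + BatchNormalization fusion seen in the profile.
# ORT_DISABLE_ALL would leave the graph untouched, for comparison.
options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

session = onnxruntime.InferenceSession(path_to_model, options)
```

Profiling the same model at different optimization levels makes the effect of the fusion visible in the trace.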
The table below summarizes the profile results.
| Item | Processing time (ms) | Percentage (%) |
|---|---|---|
| All processing | 157.19 | - |
| Convolution | 148.017 | 94.2 |
| Gemm | 6.053 | 3.9 |
| Other | 3.12 | 1.9 |
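The percentage column follows directly from the measured times; a quick check using the values from the table above:

```python
# Times (ms) taken from the profile table above.
total = 157.19
times = {"Convolution": 148.017, "Gemm": 6.053, "Other": 3.12}

# Each share is the operator's time divided by the total time.
shares = {name: t / total * 100 for name, t in times.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

(The "Other" row works out to about 2.0% when rounded; the table appears to truncate it to 1.9.)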
It turns out that, to reduce the overall processing time, we have to do something about the Convolution processing. In the next article, I would like to consider what kind of approach that requires.