We are a volunteer group aiming to launch a startup built around **embedded software optimization technology** as our core competence, drawing out the full hardware performance of **multi-core CPUs** and **SIMD architectures**.
I am exploring how far deep learning inference can be accelerated using only the **CPU** of a Raspberry Pi 3/4.
In the past I targeted frameworks such as Chainer and darknet, but I am now working on speeding up ONNX Runtime.
The results so far are as follows.
> @onnxruntime on RPi4 (CPU only)
> MobileNetV3 (image classification)
> MobileNetV2-SSDLite (image detection)
> Original vs. Accelerated #RaspberryPi #Python #DeepLearning https://t.co/wvBLn9Tfes
> — Project-RAIZIN (@ProjectRaizin) September 8, 2020
ONNX Runtime is a project backed by Microsoft and Facebook and is already well optimized, so a several-fold speedup is hard to achieve, but by tuning im2col, GEMM, the activation functions, and so on, I managed to roughly double the performance.
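To show why im2col and GEMM dominate convolution time, here is a minimal NumPy sketch of the standard im2col-plus-GEMM path. This is a reference illustration under my own assumptions, not the tuned ONNX Runtime code; in the optimized build, the packing loop and the GEMM call below are exactly the places where NEON vectorization and multi-threading pay off.

```python
import numpy as np

def im2col(x, kh, kw, stride=1, pad=0):
    """Unfold a (C, H, W) image into a (C*kh*kw, out_h*out_w) matrix
    so that convolution becomes a single GEMM."""
    c, h, w = x.shape
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                patch = x[ci,
                          i:i + stride * out_h:stride,
                          j:j + stride * out_w:stride]
                cols[row] = patch.reshape(-1)
                row += 1
    return cols, out_h, out_w

# Convolution as im2col + GEMM: weights (O, C, kh, kw), image (C, H, W).
rng = np.random.default_rng(0)
img = rng.standard_normal((3, 32, 32)).astype(np.float32)
wgt = rng.standard_normal((16, 3, 3, 3)).astype(np.float32)
cols, oh, ow = im2col(img, 3, 3, stride=1, pad=1)
out = (wgt.reshape(16, -1) @ cols).reshape(16, oh, ow)  # the GEMM hot spot
print(out.shape)  # (16, 32, 32)
```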
We have also released demo videos of various models on our YouTube channel.
The acceleration approach itself is the usual one: take a profile, find the hot spots, and tune them one by one (a minimal profiling sketch follows below).
I think what sets us apart is nothing more than this attitude of persistently squeezing common routines to be **a little faster**, and then a little faster again, while always watching the profile.
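As a concrete illustration of that profile-first workflow, ONNX Runtime's built-in profiler can be enabled from Python to get per-operator timings on the Pi. This is a minimal sketch, not the project's actual benchmark harness; the model path and run counts are placeholders.

```python
import time

import numpy as np
import onnxruntime as ort

MODEL = "mobilenetv3.onnx"       # placeholder path; any ONNX model works the same way

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4    # the RPi4 has four Cortex-A72 cores
opts.enable_profiling = True     # write per-operator timings to a JSON trace

sess = ort.InferenceSession(MODEL, opts)
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # fill dynamic dims with 1
x = np.random.rand(*shape).astype(np.float32)

for _ in range(5):               # warm-up runs
    sess.run(None, {inp.name: x})

n = 50
t0 = time.perf_counter()
for _ in range(n):
    sess.run(None, {inp.name: x})
print("mean latency: %.1f ms" % ((time.perf_counter() - t0) / n * 1e3))

print("profile trace:", sess.end_profiling())  # inspect the JSON in chrome://tracing
```

The per-operator trace is what tells you whether the next "little faster" should go into im2col, GEMM, or an activation kernel.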
This post only introduces the results; I plan to write up the technical details of each item as separate notes and publish them as they are ready.