I bought a Movidius, the USB accelerator from Intel that enables deep learning inference on small, lightweight devices. It was initially sold only on-site at CVPR17, but is now available from RS. In this post I run the samples and try out the Python library.
- Convert a Caffe model to run on Movidius
- Control Movidius from a RasPi via Python
- Inference takes about 97 ms per image for AlexNet and 113 ms for GoogleNet
Because the RasPi only runs the converted model, only the runtime API is installed on it.
Here, I launch a Docker environment on a Mac, install the SDK in it, and convert the model there. The setup follows an existing article. First, build the Docker image in a directory of your choice.
mkdir Docker && cd Docker
git clone https://github.com/peisuke/MovidiusNCS-setup.git
cd MovidiusNCS-setup
docker build -t movidius .
Building the environment takes a while. Once it finishes, start the container with a shared folder mounted, so that the converted model file can be exchanged with the Mac.
docker run -it --rm -v [The directory on the Mac that you want to share with the Docker environment]:/home/ubuntu/data movidius:latest /bin/bash
Download the Caffe model to be converted. For AlexNet:
mkdir -p data/AlexNet && cd data/AlexNet
wget http://dl.caffe.berkeleyvision.org/bvlc_alexnet.caffemodel
wget https://raw.githubusercontent.com/BVLC/caffe/master/models/bvlc_alexnet/deploy.prototxt
The sample processes one image at a time, so change the batch size (the first dim of the input shape) in the network definition from 10 to 1.
data/AlexNet/deploy.prototxt
input_param { shape: { dim: 1 dim: 3 dim: 224 dim: 224 } }
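The same edit can be scripted. The following is a minimal sketch, assuming the input shape is the first `shape { dim: ... }` entry in the file, as in the line shown above. The function name `set_batch_size` is hypothetical and is not part of Caffe or the Movidius SDK.

```python
import re

def set_batch_size(prototxt_text, batch=1):
    """Rewrite the first dim of the first input shape to `batch`."""
    # Matches "shape: { dim: <N>" (the colon after "shape" is optional)
    # and captures everything up to, but not including, the number N.
    pattern = re.compile(r"(shape\s*:?\s*\{\s*dim:\s*)\d+")
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(batch), prototxt_text, count=1
    )
    if count == 0:
        raise ValueError("no input shape found in prototxt text")
    return new_text

original = "input_param { shape: { dim: 10 dim: 3 dim: 224 dim: 224 } }"
print(set_batch_size(original))
```

To apply it to the real file, read `deploy.prototxt` as text, pass the contents through the function, and write the result back.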
Now convert the Caffe model. Adding the optional -s 12 flag makes execution roughly 3 to 5 times faster, depending on the network, as the measurements later in this post show.
cd data/AlexNet
python3 ../../bin/mvNCCompile.pyc ./deploy.prototxt -s 12 \
    -w ./bvlc_alexnet.caffemodel -o graph
From here on, the work is on the RasPi. I omit the general Raspbian installation. Because the RasPi only runs the converted model, only what is needed to run it, such as the API, is installed here.
wget https://ncs-forum-uploads.s3.amazonaws.com/ncsdk/MvNC_SDK_01_07_07/MvNC_SDK_1.07.07.tgz
tar xvf MvNC_SDK_1.07.07.tgz
tar xvf MvNC_API-1.07.07.tgz
cd ncapi/redist/pi_jessie
sudo dpkg -i *
The other file that comes out of MvNC_SDK_1.07.07.tgz, MvNC_Toolkit-1.07.06.tgz, does not seem to be needed for execution. (It is used only for model conversion.)
Copy the graph file converted earlier to the RasPi with scp or a similar tool.
scp ./graph pi@***.***.**.**:~/***/ncapi/networks/AlexNet/
The Python sample to run is ncapi/py_examples/classification_example.py.
However, running it as-is fails with an error because some files are missing.
$ python3 classification_example.py 2
Found stale device, resetting
Device 0 Address: 1.4 - VID/PID 03e7:2150
Starting wait for connect with 2000ms timeout
Found Address: 1.4 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 825136 bytes...
Successfully sent 825136 bytes of data in 47.187813 ms (16.676149 MB/s)
Boot successful, device address 1.4
Found Address: 1.4 - VID/PID 040e:f63b
done
Booted 1.4 -> VSC
Traceback (most recent call last):
  File "classification_example.py", line 52, in <module>
    ilsvrc_mean = numpy.load('../mean/ilsvrc12/ilsvrc_2012_mean.npy').mean(1).mean(1) #loading the mean file
  File "/usr/local/lib/python3.4/dist-packages/numpy/lib/npyio.py", line 370, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '../mean/ilsvrc12/ilsvrc_2012_mean.npy'
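Setting up the Toolkit, which I did not install earlier, downloads the required ILSVRC files (such as the label file). Since I skipped it, I download only the files that are needed, into the location named in the error. Note that the raw file URLs are used, because a GitHub "blob" URL returns an HTML page rather than the file itself.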
wget https://github.com/BVLC/caffe/raw/master/python/caffe/imagenet/ilsvrc_2012_mean.npy
wget https://github.com/HoldenCaulfieldRye/caffe/raw/master/data/ilsvrc12/synset_words.txt
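Before rerunning the sample, it can help to confirm that the downloaded mean file really is a NumPy file and to see what the `.mean(1).mean(1)` call in the traceback computes. The sketch below is illustrative only: the helper names `looks_like_npy` and `channel_means` are hypothetical, and the mean image is assumed to be laid out as channels x height x width. A small synthetic array stands in for the real file.

```python
import numpy as np

NPY_MAGIC = b"\x93NUMPY"  # .npy files begin with these six bytes

def looks_like_npy(path):
    """Return True if the file starts with the .npy magic string."""
    with open(path, "rb") as f:
        return f.read(len(NPY_MAGIC)) == NPY_MAGIC

def channel_means(mean_image):
    """Collapse a (channels, height, width) array to one mean per channel."""
    # First mean(1) averages over height, second mean(1) over width.
    return mean_image.mean(1).mean(1)

# Synthetic stand-in for ilsvrc_2012_mean.npy (assumed layout: 3 x H x W).
fake_mean = np.stack(
    [np.full((4, 4), v, dtype=np.float32) for v in (104.0, 117.0, 123.0)]
)
print(channel_means(fake_mean))
```

If `looks_like_npy` returns False for the downloaded file, the download most likely saved a web page instead of the binary file.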
If successful, it will look like this:
$ python3 classification_example.py 1
Device 0 Address: 1.4 - VID/PID 03e7:2150
Starting wait for connect with 2000ms timeout
Found Address: 1.4 - VID/PID 03e7:2150
Found EP 0x81 : max packet size is 512 bytes
Found EP 0x01 : max packet size is 512 bytes
Found and opened device
Performing bulk write of 825136 bytes...
Successfully sent 825136 bytes of data in 47.039351 ms (16.728781 MB/s)
Boot successful, device address 1.4
Found Address: 1.4 - VID/PID 040e:f63b
done
Booted 1.4 -> VSC
------- predictions --------
prediction 1 is n02123045 tabby, tabby cat
prediction 2 is n02124075 Egyptian cat
prediction 3 is n02127052 lynx, catamount
prediction 4 is n02123394 Persian cat
prediction 5 is n02971356 carton
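The prediction lines above pair the highest network scores with the matching lines of synset_words.txt. The following sketch shows one way to produce such a ranking with NumPy. It is not taken from the sample itself: the function name `top_k_predictions` is hypothetical, the three labels are a tiny stand-in for the full label file, and the scores are made up.

```python
import numpy as np

def top_k_predictions(scores, label_lines, k=5):
    """Return the k labels with the highest scores, best first."""
    scores = np.asarray(scores, dtype=np.float32)
    order = np.argsort(scores)[::-1][:k]  # indices by descending score
    return [label_lines[i].strip() for i in order]

# Tiny stand-in for synset_words.txt (one "<id> <names>" entry per line).
labels = [
    "n02123045 tabby, tabby cat",
    "n02124075 Egyptian cat",
    "n02971356 carton",
]
scores = [0.6, 0.3, 0.1]  # made-up values for illustration

for rank, label in enumerate(top_k_predictions(scores, labels, k=2), start=1):
    print("prediction", rank, "is", label)
```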
Python Module
Let's take a look at the contents of the Python sample. The parts related to the use of network files are as follows.
from mvnc import mvncapi as mvnc
import numpy
import cv2

mvnc.SetGlobalOption(mvnc.GlobalOption.LOGLEVEL, 2)

devices = mvnc.EnumerateDevices()  # List the connected Movidius devices
if len(devices) == 0:
    print('No devices found')
    quit()

device = mvnc.Device(devices[0])
device.OpenDevice()
opt = device.GetDeviceOption(mvnc.DeviceOption.OPTIMISATIONLIST)

network_blob = '../networks/AlexNet/graph'  # Converted model file
with open(network_blob, mode='rb') as f:
    blob = f.read()

graph = device.AllocateGraph(blob)  # Load the converted model onto Movidius
graph.SetGraphOption(mvnc.GraphOption.ITERATIONS, 1)
iterations = graph.GetGraphOption(mvnc.GraphOption.ITERATIONS)

# Preprocessing done in the full sample (such as mean subtraction) is omitted here
img = cv2.imread('***.jpg')
graph.LoadTensor(img.astype(numpy.float16), 'user object')  # Send the image as input
output, userobj = graph.GetResult()  # The forward-pass result is retrieved here

graph.DeallocateGraph()  # Clean up
device.CloseDevice()
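The excerpt passes the image straight to `LoadTensor`, while the traceback earlier shows that the full sample also loads a mean file. The sketch below illustrates, under stated assumptions, how an image could be mean-centered and cast to float16, and how a per-image time could be measured. The helper names `preprocess` and `time_per_image` are hypothetical, the mean values are placeholders, and a dummy function stands in for the `LoadTensor` plus `GetResult` pair so that the code runs without a device.

```python
import time
import numpy as np

def preprocess(image, channel_mean):
    """Subtract a per-channel mean from an H x W x 3 image, return float16."""
    centered = image.astype(np.float32) - np.asarray(channel_mean, dtype=np.float32)
    return centered.astype(np.float16)

def time_per_image(infer, tensor, runs=10):
    """Average wall-clock seconds per call of infer(tensor)."""
    start = time.perf_counter()
    for _ in range(runs):
        infer(tensor)
    return (time.perf_counter() - start) / runs

# Dummy image and placeholder mean values (assumptions, not measured data).
image = np.full((224, 224, 3), 128, dtype=np.uint8)
tensor = preprocess(image, [104.0, 117.0, 123.0])

# Stand-in for the device call; it only touches the data.
elapsed = time_per_image(lambda t: t.sum(), tensor)
print(tensor.dtype, tensor.shape, elapsed)
```

On the device, `infer` could be a small function that calls `graph.LoadTensor(t, 'user object')` followed by `graph.GetResult()`. The image would also need resizing to the network input size first, for example with `cv2.resize`.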
I ran the same networks on a PC and on Movidius, with and without the -s 12 option. The processing time per image is as follows.
Environment | AlexNet (224x224 RGB) | GoogleNet (227x227 RGB) |
---|---|---|
MacBookPro (CPU 2.7GHz Corei5) | 0.091s | 0.315s |
Pi3 + Movidius (without -s 12) | 0.287s | 0.574s |
Pi3 + Movidius (with -s 12) | 0.097s | 0.113s |
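The ratios implied by the table can be worked out directly. This short sketch only does arithmetic on the figures above; the dictionary keys are labels chosen here for illustration.

```python
# Seconds per image, copied from the table above.
timings = {
    "AlexNet":   {"mac": 0.091, "ncs_plain": 0.287, "ncs_s12": 0.097},
    "GoogleNet": {"mac": 0.315, "ncs_plain": 0.574, "ncs_s12": 0.113},
}

def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return slow / fast

for name, t in timings.items():
    option_gain = round(speedup(t["ncs_plain"], t["ncs_s12"]), 2)
    vs_mac = round(speedup(t["mac"], t["ncs_s12"]), 2)
    print(name, "gain from -s 12:", option_gain, "vs Mac CPU:", vs_mac)
```

A value below 1 in the last column means the Mac CPU was faster for that network.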
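It's not as fast as a GPU, but with -s 12 it is roughly on par with the Core i5 for AlexNet (0.097s vs 0.091s) and about 2.8 times faster for GoogleNet (0.113s vs 0.315s). Now that full-scale deep learning inference is finally possible on a RasPi, I expect many uses to appear in the future.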