Try Deep Learning with FPGA-Select Cucumbers

Introduction

The other day, in the article Try Deep Learning with FPGA, PYNQ and I wrote about BNN-PYNQ. In the previous article, I introduced a relatively inexpensive FPGA board called PYNQ-Z1 Board and even ran a demo (Cifar10) prepared in advance. Therefore, this time, we will develop the demo prepared in advance and select the cucumbers.

Prior explanation

To customize BNN-PYNQ

As I wrote in the previous article, Deep Learning consists largely of learning and reasoning. In BNN-PYNQ, only inference is implemented (learning must be done on CPU / GPU). Therefore, customizing BNN-PYNQ means changing the network structure and parameters of inference as it is learned.

Taking the previous Cifar10 as an example, in BNN-PYNQ, Deep Learning processing on FPGA is performed from the application on Jupyter according to the following flow. Last time, there was a CPU / FPGA speed comparison result, but that was realized by switching which shared library (python_hw / sw) to load in # 4 below.

# File Overview Custom method
1 Cifar10.ipynb It is an application. Last time it was a Jupyter file to run the demo.
2 bnn.py BNN-A library for running PYNQ in Python.
3 X-X-thres.bin
X-X-weights.bin
classes.txt
This is a parameter file. CPU/BNN the result of learning with GPU-It is used to capture with PYNQ. BinaryNets for Pynq - Training Networks
4 python_sw-cnv-pynq.so A shared library for running Deep Learning on the CPU. make-sw.sh python_sw
python_hw-cnv-pynq.so A shared library for running Deep Learning on FPGAs. make-sw.sh python_hw
5 cnv-pynq-pynq.bit A bitstream file for performing processing on the FPGA. When you switch the overlay, this file will be switched and read. make-hw.sh

This time, I will customize BNN-PYNQ, but since there is a hurdle to suddenly rebuild the network structure, I would like to change the parameters to be read while keeping the same network structure as Cifar10.

Cucumber selection

Since it became a hot topic for a while, many of you may know it, but it is a problem to classify the grades into 9 types based on the image of cucumber. Sorting "cucumbers" by deep learning with TensorFlow

The data required for learning is published on GitHub, so we will use it. There are two published on GitHub, ProtoType-1, 2, but this time we will use ProtoType-1, which has a dataset format close to Cifar 10. GitHub - workpiles/CUCUMBER-9

prototype-1.jpeg

  • 2L〜2S
    Good product. Good color, relatively straight and not biased in thickness. It is sorted into 5 stages from 2L to 2S according to the size.
  • BL〜BS
    B product. Those with bad color, slightly bent, or uneven thickness. It is sorted into 3 stages from L to S according to the size.
  • C
    C product. Bad shape.

Looking at some blogs, it seems that the correct answer rate is around 80% without any ingenuity. This time, I'm very grateful because I'm not changing the network structure.

Implementation content

Learning (implemented on CPU / GPU instance)

Create the parameter data to load on the FPGA. As mentioned in the table above, follow the procedure published on GitHub. BinaryNets for Pynq - Training Networks

Note that this parameter file must be created on the CPU / GPU. This time, I set up a GPU instance (NC6 Ubuntu 16.04) on Azure.

Build GPU environment

Install Nvidia Drivers, CUDA, cuDNN.

Install Nvidia Drivers

$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo apt-get update

CUDA installation

$ sudo apt-get install cuda -y

cuDNN installation

$ sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb

PATH setting

$ sudo sh -c "echo 'CUDA_HOME=/usr/local/cuda' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export LD_LIBRARY_PATH=\${LD_LIBRARY_PATH}:\${CUDA_HOME}/lib64' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export LIBRARY_PATH=\${LIBRARY_PATH}:\${CUDA_HOME}/lib64' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export C_INCLUDE_PATH=\${C_INCLUDE_PATH}:\${CUDA_HOME}/include' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export CXX_INCLUDE_PATH=\${CXX_INCLUDE_PATH}:\${CUDA_HOME}/include' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export PATH=\${PATH}:\${CUDA_HOME}/bin' >> /etc/profile.d/cuda.sh"

Reboot the instance

$ sudo reboot

Confirmation of installation

$ nvidia-smi
Thu Mar 30 07:42:52 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 8CFC:00:00.0     Off |                    0 |
| N/A   38C    P0    75W / 149W |      0MiB / 11439MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Python installation

Install the Python libraries (Theano, Lasagne, Numpy, Pylearn2). I also have pyenv installed first to use Python 2.7.

Install pyenv & python 2.7

$ sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
$ vi .bashrc
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
$ source .bashrc
$ env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 2.7.13
$ pyenv global 2.7.13

Install Python libraries (Theano, Lasagne, Numpy, Pylearn 2)

$ pip install --user git+https://github.com/Theano/[email protected]
$ pip install --user https://github.com/Lasagne/Lasagne/archive/master.zip
$ echo "[global]" >> ~/.theanorc
$ echo "floatX = float32" >> ~/.theanorc
$ echo "device = gpu" >> ~/.theanorc
$ echo "openmp = True" >> ~/.theanorc
$ echo "openmp_elemwise_minsize = 200000" >> ~/.theanorc
$ echo "" >> ~/.theanorc
$ echo "[nvcc]" >> ~/.theanorc
$ echo "fastmath = True" >> ~/.theanorc
$ echo "" >> ~/.theanorc
$ echo "[blas]" >> ~/.theanorc
$ echo "ldflags = -lopenblas" >> ~/.theanorc
$ git clone https://github.com/lisa-lab/pylearn2
$ cd pylearn2/
$ python setup.py develop --user

Data set preparation

Prepare the dataset to train. This time, I will use the image data of cucumber from GitHub.

$ git clone https://github.com/workpiles/CUCUMBER-9.git
$ cd CUCUMBER-9/prototype_1/
$ tar -zxvf cucumber-9-python.tar.gz

Program preparation

We will make a small change to the Xilinx program to change the dataset that the training loads. The main changes are the following two points.

Get the program from BNN-PYNQ

$ git clone https://github.com/Xilinx/BNN-PYNQ.git
$ cd BNN-PYNQ/bnn/src/training/

Change the program to be executed when learning Create cucumber9.py that reads the image data of the cucumber and executes the learning.

$ cp cifar10.py cucumber9.py
$ vi cucumber9.py

Binary data conversion program changes BNN-PYNQ handles binarized data. Therefore, it is necessary to convert the real parameter data to binary. Create cucumber9-gen-binary-weights.py that learns the image data of cucumber and converts the resulting parameter data to binary.

$ cp cifar10-gen-binary-weights.py cucumber9-gen-binary-weights.py
$ vi cucumber9-gen-binary-weights.py

Execution of learning

Now that you have the environment, data, and program ready to learn, run the program.

$ pwd /home/ubuntu/BNN-PYNQ/bnn/src/training
$ python cucumber9.py
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release.  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5110)
/home/ubuntu/.local/lib/python2.7/site-packages/theano/tensor/basic.py:2144: UserWarning: theano.tensor.round() changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy. Use the Theano flag `warn.round=False` to disable this warning.
  "theano.tensor.round() changed its default from"
batch_size = 50
alpha = 0.1
epsilon = 0.0001
W_LR_scale = Glorot
num_epochs = 500
LR_start = 0.001
LR_fin = 3e-07
LR_decay = 0.983907435305
save_path = cucumber9_parameters.npz
train_set_size = 2475
shuffle_parts = 1
Loading CUCUMBER9 dataset...
Building the CNN...
W_LR_scale = 20.0499
H = 1
W_LR_scale = 27.7128
H = 1
W_LR_scale = 33.9411
H = 1
W_LR_scale = 39.1918
H = 1
W_LR_scale = 48.0
H = 1
W_LR_scale = 55.4256
H = 1
W_LR_scale = 22.6274
H = 1
W_LR_scale = 26.1279
H = 1
W_LR_scale = 18.6369
H = 1
Training...
Epoch 1 of 500 took 6.08435511589s
  LR:                            0.001
  training loss:                 1.48512187053
  validation loss:               2.05507221487
  validation error rate:         61.1111117734%
  best epoch:                    1
  best validation error rate:    61.1111117734%
  test loss:                     2.05507221487
  test error rate:               61.1111117734%

…

Epoch 500 of 500 took 5.53324913979s
  LR:                            3.04906731299e-07
  training loss:                 0.0024273797482
  validation loss:               0.132337698506
  validation error rate:         14.2222222355%
  best epoch:                    205
  best validation error rate:    11.9999999387%
  test loss:                     0.124302371922
  test error rate:               11.9999999387%

After a while, the learning will be completed and the parameter file will be completed.

$ ls
cucumber9_parameters.npz

Binary parameter data

Converts real parameter data to binary.

$ python cucumber9-gen-binary-weights.py
cucumber9_parameters.npz

Binary parameter data is completed. Load this file with PYNQ.

$ ls binparam-cnv-pynq/
0-0-thres.bin     0-3-weights.bin   1-12-thres.bin    1-20-weights.bin  1-2-thres.bin     1-9-weights.bin   2-3-thres.bin     3-11-weights.bin  3-6-thres.bin    6-0-weights.bin
0-0-weights.bin   0-4-thres.bin     1-12-weights.bin  1-21-thres.bin    1-2-weights.bin   2-0-thres.bin     2-3-weights.bin   3-12-thres.bin    3-6-weights.bin  7-0-thres.bin
0-10-thres.bin    0-4-weights.bin   1-13-thres.bin    1-21-weights.bin  1-30-thres.bin    2-0-weights.bin   2-4-thres.bin     3-12-weights.bin  3-7-thres.bin    7-0-weights.bin
0-10-weights.bin  0-5-thres.bin     1-13-weights.bin  1-22-thres.bin    1-30-weights.bin  2-10-thres.bin    2-4-weights.bin   3-13-thres.bin    3-7-weights.bin  8-0-thres.bin
0-11-thres.bin    0-5-weights.bin   1-14-thres.bin    1-22-weights.bin  1-31-thres.bin    2-10-weights.bin  2-5-thres.bin     3-13-weights.bin  3-8-thres.bin    8-0-weights.bin
0-11-weights.bin  0-6-thres.bin     1-14-weights.bin  1-23-thres.bin    1-31-weights.bin  2-11-thres.bin    2-5-weights.bin   3-14-thres.bin    3-8-weights.bin  8-1-thres.bin
0-12-thres.bin    0-6-weights.bin   1-15-thres.bin    1-23-weights.bin  1-3-thres.bin     2-11-weights.bin  2-6-thres.bin     3-14-weights.bin  3-9-thres.bin    8-1-weights.bin
0-12-weights.bin  0-7-thres.bin     1-15-weights.bin  1-24-thres.bin    1-3-weights.bin   2-12-thres.bin    2-6-weights.bin   3-15-thres.bin    3-9-weights.bin  8-2-thres.bin
0-13-thres.bin    0-7-weights.bin   1-16-thres.bin    1-24-weights.bin  1-4-thres.bin     2-12-weights.bin  2-7-thres.bin     3-15-weights.bin  4-0-thres.bin    8-2-weights.bin
0-13-weights.bin  0-8-thres.bin     1-16-weights.bin  1-25-thres.bin    1-4-weights.bin   2-13-thres.bin    2-7-weights.bin   3-1-thres.bin     4-0-weights.bin  8-3-thres.bin
0-14-thres.bin    0-8-weights.bin   1-17-thres.bin    1-25-weights.bin  1-5-thres.bin     2-13-weights.bin  2-8-thres.bin     3-1-weights.bin   4-1-thres.bin    8-3-weights.bin
0-14-weights.bin  0-9-thres.bin     1-17-weights.bin  1-26-thres.bin    1-5-weights.bin   2-14-thres.bin    2-8-weights.bin   3-2-thres.bin     4-1-weights.bin  classes.txt
0-15-thres.bin    0-9-weights.bin   1-18-thres.bin    1-26-weights.bin  1-6-thres.bin     2-14-weights.bin  2-9-thres.bin     3-2-weights.bin   4-2-thres.bin
0-15-weights.bin  1-0-thres.bin     1-18-weights.bin  1-27-thres.bin    1-6-weights.bin   2-15-thres.bin    2-9-weights.bin   3-3-thres.bin     4-2-weights.bin
0-1-thres.bin     1-0-weights.bin   1-19-thres.bin    1-27-weights.bin  1-7-thres.bin     2-15-weights.bin  3-0-thres.bin     3-3-weights.bin   4-3-thres.bin
0-1-weights.bin   1-10-thres.bin    1-19-weights.bin  1-28-thres.bin    1-7-weights.bin   2-1-thres.bin     3-0-weights.bin   3-4-thres.bin     4-3-weights.bin
0-2-thres.bin     1-10-weights.bin  1-1-thres.bin     1-28-weights.bin  1-8-thres.bin     2-1-weights.bin   3-10-thres.bin    3-4-weights.bin   5-0-thres.bin
0-2-weights.bin   1-11-thres.bin    1-1-weights.bin   1-29-thres.bin    1-8-weights.bin   2-2-thres.bin     3-10-weights.bin  3-5-thres.bin     5-0-weights.bin
0-3-thres.bin     1-11-weights.bin  1-20-thres.bin    1-29-weights.bin  1-9-thres.bin     2-2-weights.bin   3-11-thres.bin    3-5-weights.bin   6-0-thres.bin

Inference (implemented by PYNQ)

Arrangement of parameter data

Transfer the parameter data created earlier to PYNQ.

$ sudo mkdir /opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9
$ sudo ls /opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9/
0-0-thres.bin     0-3-weights.bin   1-12-thres.bin    1-20-weights.bin  1-2-thres.bin     1-9-weights.bin   2-3-thres.bin     3-11-weights.bin  3-6-thres.bin    6-0-weights.bin
0-0-weights.bin   0-4-thres.bin     1-12-weights.bin  1-21-thres.bin    1-2-weights.bin   2-0-thres.bin     2-3-weights.bin   3-12-thres.bin    3-6-weights.bin  7-0-thres.bin
0-10-thres.bin    0-4-weights.bin   1-13-thres.bin    1-21-weights.bin  1-30-thres.bin    2-0-weights.bin   2-4-thres.bin     3-12-weights.bin  3-7-thres.bin    7-0-weights.bin
0-10-weights.bin  0-5-thres.bin     1-13-weights.bin  1-22-thres.bin    1-30-weights.bin  2-10-thres.bin    2-4-weights.bin   3-13-thres.bin    3-7-weights.bin  8-0-thres.bin
0-11-thres.bin    0-5-weights.bin   1-14-thres.bin    1-22-weights.bin  1-31-thres.bin    2-10-weights.bin  2-5-thres.bin     3-13-weights.bin  3-8-thres.bin    8-0-weights.bin
0-11-weights.bin  0-6-thres.bin     1-14-weights.bin  1-23-thres.bin    1-31-weights.bin  2-11-thres.bin    2-5-weights.bin   3-14-thres.bin    3-8-weights.bin  8-1-thres.bin
0-12-thres.bin    0-6-weights.bin   1-15-thres.bin    1-23-weights.bin  1-3-thres.bin     2-11-weights.bin  2-6-thres.bin     3-14-weights.bin  3-9-thres.bin    8-1-weights.bin
0-12-weights.bin  0-7-thres.bin     1-15-weights.bin  1-24-thres.bin    1-3-weights.bin   2-12-thres.bin    2-6-weights.bin   3-15-thres.bin    3-9-weights.bin  8-2-thres.bin
0-13-thres.bin    0-7-weights.bin   1-16-thres.bin    1-24-weights.bin  1-4-thres.bin     2-12-weights.bin  2-7-thres.bin     3-15-weights.bin  4-0-thres.bin    8-2-weights.bin
0-13-weights.bin  0-8-thres.bin     1-16-weights.bin  1-25-thres.bin    1-4-weights.bin   2-13-thres.bin    2-7-weights.bin   3-1-thres.bin     4-0-weights.bin  8-3-thres.bin
0-14-thres.bin    0-8-weights.bin   1-17-thres.bin    1-25-weights.bin  1-5-thres.bin     2-13-weights.bin  2-8-thres.bin     3-1-weights.bin   4-1-thres.bin    8-3-weights.bin
0-14-weights.bin  0-9-thres.bin     1-17-weights.bin  1-26-thres.bin    1-5-weights.bin   2-14-thres.bin    2-8-weights.bin   3-2-thres.bin     4-1-weights.bin  classes.txt
0-15-thres.bin    0-9-weights.bin   1-18-thres.bin    1-26-weights.bin  1-6-thres.bin     2-14-weights.bin  2-9-thres.bin     3-2-weights.bin   4-2-thres.bin
0-15-weights.bin  1-0-thres.bin     1-18-weights.bin  1-27-thres.bin    1-6-weights.bin   2-15-thres.bin    2-9-weights.bin   3-3-thres.bin     4-2-weights.bin
0-1-thres.bin     1-0-weights.bin   1-19-thres.bin    1-27-weights.bin  1-7-thres.bin     2-15-weights.bin  3-0-thres.bin     3-3-weights.bin   4-3-thres.bin
0-1-weights.bin   1-10-thres.bin    1-19-weights.bin  1-28-thres.bin    1-7-weights.bin   2-1-thres.bin     3-0-weights.bin   3-4-thres.bin     4-3-weights.bin
0-2-thres.bin     1-10-weights.bin  1-1-thres.bin     1-28-weights.bin  1-8-thres.bin     2-1-weights.bin   3-10-thres.bin    3-4-weights.bin   5-0-thres.bin
0-2-weights.bin   1-11-thres.bin    1-1-weights.bin   1-29-thres.bin    1-8-weights.bin   2-2-thres.bin     3-10-weights.bin  3-5-thres.bin     5-0-weights.bin
0-3-thres.bin     1-11-weights.bin  1-20-thres.bin    1-29-weights.bin  1-9-thres.bin     2-2-weights.bin   3-11-thres.bin    3-5-weights.bin   6-0-thres.bin

Placement of test data

Download the test data used for inference to PYNQ.

$ git clone https://github.com/workpiles/CUCUMBER-9.git
$ cd CUCUMBER-9/prototype_1/
$ tar -zxvf cucumber-9-python.tar.gz

Executing inference

Let's run it from Jupyter as in the previous demo. When executing CUCUMBER9, specify to read cucumber9 as a parameter as shown below.

classifier = bnn.CnvClassifier('cucumber9')

The execution result is as shown in the capture below.

screencapture-192-168-0-15-9090-nbconvert-html-bnn-Cucumber9-ipynb-1492315054198-min.png

You can classify it correctly! The execution time is as follows. Although the CPU of PYNQ is poor, the result of FPGA is about 360 times faster.

FPGA


Inference took 2240.00 microseconds
Classification rate: 446.43 images per second

CPU


Inference took 816809.00 microseconds
Classification rate: 1.22 images per second

reference

When writing the program, I referred to the following blog.

in conclusion

This time, PYNQ was powered by a mobile battery. I was surprised at how much power was saved.

Recommended Posts

Try Deep Learning with FPGA-Select Cucumbers
Try deep learning with TensorFlow
Try Deep Learning with FPGA
Try deep learning with TensorFlow Part 2
Try Bitcoin Price Forecasting with Deep Learning
Try with Chainer Deep Q Learning --Launch
Try deep learning of genomics with Kipoi
Deep Kernel Learning with Pyro
Try machine learning with Kaggle
Generate Pokemon with Deep Learning
Cat breed identification with deep learning
Reinforcement learning 13 Try Mountain_car with ChainerRL.
Try to build a deep learning / neural network with scratch
[Evangelion] Try to automatically generate Asuka-like lines with Deep Learning
Make ASCII art with deep learning
Solve three-dimensional PDEs with deep learning.
Try machine learning with scikit-learn SVM
Check squat forms with deep learning
Categorize news articles with deep learning
Forecasting Snack Sales with Deep Learning
Deep Learning
Try Common Representation Learning with chainer
Make people smile with Deep Learning
Introduction to Deep Learning (2) --Try your own nonlinear regression with Chainer-
Classify anime faces with deep learning with Chainer
Deep learning / Deep learning from scratch 2-Try moving GRU
Sentiment analysis of tweets with deep learning
Reinforcement learning 11 Try OpenAI acrobot with ChainerRL.
Deep Learning Memorandum
Start Deep learning
Python Deep Learning
Deep learning × Python
The story of doing deep learning with TPU
99.78% accuracy with deep learning by recognizing handwritten hiragana
Try scraping with Python.
First Deep Learning ~ Struggle ~
Learning Python with ChemTHEATER 03
"Object-oriented" learning with python
A story about predicting exchange rates with Deep Learning
Learning Python with ChemTHEATER 05-1
Python: Deep Learning Practices
Deep learning / activation functions
Deep Learning from scratch
Deep learning image analysis starting with Kaggle and Keras
Deep learning 1 Practice of deep learning
Try to predict forex (FX) with non-deep machine learning
Deep learning / cross entropy
First Deep Learning ~ Preparation ~
First Deep Learning ~ Solution ~
[AI] Deep Metric Learning
Learning Python with ChemTHEATER 02
I tried deep learning
Learning Python with ChemTHEATER 01
Try SNN with BindsNET
Extract music features with Deep Learning and predict tags
Classify anime faces by sequel / deep learning with Keras
Python: Deep Learning Tuning
Deep learning large-scale technology
Try regression with TensorFlow
Now, let's try face recognition with Chainer (learning phase)
Deep learning / softmax function