The other day, in the article "Try Deep Learning with FPGA, PYNQ", I wrote about BNN-PYNQ. In that article, I introduced a relatively inexpensive FPGA board called the PYNQ-Z1 and ran one of the demos prepared in advance (Cifar10). This time, I will build on that prepared demo and use it to sort cucumbers.
As I wrote in the previous article, Deep Learning consists broadly of two phases: training and inference. BNN-PYNQ implements only inference, so training must be done on a CPU/GPU. Customizing BNN-PYNQ therefore means changing the network structure used for inference and the parameters obtained from training.
Taking the previous Cifar10 demo as an example, BNN-PYNQ runs Deep Learning on the FPGA from an application in Jupyter following the flow below. The CPU/FPGA speed comparison shown last time was achieved simply by switching which shared library (python_hw or python_sw) is loaded in step 4 of the table below.
# | File | Overview | How to customize |
---|---|---|---|
1 | Cifar10.ipynb | The application. Last time, this was the Jupyter notebook used to run the demo. | |
2 | bnn.py | The library for running BNN-PYNQ from Python. | |
3 | X-X-thres.bin, X-X-weights.bin, classes.txt | Parameter files. They carry the results of training on a CPU/GPU into BNN-PYNQ. | BinaryNets for Pynq - Training Networks |
4 | python_sw-cnv-pynq.so | Shared library for running Deep Learning on the CPU. | make-sw.sh |
4 | python_hw-cnv-pynq.so | Shared library for running Deep Learning on the FPGA. | make-sw.sh |
5 | cnv-pynq-pynq.bit | Bitstream file for processing on the FPGA. When you switch the overlay, this is the file that gets swapped in and loaded. | make-hw.sh |
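The CPU/FPGA switch comes down to which of the two shared libraries in row 4 gets loaded. The sketch below illustrates that idea with Python's standard `ctypes` module. The helper names and the library directory are hypothetical, and this is not how bnn.py itself is written; it only shows why the calling code can stay the same while the backend changes.

```python
import ctypes
import os

# Hypothetical mapping from runtime name to the shared libraries listed in row 4.
LIBRARIES = {
    "hw": "python_hw-cnv-pynq.so",  # inference on the FPGA
    "sw": "python_sw-cnv-pynq.so",  # inference on the ARM CPU
}


def library_path(runtime, lib_dir):
    """Return the path of the shared library for runtime 'hw' or 'sw'."""
    if runtime not in LIBRARIES:
        raise ValueError("runtime must be 'hw' or 'sw', got %r" % (runtime,))
    return os.path.join(lib_dir, LIBRARIES[runtime])


def load_backend(runtime, lib_dir):
    """Load the selected library; callers use the same symbols either way."""
    return ctypes.CDLL(library_path(runtime, lib_dir))
```

For example, `load_backend("hw", lib_dir)` and `load_backend("sw", lib_dir)` would return handles exposing the same entry points, where `lib_dir` is wherever the .so files live on your board (an assumption to adapt).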
This time, I will customize BNN-PYNQ, but since rebuilding the network structure right away is a big hurdle, I will keep the same network structure as Cifar10 and only change the parameters that are loaded.
The task, which was a hot topic for a while so many of you may already know it, is to classify cucumbers into 9 grades from their images: Sorting "cucumbers" by deep learning with TensorFlow
The data needed for training is published on GitHub, so I will use it. Two datasets are published there, ProtoType-1 and ProtoType-2; this time I use ProtoType-1, whose dataset format is close to that of Cifar10.
GitHub - workpiles/CUCUMBER-9
- 2L to 2S: Good product. Good color, relatively straight, and even in thickness. Sorted into 5 grades from 2L to 2S by size.
- BL to BS: B product. Poor color, slightly bent, or uneven in thickness. Sorted into 3 grades from L to S by size.
- C: C product. Bad shape.
According to some blogs, an accuracy of around 80% is achievable without any special tuning. That is reassuring, since this time I am not changing the network structure.
Next, create the parameter data to be loaded onto the FPGA. As listed in the table above, follow the procedure published on GitHub: BinaryNets for Pynq - Training Networks
Note that this parameter file must be created on a CPU/GPU. This time, I set up a GPU instance (NC6, Ubuntu 16.04) on Azure.
Install the NVIDIA driver, CUDA, and cuDNN.
Register the NVIDIA CUDA repository (the driver is installed together with CUDA in the next step)
$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo apt-get update
CUDA installation
$ sudo apt-get install cuda -y
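cuDNN installation (download the two .deb packages from the NVIDIA Developer site beforehand; this requires a developer account)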
$ sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb
Set the environment variables
$ sudo sh -c "echo 'export CUDA_HOME=/usr/local/cuda' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export LD_LIBRARY_PATH=\${LD_LIBRARY_PATH}:\${CUDA_HOME}/lib64' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export LIBRARY_PATH=\${LIBRARY_PATH}:\${CUDA_HOME}/lib64' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export C_INCLUDE_PATH=\${C_INCLUDE_PATH}:\${CUDA_HOME}/include' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export CPLUS_INCLUDE_PATH=\${CPLUS_INCLUDE_PATH}:\${CUDA_HOME}/include' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export PATH=\${PATH}:\${CUDA_HOME}/bin' >> /etc/profile.d/cuda.sh"
Reboot the instance
$ sudo reboot
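After the reboot, it can be worth confirming that the login shell actually picked up /etc/profile.d/cuda.sh before moving on. The following is a minimal sketch of such a check; the function name is hypothetical, and it only inspects environment variables, so it says nothing about whether the driver itself works.

```python
import os

CUDA_HOME = "/usr/local/cuda"

# Variables written to /etc/profile.d/cuda.sh and the CUDA directory each should contain.
EXPECTED = {
    "PATH": CUDA_HOME + "/bin",
    "LD_LIBRARY_PATH": CUDA_HOME + "/lib64",
    "LIBRARY_PATH": CUDA_HOME + "/lib64",
    "C_INCLUDE_PATH": CUDA_HOME + "/include",
}


def missing_cuda_paths(env):
    """Return the variables whose colon-separated value lacks the CUDA directory."""
    missing = []
    for name, directory in sorted(EXPECTED.items()):
        entries = env.get(name, "").split(os.pathsep)
        if directory not in entries:
            missing.append(name)
    return missing
```

Calling `missing_cuda_paths(os.environ)` in a fresh login shell should return an empty list; any name it returns points at a line in cuda.sh worth rechecking.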
Verify the installation
$ nvidia-smi
Thu Mar 30 07:42:52 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39 Driver Version: 375.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 8CFC:00:00.0 Off | 0 |
| N/A 38C P0 75W / 149W | 0MiB / 11439MiB | 97% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
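nvidia-smi also has a machine-readable query mode (`--query-gpu` together with `--format=csv,noheader`), which is handier than the table above if you want to script this check. A minimal sketch, assuming nvidia-smi is on PATH; the parsing helper is my own addition.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total,driver_version",
         "--format=csv,noheader"]


def parse_gpus(csv_text):
    """Turn nvidia-smi CSV output (one GPU per line) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, memory, driver = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "memory": memory, "driver": driver})
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its output (requires the NVIDIA driver)."""
    output = subprocess.check_output(QUERY)
    return parse_gpus(output.decode("utf-8"))
```

On the NC6 instance above, `query_gpus()` should report one Tesla K80 entry.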
Install the Python libraries (Theano, Lasagne, NumPy, Pylearn2). To use Python 2.7, I first installed pyenv.
Install pyenv and Python 2.7
$ sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv
$ vi .bashrc
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
$ source .bashrc
$ env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 2.7.13
$ pyenv global 2.7.13
Install the Python libraries (Theano, Lasagne, NumPy, Pylearn2)
$ pip install --user git+https://github.com/Theano/[email protected]
$ pip install --user https://github.com/Lasagne/Lasagne/archive/master.zip
$ echo "[global]" >> ~/.theanorc
$ echo "floatX = float32" >> ~/.theanorc
$ echo "device = gpu" >> ~/.theanorc
$ echo "openmp = True" >> ~/.theanorc
$ echo "openmp_elemwise_minsize = 200000" >> ~/.theanorc
$ echo "" >> ~/.theanorc
$ echo "[nvcc]" >> ~/.theanorc
$ echo "fastmath = True" >> ~/.theanorc
$ echo "" >> ~/.theanorc
$ echo "[blas]" >> ~/.theanorc
$ echo "ldflags = -lopenblas" >> ~/.theanorc
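Because ~/.theanorc is built up from a series of echo commands, a typo is easy to miss. The sketch below reads the file back with the standard-library INI parser and returns the two settings training depends on most (`device` and `floatX`). It is a convenience check I am adding, not part of BNN-PYNQ, and is written to run under both Python 2.7 and 3.

```python
try:  # Python 3
    from configparser import ConfigParser
except ImportError:  # Python 2.7
    from ConfigParser import ConfigParser


def read_theanorc(path):
    """Return (device, floatX) from the [global] section of a .theanorc file."""
    parser = ConfigParser()
    if not parser.read(path):  # read() returns the list of files it parsed
        raise IOError("could not read %s" % path)
    # Option names are case-insensitive here, so "floatX" also finds "floatx".
    return parser.get("global", "device"), parser.get("global", "floatX")
```

For the file created above, `read_theanorc(os.path.expanduser("~/.theanorc"))` should return `("gpu", "float32")`.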
$ git clone https://github.com/lisa-lab/pylearn2
$ cd pylearn2/
$ python setup.py develop --user
Prepare the training dataset. This time, I use the cucumber image data published on GitHub.
$ git clone https://github.com/workpiles/CUCUMBER-9.git
$ cd CUCUMBER-9/prototype_1/
$ tar -zxvf cucumber-9-python.tar.gz
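Since prototype_1 is described as having a format close to Cifar10, a reasonable assumption is that it follows the CIFAR-10 "python version" layout: pickled dicts holding a `data` array (3072 uint8 values per 32x32 RGB image) and a `labels` list. Under that assumption, the batches could be loaded as below. The key names and the batch file names are assumptions; check them against what the tar file actually extracts.

```python
import pickle
import sys

import numpy as np


def load_batch(path):
    """Load one CIFAR-10-style pickled batch -> (uint8 data N x 3072, int labels N)."""
    with open(path, "rb") as f:
        if sys.version_info[0] >= 3:
            # Batches pickled under Python 2 need latin1 to decode on Python 3.
            batch = pickle.load(f, encoding="latin1")
        else:
            batch = pickle.load(f)
    data = np.asarray(batch["data"], dtype=np.uint8)
    labels = np.asarray(batch["labels"], dtype=np.int64)
    return data, labels


def load_batches(paths):
    """Concatenate several batches (e.g. all training batches) into one array pair."""
    pairs = [load_batch(p) for p in paths]
    data = np.concatenate([d for d, _ in pairs])
    labels = np.concatenate([lab for _, lab in pairs])
    return data, labels
```

The version check keeps the same code usable both in the Python 2.7 training environment and on the Python 3.6 side of PYNQ.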
Next, make a small change to Xilinx's training program so that it loads a different dataset. There are two main changes.
Get the program from BNN-PYNQ
$ git clone https://github.com/Xilinx/BNN-PYNQ.git
$ cd BNN-PYNQ/bnn/src/training/
Change the training program: create cucumber9.py, which loads the cucumber image data and runs training.
$ cp cifar10.py cucumber9.py
$ vi cucumber9.py
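The heart of cucumber9.py is swapping the CIFAR-10 data for the cucumber data while keeping the preprocessing the network expects. As far as I can tell, the BinaryNet-derived cifar10.py scales inputs to [-1, +1] in bc01 layout (images, channels, height, width) and turns labels into one-hot vectors mapped to -1/+1 for the squared hinge loss. The sketch below reproduces that for 9 classes, assuming 32x32 RGB images (the input size of the unchanged cnv-pynq network). It is an illustration, not the author's actual modification.

```python
import numpy as np

NUM_CLASSES = 9            # CUCUMBER-9 grades
IMAGE_SHAPE = (3, 32, 32)  # channels, height, width (assumed, same as CIFAR-10)


def to_network_inputs(data):
    """uint8 rows (N x 3072, channel-major like CIFAR-10) -> float32 bc01 in [-1, 1]."""
    x = data.astype(np.float32) / 127.5 - 1.0
    return x.reshape((-1,) + IMAGE_SHAPE)


def to_hinge_targets(labels, num_classes=NUM_CLASSES):
    """Integer labels -> one-hot rows mapped to {-1, +1} for the squared hinge loss."""
    one_hot = np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]
    return 2.0 * one_hot - 1.0
```

Dividing by 127.5 is the same scaling as multiplying by 2/255, but it maps 255 to exactly 1.0 in float32.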
Change the binary conversion program: BNN-PYNQ works with binarized data, so the real-valued parameters produced by training must be converted to binary. Create cucumber9-gen-binary-weights.py, which converts the parameters learned from the cucumber images into binary form.
$ cp cifar10-gen-binary-weights.py cucumber9-gen-binary-weights.py
$ vi cucumber9-gen-binary-weights.py
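Why convert at all? Once weights and activations are restricted to ±1, a neuron's multiply-accumulate can be computed as XNOR plus popcount (a = 2·popcount - n), and batch normalization followed by the sign activation collapses into a single integer comparison on that popcount. As I understand it, that is what the X-X-weights.bin and X-X-thres.bin files hold. The NumPy sketch below shows the arithmetic for one fully connected layer; it is a conceptual illustration, not the conversion script or the real file format, and assumes a positive batch-norm scale (a negative one reverses the comparison).

```python
import numpy as np


def binarize(v):
    """Sign binarization used by BNNs: values >= 0 map to +1, others to -1."""
    return np.where(v >= 0, 1, -1).astype(np.int8)


def fold_batchnorm(mean, std, gamma, beta, fan_in):
    """Fold BN + sign into a popcount threshold (assumes gamma > 0).

    gamma * (a - mean) / std + beta >= 0  <=>  a >= tau = mean - beta * std / gamma,
    and with a = 2 * popcount - fan_in this is popcount >= ceil((tau + fan_in) / 2).
    """
    tau = mean - beta * std / gamma
    return np.ceil((tau + fan_in) / 2.0).astype(np.int64)


def xnor_popcount_layer(x_bits, w_bits, thresholds):
    """x_bits: (n,) in {0,1}; w_bits: (units, n) in {0,1}; returns outputs in {-1,+1}."""
    matches = np.sum(x_bits[np.newaxis, :] == w_bits, axis=1)  # popcount of XNOR
    return np.where(matches >= thresholds, 1, -1).astype(np.int8)
```

The point of the design is that the FPGA then only needs bit operations, counters, and comparators, with no multipliers or floating-point batch normalization.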
With the environment, data, and programs ready, run the training.
$ pwd
/home/ubuntu/BNN-PYNQ/bnn/src/training
$ python cucumber9.py
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release. Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5110)
/home/ubuntu/.local/lib/python2.7/site-packages/theano/tensor/basic.py:2144: UserWarning: theano.tensor.round() changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy. Use the Theano flag `warn.round=False` to disable this warning.
"theano.tensor.round() changed its default from"
batch_size = 50
alpha = 0.1
epsilon = 0.0001
W_LR_scale = Glorot
num_epochs = 500
LR_start = 0.001
LR_fin = 3e-07
LR_decay = 0.983907435305
save_path = cucumber9_parameters.npz
train_set_size = 2475
shuffle_parts = 1
Loading CUCUMBER9 dataset...
Building the CNN...
W_LR_scale = 20.0499
H = 1
W_LR_scale = 27.7128
H = 1
W_LR_scale = 33.9411
H = 1
W_LR_scale = 39.1918
H = 1
W_LR_scale = 48.0
H = 1
W_LR_scale = 55.4256
H = 1
W_LR_scale = 22.6274
H = 1
W_LR_scale = 26.1279
H = 1
W_LR_scale = 18.6369
H = 1
Training...
Epoch 1 of 500 took 6.08435511589s
LR: 0.001
training loss: 1.48512187053
validation loss: 2.05507221487
validation error rate: 61.1111117734%
best epoch: 1
best validation error rate: 61.1111117734%
test loss: 2.05507221487
test error rate: 61.1111117734%
…
Epoch 500 of 500 took 5.53324913979s
LR: 3.04906731299e-07
training loss: 0.0024273797482
validation loss: 0.132337698506
validation error rate: 14.2222222355%
best epoch: 205
best validation error rate: 11.9999999387%
test loss: 0.124302371922
test error rate: 11.9999999387%
After a while, training finishes and the parameter file is generated. The best validation and test error rates were both about 12%, that is, roughly 88% accuracy.
$ ls
cucumber9_parameters.npz
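Two sets of numbers in the training log above can be reproduced by hand, which is a useful check that cucumber9.py built the network you intended. LR_decay matches (LR_fin / LR_start) ** (1 / num_epochs), and each per-layer W_LR_scale matches 1 / sqrt(1.5 / (fan_in + fan_out)), which as far as I know is the Glorot-based scale BinaryNet uses. The layer sizes are my own reading of the cnv-pynq topology (six 3x3 convolutions with 64, 64, 128, 128, 256, 256 channels, then dense layers of 512, 512 and 9 units); they are inferred rather than taken from the source, but they reproduce all nine logged values.

```python
import math

LR_START, LR_FIN, NUM_EPOCHS = 0.001, 3e-07, 500

# (fan_in, fan_out) per layer; for a 3x3 convolution both are 9 * channels.
# Inferred cnv-pynq layer sizes with 9 output classes (assumption).
LAYERS = [
    (9 * 3, 9 * 64), (9 * 64, 9 * 64),
    (9 * 64, 9 * 128), (9 * 128, 9 * 128),
    (9 * 128, 9 * 256), (9 * 256, 9 * 256),
    (256, 512), (512, 512), (512, 9),
]


def lr_decay(lr_start, lr_fin, num_epochs):
    """Per-epoch multiplier that takes the learning rate from lr_start to lr_fin."""
    return (lr_fin / lr_start) ** (1.0 / num_epochs)


def glorot_lr_scale(fan_in, fan_out):
    """Per-layer learning-rate scale for the binary weights."""
    return 1.0 / math.sqrt(1.5 / (fan_in + fan_out))


decay = lr_decay(LR_START, LR_FIN, NUM_EPOCHS)
print("LR_decay = %.12f" % decay)
# The LR printed for epoch 500 has been decayed 499 times (epoch 1 prints LR_START).
print("LR at epoch 500 = %.11e" % (LR_START * decay ** (NUM_EPOCHS - 1)))
for fan_in, fan_out in LAYERS:
    print("W_LR_scale = %.4f" % glorot_lr_scale(fan_in, fan_out))
```

If a W_LR_scale line in your own log differs, the layer shapes differ from this assumed topology, which is worth knowing before generating the FPGA parameters.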
Convert the real-valued parameters to binary.
$ python cucumber9-gen-binary-weights.py
cucumber9_parameters.npz
The binary parameter files are now ready. These are the files PYNQ loads.
$ ls binparam-cnv-pynq/
0-0-thres.bin 0-3-weights.bin 1-12-thres.bin 1-20-weights.bin 1-2-thres.bin 1-9-weights.bin 2-3-thres.bin 3-11-weights.bin 3-6-thres.bin 6-0-weights.bin
0-0-weights.bin 0-4-thres.bin 1-12-weights.bin 1-21-thres.bin 1-2-weights.bin 2-0-thres.bin 2-3-weights.bin 3-12-thres.bin 3-6-weights.bin 7-0-thres.bin
0-10-thres.bin 0-4-weights.bin 1-13-thres.bin 1-21-weights.bin 1-30-thres.bin 2-0-weights.bin 2-4-thres.bin 3-12-weights.bin 3-7-thres.bin 7-0-weights.bin
0-10-weights.bin 0-5-thres.bin 1-13-weights.bin 1-22-thres.bin 1-30-weights.bin 2-10-thres.bin 2-4-weights.bin 3-13-thres.bin 3-7-weights.bin 8-0-thres.bin
0-11-thres.bin 0-5-weights.bin 1-14-thres.bin 1-22-weights.bin 1-31-thres.bin 2-10-weights.bin 2-5-thres.bin 3-13-weights.bin 3-8-thres.bin 8-0-weights.bin
0-11-weights.bin 0-6-thres.bin 1-14-weights.bin 1-23-thres.bin 1-31-weights.bin 2-11-thres.bin 2-5-weights.bin 3-14-thres.bin 3-8-weights.bin 8-1-thres.bin
0-12-thres.bin 0-6-weights.bin 1-15-thres.bin 1-23-weights.bin 1-3-thres.bin 2-11-weights.bin 2-6-thres.bin 3-14-weights.bin 3-9-thres.bin 8-1-weights.bin
0-12-weights.bin 0-7-thres.bin 1-15-weights.bin 1-24-thres.bin 1-3-weights.bin 2-12-thres.bin 2-6-weights.bin 3-15-thres.bin 3-9-weights.bin 8-2-thres.bin
0-13-thres.bin 0-7-weights.bin 1-16-thres.bin 1-24-weights.bin 1-4-thres.bin 2-12-weights.bin 2-7-thres.bin 3-15-weights.bin 4-0-thres.bin 8-2-weights.bin
0-13-weights.bin 0-8-thres.bin 1-16-weights.bin 1-25-thres.bin 1-4-weights.bin 2-13-thres.bin 2-7-weights.bin 3-1-thres.bin 4-0-weights.bin 8-3-thres.bin
0-14-thres.bin 0-8-weights.bin 1-17-thres.bin 1-25-weights.bin 1-5-thres.bin 2-13-weights.bin 2-8-thres.bin 3-1-weights.bin 4-1-thres.bin 8-3-weights.bin
0-14-weights.bin 0-9-thres.bin 1-17-weights.bin 1-26-thres.bin 1-5-weights.bin 2-14-thres.bin 2-8-weights.bin 3-2-thres.bin 4-1-weights.bin classes.txt
0-15-thres.bin 0-9-weights.bin 1-18-thres.bin 1-26-weights.bin 1-6-thres.bin 2-14-weights.bin 2-9-thres.bin 3-2-weights.bin 4-2-thres.bin
0-15-weights.bin 1-0-thres.bin 1-18-weights.bin 1-27-thres.bin 1-6-weights.bin 2-15-thres.bin 2-9-weights.bin 3-3-thres.bin 4-2-weights.bin
0-1-thres.bin 1-0-weights.bin 1-19-thres.bin 1-27-weights.bin 1-7-thres.bin 2-15-weights.bin 3-0-thres.bin 3-3-weights.bin 4-3-thres.bin
0-1-weights.bin 1-10-thres.bin 1-19-weights.bin 1-28-thres.bin 1-7-weights.bin 2-1-thres.bin 3-0-weights.bin 3-4-thres.bin 4-3-weights.bin
0-2-thres.bin 1-10-weights.bin 1-1-thres.bin 1-28-weights.bin 1-8-thres.bin 2-1-weights.bin 3-10-thres.bin 3-4-weights.bin 5-0-thres.bin
0-2-weights.bin 1-11-thres.bin 1-1-weights.bin 1-29-thres.bin 1-8-weights.bin 2-2-thres.bin 3-10-weights.bin 3-5-thres.bin 5-0-weights.bin
0-3-thres.bin 1-11-weights.bin 1-20-thres.bin 1-29-weights.bin 1-9-thres.bin 2-2-weights.bin 3-11-thres.bin 3-5-weights.bin 6-0-thres.bin
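Before copying these files to the board, a quick consistency check can catch an incomplete conversion. The names appear to follow a `<layer>-<PE>-weights.bin` / `<layer>-<PE>-thres.bin` pattern, one file pair per processing element of each layer; that reading is an assumption on my part. The sketch groups the files by layer and reports any pair with a missing half.

```python
import collections
import re

PARAM_FILE = re.compile(r"^(\d+)-(\d+)-(weights|thres)\.bin$")


def summarize_params(filenames):
    """Group parameter files by layer.

    Returns (pe_count_per_layer, missing), where missing lists
    (layer, pe, kind) tuples for files whose partner is absent.
    """
    kinds = collections.defaultdict(set)  # (layer, pe) -> {"weights", "thres"}
    for name in filenames:
        match = PARAM_FILE.match(name)
        if match:  # classes.txt and any other files are ignored
            layer, pe = int(match.group(1)), int(match.group(2))
            kinds[(layer, pe)].add(match.group(3))
    pe_count = collections.Counter(layer for layer, _ in kinds)
    missing = sorted((layer, pe, kind)
                     for (layer, pe), found in kinds.items()
                     for kind in ({"weights", "thres"} - found))
    return dict(pe_count), missing
```

Run it as `summarize_params(os.listdir("binparam-cnv-pynq"))`. For the listing above, it should report 16, 32, 16, 16, 4, 1, 1, 1 and 4 pairs for layers 0 to 8, with nothing missing.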
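Transfer the parameter data created earlier to PYNQ: create a cucumber9 directory under the bnn package's params folder and copy the contents of binparam-cnv-pynq/ into it (for example with scp). The listing below confirms that the files are in place.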
$ sudo mkdir /opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9
$ sudo ls /opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9/
0-0-thres.bin 0-3-weights.bin 1-12-thres.bin 1-20-weights.bin 1-2-thres.bin 1-9-weights.bin 2-3-thres.bin 3-11-weights.bin 3-6-thres.bin 6-0-weights.bin
0-0-weights.bin 0-4-thres.bin 1-12-weights.bin 1-21-thres.bin 1-2-weights.bin 2-0-thres.bin 2-3-weights.bin 3-12-thres.bin 3-6-weights.bin 7-0-thres.bin
0-10-thres.bin 0-4-weights.bin 1-13-thres.bin 1-21-weights.bin 1-30-thres.bin 2-0-weights.bin 2-4-thres.bin 3-12-weights.bin 3-7-thres.bin 7-0-weights.bin
0-10-weights.bin 0-5-thres.bin 1-13-weights.bin 1-22-thres.bin 1-30-weights.bin 2-10-thres.bin 2-4-weights.bin 3-13-thres.bin 3-7-weights.bin 8-0-thres.bin
0-11-thres.bin 0-5-weights.bin 1-14-thres.bin 1-22-weights.bin 1-31-thres.bin 2-10-weights.bin 2-5-thres.bin 3-13-weights.bin 3-8-thres.bin 8-0-weights.bin
0-11-weights.bin 0-6-thres.bin 1-14-weights.bin 1-23-thres.bin 1-31-weights.bin 2-11-thres.bin 2-5-weights.bin 3-14-thres.bin 3-8-weights.bin 8-1-thres.bin
0-12-thres.bin 0-6-weights.bin 1-15-thres.bin 1-23-weights.bin 1-3-thres.bin 2-11-weights.bin 2-6-thres.bin 3-14-weights.bin 3-9-thres.bin 8-1-weights.bin
0-12-weights.bin 0-7-thres.bin 1-15-weights.bin 1-24-thres.bin 1-3-weights.bin 2-12-thres.bin 2-6-weights.bin 3-15-thres.bin 3-9-weights.bin 8-2-thres.bin
0-13-thres.bin 0-7-weights.bin 1-16-thres.bin 1-24-weights.bin 1-4-thres.bin 2-12-weights.bin 2-7-thres.bin 3-15-weights.bin 4-0-thres.bin 8-2-weights.bin
0-13-weights.bin 0-8-thres.bin 1-16-weights.bin 1-25-thres.bin 1-4-weights.bin 2-13-thres.bin 2-7-weights.bin 3-1-thres.bin 4-0-weights.bin 8-3-thres.bin
0-14-thres.bin 0-8-weights.bin 1-17-thres.bin 1-25-weights.bin 1-5-thres.bin 2-13-weights.bin 2-8-thres.bin 3-1-weights.bin 4-1-thres.bin 8-3-weights.bin
0-14-weights.bin 0-9-thres.bin 1-17-weights.bin 1-26-thres.bin 1-5-weights.bin 2-14-thres.bin 2-8-weights.bin 3-2-thres.bin 4-1-weights.bin classes.txt
0-15-thres.bin 0-9-weights.bin 1-18-thres.bin 1-26-weights.bin 1-6-thres.bin 2-14-weights.bin 2-9-thres.bin 3-2-weights.bin 4-2-thres.bin
0-15-weights.bin 1-0-thres.bin 1-18-weights.bin 1-27-thres.bin 1-6-weights.bin 2-15-thres.bin 2-9-weights.bin 3-3-thres.bin 4-2-weights.bin
0-1-thres.bin 1-0-weights.bin 1-19-thres.bin 1-27-weights.bin 1-7-thres.bin 2-15-weights.bin 3-0-thres.bin 3-3-weights.bin 4-3-thres.bin
0-1-weights.bin 1-10-thres.bin 1-19-weights.bin 1-28-thres.bin 1-7-weights.bin 2-1-thres.bin 3-0-weights.bin 3-4-thres.bin 4-3-weights.bin
0-2-thres.bin 1-10-weights.bin 1-1-thres.bin 1-28-weights.bin 1-8-thres.bin 2-1-weights.bin 3-10-thres.bin 3-4-weights.bin 5-0-thres.bin
0-2-weights.bin 1-11-thres.bin 1-1-weights.bin 1-29-thres.bin 1-8-weights.bin 2-2-thres.bin 3-10-weights.bin 3-5-thres.bin 5-0-weights.bin
0-3-thres.bin 1-11-weights.bin 1-20-thres.bin 1-29-weights.bin 1-9-thres.bin 2-2-weights.bin 3-11-thres.bin 3-5-weights.bin 6-0-thres.bin
Download the test data used for inference onto PYNQ as well.
$ git clone https://github.com/workpiles/CUCUMBER-9.git
$ cd CUCUMBER-9/prototype_1/
$ tar -zxvf cucumber-9-python.tar.gz
Run it from Jupyter as in the previous demo. To run CUCUMBER9, specify cucumber9 as the parameter set to load, as shown below.
classifier = bnn.CnvClassifier('cucumber9')
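With the classifier created this way, the whole test batch can also be scored, not just a single image. The sketch assumes the same CIFAR-10-style layout as during training (3072 channel-major uint8 values per image) and that, as in the Cifar10 demo notebook, `classifier.classify_image()` accepts a PIL image and returns a class index; please check both against your BNN-PYNQ version. The `classify` argument keeps the accuracy loop independent of the bnn module.

```python
import numpy as np


def row_to_hwc(row):
    """One CIFAR-10-style row (1024 R, 1024 G, 1024 B values) -> 32x32x3 uint8 array."""
    return np.asarray(row, dtype=np.uint8).reshape(3, 32, 32).transpose(1, 2, 0)


def row_to_pil(row):
    """Wrap the array as a PIL image (Pillow is imported only when needed)."""
    from PIL import Image
    return Image.fromarray(row_to_hwc(row), "RGB")


def accuracy(classify, rows, labels, to_image=row_to_pil):
    """Fraction of images for which classify(to_image(row)) equals the true label."""
    correct = 0
    for row, label in zip(rows, labels):
        if classify(to_image(row)) == label:
            correct += 1
    return correct / float(len(labels))
```

For example, `accuracy(classifier.classify_image, rows, labels)` gives the hardware accuracy on whichever test rows and labels you load from the extracted batch.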
The execution result is as shown in the capture below.
The image is classified correctly! The execution times are as follows. PYNQ's CPU is admittedly weak, but the FPGA is roughly 365 times faster (2,240 microseconds versus 816,809 microseconds per image).
FPGA
Inference took 2240.00 microseconds
Classification rate: 446.43 images per second
CPU
Inference took 816809.00 microseconds
Classification rate: 1.22 images per second
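The throughput figures and the speed-up follow directly from the per-image inference times; this small sketch redoes that arithmetic.

```python
FPGA_US = 2240.00   # inference time per image on the FPGA, microseconds
CPU_US = 816809.00  # inference time per image on the ARM CPU, microseconds


def images_per_second(microseconds_per_image):
    """Convert a per-image latency in microseconds into images per second."""
    return 1e6 / microseconds_per_image


print("FPGA: %.2f images/s" % images_per_second(FPGA_US))
print("CPU:  %.2f images/s" % images_per_second(CPU_US))
print("speed-up: %.1fx" % (CPU_US / FPGA_US))
```

It reproduces the 446.43 and 1.22 images per second reported above, and the ratio of the two latencies is about 364.6.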
When writing the program, I referred to the following blog.
This time, I powered PYNQ from a mobile battery, and I was surprised at how little power it needed.