Introduction

The other day, in the article "Try Deep Learning with FPGA", I wrote about PYNQ and BNN-PYNQ. In that article I introduced a relatively inexpensive FPGA board, the PYNQ-Z1 Board, and ran a prebuilt demo (Cifar10). This time, I will build on that prebuilt demo and use it to sort cucumbers.

Background

Customizing BNN-PYNQ

As I wrote in the previous article, Deep Learning consists broadly of training and inference. BNN-PYNQ implements only inference (training must be done on a CPU/GPU). Customizing BNN-PYNQ therefore means changing the network structure and the inference parameters to match what was trained.

Taking the previous Cifar10 demo as an example, BNN-PYNQ runs Deep Learning on the FPGA from the Jupyter application through the flow below. Last time I showed a CPU/FPGA speed comparison; that was done by switching the shared library (python_hw / python_sw) loaded in #4 below.

#	File	Overview	Customization method
1	Cifar10.ipynb	The application. Last time, this was the Jupyter notebook used to run the demo.
2	bnn.py	A library for running BNN-PYNQ from Python.
3	X-X-thres.bin X-X-weights.bin classes.txt	Parameter files. They carry the results of training on a CPU/GPU over to BNN-PYNQ.	BinaryNets for Pynq - Training Networks
4	python_sw-cnv-pynq.so	A shared library for running Deep Learning on the CPU.	make-sw.sh python_sw
	python_hw-cnv-pynq.so	A shared library for running Deep Learning on the FPGA.	make-sw.sh python_hw
5	cnv-pynq-pynq.bit	A bitstream file for processing on the FPGA. When the overlay is switched, the matching bitstream file is loaded.	make-hw.sh
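
As a rough illustration of row 4 in the table, switching between CPU and FPGA comes down to which shared library gets loaded. The sketch below only builds the file name; the helper shared_library_name and its arguments are hypothetical names introduced here and are not part of bnn.py.

```python
# Hypothetical helper (not from bnn.py): map a runtime choice to the
# shared library file name listed in the table above.
def shared_library_name(runtime, network="cnv", platform="pynq"):
    """Return the library file name for runtime 'hw' (FPGA) or 'sw' (CPU)."""
    if runtime not in ("hw", "sw"):
        raise ValueError("runtime must be 'hw' or 'sw'")
    return "python_{}-{}-{}.so".format(runtime, network, platform)


print(shared_library_name("hw"))  # library for the FPGA run
print(shared_library_name("sw"))  # library for the CPU run
```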

This time I will customize BNN-PYNQ. Rebuilding the network structure right away is a high hurdle, so I will keep the same network structure as Cifar10 and change only the parameters that are loaded.

Cucumber sorting

This was a hot topic for a while, so many of you may already know it: the task is to classify cucumbers into 9 grades from their images. Sorting "cucumbers" with deep learning using TensorFlow

The data needed for training is published on GitHub, so I will use it. Two datasets are published, ProtoType-1 and ProtoType-2; this time I will use ProtoType-1, whose dataset format is close to Cifar10. GitHub - workpiles/CUCUMBER-9

2L〜2S
Good product. Good color, relatively straight, and even in thickness. Sorted into 5 grades from 2L to 2S by size.

BL〜BS
B-grade product. Poor color, slightly bent, or uneven in thickness. Sorted into 3 grades from BL to BS by size.

C
C-grade product. Badly shaped.
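
To make the 9 grades concrete, here is a small sketch of a label table. The text above gives only the ranges 2L to 2S, BL to BS, and C, so the intermediate names (L, M, S, BM) and the index order are assumptions made for illustration; the real order should be taken from the dataset.

```python
# Assumed grade names and order; only the range endpoints come from the text.
GOOD = ["2L", "L", "M", "S", "2S"]  # 5 size grades of good product
B_GRADE = ["BL", "BM", "BS"]        # 3 size grades of B-grade product
C_GRADE = ["C"]                     # badly shaped

CLASSES = GOOD + B_GRADE + C_GRADE


def class_name(index):
    """Return the (assumed) grade name for a class index."""
    return CLASSES[index]


print(len(CLASSES), CLASSES)
```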

According to some blogs, the accuracy is about 80% without any special tricks. That is very welcome this time, since I am not changing the network structure.

Implementation

Training (performed on a CPU/GPU instance)

Create the parameter data to be loaded by the FPGA. As noted in the table above, follow the procedure published on GitHub. BinaryNets for Pynq - Training Networks

Note that these parameter files must be created on a CPU/GPU. This time I set up a GPU instance (NC6, Ubuntu 16.04) on Azure.

Building the GPU environment

Install the Nvidia drivers, CUDA, and cuDNN.

Installing the Nvidia drivers

$ wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
$ sudo apt-get update

Installing CUDA

$ sudo apt-get install cuda -y

Installing cuDNN

To install it, you need to register as a developer with Nvidia and download the files. NVIDIA cuDNN

$ sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb libcudnn5-dev_5.1.10-1+cuda8.0_amd64.deb

Setting PATH

$ sudo sh -c "echo 'CUDA_HOME=/usr/local/cuda' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export LD_LIBRARY_PATH=\${LD_LIBRARY_PATH}:\${CUDA_HOME}/lib64' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export LIBRARY_PATH=\${LIBRARY_PATH}:\${CUDA_HOME}/lib64' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export C_INCLUDE_PATH=\${C_INCLUDE_PATH}:\${CUDA_HOME}/include' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export CXX_INCLUDE_PATH=\${CXX_INCLUDE_PATH}:\${CUDA_HOME}/include' >> /etc/profile.d/cuda.sh"
$ sudo sh -c "echo 'export PATH=\${PATH}:\${CUDA_HOME}/bin' >> /etc/profile.d/cuda.sh"

Reboot the instance

$ sudo reboot

Verifying the installation

$ nvidia-smi
Thu Mar 30 07:42:52 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 8CFC:00:00.0     Off |                    0 |
| N/A   38C    P0    75W / 149W |      0MiB / 11439MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Installing Python

Install the Python libraries (Theano, Lasagne, Numpy, Pylearn2). I also installed pyenv first in order to use Python 2.7.

Installing pyenv and Python 2.7

$ sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
$ git clone https://github.com/yyuu/pyenv.git ~/.pyenv

$ vi .bashrc
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
$ source .bashrc

$ env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 2.7.13
$ pyenv global 2.7.13

Installing the Python libraries (Theano, Lasagne, Numpy, Pylearn2)

$ pip install --user git+https://github.com/Theano/[email protected]
$ pip install --user https://github.com/Lasagne/Lasagne/archive/master.zip

$ echo "[global]" >> ~/.theanorc
$ echo "floatX = float32" >> ~/.theanorc
$ echo "device = gpu" >> ~/.theanorc
$ echo "openmp = True" >> ~/.theanorc
$ echo "openmp_elemwise_minsize = 200000" >> ~/.theanorc
$ echo "" >> ~/.theanorc
$ echo "[nvcc]" >> ~/.theanorc
$ echo "fastmath = True" >> ~/.theanorc
$ echo "" >> ~/.theanorc
$ echo "[blas]" >> ~/.theanorc
$ echo "ldflags = -lopenblas" >> ~/.theanorc
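
For reference, the same ~/.theanorc content can be generated in one step instead of a series of echo commands. This is a minimal sketch assuming Python 3 and its standard configparser module; the function name build_theanorc is made up for this example, and it only returns the text rather than writing the file.

```python
import configparser
import io


def build_theanorc():
    """Return .theanorc text with the same settings as the echo commands above."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names such as floatX case-sensitive
    config["global"] = {
        "floatX": "float32",
        "device": "gpu",
        "openmp": "True",
        "openmp_elemwise_minsize": "200000",
    }
    config["nvcc"] = {"fastmath": "True"}
    config["blas"] = {"ldflags": "-lopenblas"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(build_theanorc())
```

Writing the returned text to ~/.theanorc is left to the reader, so that an existing file is not overwritten by accident.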

$ git clone https://github.com/lisa-lab/pylearn2
$ cd pylearn2/
$ python setup.py develop --user

Preparing the dataset

Prepare the dataset to train on. This time I will use the cucumber image data from GitHub.

$ git clone https://github.com/workpiles/CUCUMBER-9.git
$ cd CUCUMBER-9/prototype_1/
$ tar -zxvf cucumber-9-python.tar.gz

Preparing the program

I will make a few changes to the Xilinx program so that training loads a different dataset. The main changes are the following two points.

Change the data to be loaded to CUCUMBER9
Change the number of classification classes to 9
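
The second change can be sketched as follows. This assumes the training script encodes each label as a one-hot vector with values +1 and -1, which is common for hinge-loss binarized networks; that encoding is an assumption about cifar10.py, not a quote from it, and to_signed_one_hot is a name invented for this example.

```python
NUM_CLASSES = 9  # CUCUMBER-9 has 9 grades, where Cifar10 has 10 classes


def to_signed_one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer label as a vector with +1 at the label and -1 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range for {} classes".format(num_classes))
    return [1.0 if i == label else -1.0 for i in range(num_classes)]


print(to_signed_one_hot(2))
```

The point is that every place where the class count appears (target encoding and the size of the output layer) has to move from 10 to 9 together.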

Getting the program from BNN-PYNQ

$ git clone https://github.com/Xilinx/BNN-PYNQ.git
$ cd BNN-PYNQ/bnn/src/training/

Changing the training program

Create cucumber9.py, which reads the cucumber image data and runs the training.

$ cp cifar10.py cucumber9.py
$ vi cucumber9.py

Changing the binary data converter

BNN-PYNQ handles binarized data, so the real-valued parameter data must be converted to binary data. Create cucumber9-gen-binary-weights.py, which converts the parameter data obtained by training on the cucumber images to binary.

$ cp cifar10-gen-binary-weights.py cucumber9-gen-binary-weights.py
$ vi cucumber9-gen-binary-weights.py

Running the training

Now that the environment, the data, and the program are ready, run the program.

$ pwd
/home/ubuntu/BNN-PYNQ/bnn/src/training
$ python cucumber9.py
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release.  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5110)
/home/ubuntu/.local/lib/python2.7/site-packages/theano/tensor/basic.py:2144: UserWarning: theano.tensor.round() changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy. Use the Theano flag `warn.round=False` to disable this warning.
  "theano.tensor.round() changed its default from"
batch_size = 50
alpha = 0.1
epsilon = 0.0001
W_LR_scale = Glorot
num_epochs = 500
LR_start = 0.001
LR_fin = 3e-07
LR_decay = 0.983907435305
save_path = cucumber9_parameters.npz
train_set_size = 2475
shuffle_parts = 1
Loading CUCUMBER9 dataset...
Building the CNN...
W_LR_scale = 20.0499
H = 1
W_LR_scale = 27.7128
H = 1
W_LR_scale = 33.9411
H = 1
W_LR_scale = 39.1918
H = 1
W_LR_scale = 48.0
H = 1
W_LR_scale = 55.4256
H = 1
W_LR_scale = 22.6274
H = 1
W_LR_scale = 26.1279
H = 1
W_LR_scale = 18.6369
H = 1
Training...
Epoch 1 of 500 took 6.08435511589s
  LR:                            0.001
  training loss:                 1.48512187053
  validation loss:               2.05507221487
  validation error rate:         61.1111117734%
  best epoch:                    1
  best validation error rate:    61.1111117734%
  test loss:                     2.05507221487
  test error rate:               61.1111117734%

…

Epoch 500 of 500 took 5.53324913979s
  LR:                            3.04906731299e-07
  training loss:                 0.0024273797482
  validation loss:               0.132337698506
  validation error rate:         14.2222222355%
  best epoch:                    205
  best validation error rate:    11.9999999387%
  test loss:                     0.124302371922
  test error rate:               11.9999999387%

After a while, training finishes and the parameter file is produced.
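
The LR_decay value printed in the training log is consistent with an exponential schedule that goes from LR_start to LR_fin over num_epochs. The sketch below recomputes it from the logged hyperparameters; the formula is inferred from those numbers rather than copied from the training script.

```python
import math

# Hyperparameters as printed in the training log.
LR_START = 0.001
LR_FIN = 3e-07
NUM_EPOCHS = 500

# Per-epoch multiplicative decay that reaches LR_FIN after NUM_EPOCHS steps.
lr_decay = (LR_FIN / LR_START) ** (1.0 / NUM_EPOCHS)

# Learning rate in effect during the last epoch (after NUM_EPOCHS - 1 decays).
lr_last_epoch = LR_START * lr_decay ** (NUM_EPOCHS - 1)

print(lr_decay)
print(lr_last_epoch)
print(math.isclose(lr_decay, 0.983907435305, rel_tol=1e-8))
```

The two values should come out close to the logged LR_decay and to the LR shown for epoch 500.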

$ ls
cucumber9_parameters.npz

Binarizing the parameter data

Convert the real-valued parameter data to binary data.

$ python cucumber9-gen-binary-weights.py
cucumber9_parameters.npz

The binary parameter data is now ready. These are the files PYNQ will load.

$ ls binparam-cnv-pynq/
0-0-thres.bin     0-3-weights.bin   1-12-thres.bin    1-20-weights.bin  1-2-thres.bin     1-9-weights.bin   2-3-thres.bin     3-11-weights.bin  3-6-thres.bin    6-0-weights.bin
0-0-weights.bin   0-4-thres.bin     1-12-weights.bin  1-21-thres.bin    1-2-weights.bin   2-0-thres.bin     2-3-weights.bin   3-12-thres.bin    3-6-weights.bin  7-0-thres.bin
0-10-thres.bin    0-4-weights.bin   1-13-thres.bin    1-21-weights.bin  1-30-thres.bin    2-0-weights.bin   2-4-thres.bin     3-12-weights.bin  3-7-thres.bin    7-0-weights.bin
0-10-weights.bin  0-5-thres.bin     1-13-weights.bin  1-22-thres.bin    1-30-weights.bin  2-10-thres.bin    2-4-weights.bin   3-13-thres.bin    3-7-weights.bin  8-0-thres.bin
0-11-thres.bin    0-5-weights.bin   1-14-thres.bin    1-22-weights.bin  1-31-thres.bin    2-10-weights.bin  2-5-thres.bin     3-13-weights.bin  3-8-thres.bin    8-0-weights.bin
0-11-weights.bin  0-6-thres.bin     1-14-weights.bin  1-23-thres.bin    1-31-weights.bin  2-11-thres.bin    2-5-weights.bin   3-14-thres.bin    3-8-weights.bin  8-1-thres.bin
0-12-thres.bin    0-6-weights.bin   1-15-thres.bin    1-23-weights.bin  1-3-thres.bin     2-11-weights.bin  2-6-thres.bin     3-14-weights.bin  3-9-thres.bin    8-1-weights.bin
0-12-weights.bin  0-7-thres.bin     1-15-weights.bin  1-24-thres.bin    1-3-weights.bin   2-12-thres.bin    2-6-weights.bin   3-15-thres.bin    3-9-weights.bin  8-2-thres.bin
0-13-thres.bin    0-7-weights.bin   1-16-thres.bin    1-24-weights.bin  1-4-thres.bin     2-12-weights.bin  2-7-thres.bin     3-15-weights.bin  4-0-thres.bin    8-2-weights.bin
0-13-weights.bin  0-8-thres.bin     1-16-weights.bin  1-25-thres.bin    1-4-weights.bin   2-13-thres.bin    2-7-weights.bin   3-1-thres.bin     4-0-weights.bin  8-3-thres.bin
0-14-thres.bin    0-8-weights.bin   1-17-thres.bin    1-25-weights.bin  1-5-thres.bin     2-13-weights.bin  2-8-thres.bin     3-1-weights.bin   4-1-thres.bin    8-3-weights.bin
0-14-weights.bin  0-9-thres.bin     1-17-weights.bin  1-26-thres.bin    1-5-weights.bin   2-14-thres.bin    2-8-weights.bin   3-2-thres.bin     4-1-weights.bin  classes.txt
0-15-thres.bin    0-9-weights.bin   1-18-thres.bin    1-26-weights.bin  1-6-thres.bin     2-14-weights.bin  2-9-thres.bin     3-2-weights.bin   4-2-thres.bin
0-15-weights.bin  1-0-thres.bin     1-18-weights.bin  1-27-thres.bin    1-6-weights.bin   2-15-thres.bin    2-9-weights.bin   3-3-thres.bin     4-2-weights.bin
0-1-thres.bin     1-0-weights.bin   1-19-thres.bin    1-27-weights.bin  1-7-thres.bin     2-15-weights.bin  3-0-thres.bin     3-3-weights.bin   4-3-thres.bin
0-1-weights.bin   1-10-thres.bin    1-19-weights.bin  1-28-thres.bin    1-7-weights.bin   2-1-thres.bin     3-0-weights.bin   3-4-thres.bin     4-3-weights.bin
0-2-thres.bin     1-10-weights.bin  1-1-thres.bin     1-28-weights.bin  1-8-thres.bin     2-1-weights.bin   3-10-thres.bin    3-4-weights.bin   5-0-thres.bin
0-2-weights.bin   1-11-thres.bin    1-1-weights.bin   1-29-thres.bin    1-8-weights.bin   2-2-thres.bin     3-10-weights.bin  3-5-thres.bin     5-0-weights.bin
0-3-thres.bin     1-11-weights.bin  1-20-thres.bin    1-29-weights.bin  1-9-thres.bin     2-2-weights.bin   3-11-thres.bin    3-5-weights.bin   6-0-thres.bin
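
The file names in the listing appear to follow the pattern <first>-<second>-weights.bin and <first>-<second>-thres.bin. Reading the first number as a layer index and the second as a per-layer unit index is an assumption made here; under that reading, the following sketch counts how many units each layer has from a list of file names.

```python
import re
from collections import defaultdict

# Assumed naming scheme: "<layer>-<unit>-weights.bin" or "<layer>-<unit>-thres.bin".
PARAM_FILE = re.compile(r"^(\d+)-(\d+)-(weights|thres)\.bin$")


def count_units_per_layer(filenames):
    """Group parameter files by layer index and count distinct unit indices."""
    units = defaultdict(set)
    for name in filenames:
        match = PARAM_FILE.match(name)
        if match:  # skips classes.txt and anything else that does not fit
            layer, unit, _kind = match.groups()
            units[int(layer)].add(int(unit))
    return {layer: len(found) for layer, found in sorted(units.items())}


sample = ["0-0-weights.bin", "0-0-thres.bin", "0-1-weights.bin",
          "8-3-thres.bin", "classes.txt"]
print(count_units_per_layer(sample))
```

Running it on os.listdir("binparam-cnv-pynq") is a quick way to check that no file is missing before copying the directory to the board.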

Inference (performed on PYNQ)

Placing the parameter data

Transfer the parameter data created above to PYNQ.

$ sudo mkdir /opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9
$ sudo ls /opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9/
0-0-thres.bin     0-3-weights.bin   1-12-thres.bin    1-20-weights.bin  1-2-thres.bin     1-9-weights.bin   2-3-thres.bin     3-11-weights.bin  3-6-thres.bin    6-0-weights.bin
0-0-weights.bin   0-4-thres.bin     1-12-weights.bin  1-21-thres.bin    1-2-weights.bin   2-0-thres.bin     2-3-weights.bin   3-12-thres.bin    3-6-weights.bin  7-0-thres.bin
0-10-thres.bin    0-4-weights.bin   1-13-thres.bin    1-21-weights.bin  1-30-thres.bin    2-0-weights.bin   2-4-thres.bin     3-12-weights.bin  3-7-thres.bin    7-0-weights.bin
0-10-weights.bin  0-5-thres.bin     1-13-weights.bin  1-22-thres.bin    1-30-weights.bin  2-10-thres.bin    2-4-weights.bin   3-13-thres.bin    3-7-weights.bin  8-0-thres.bin
0-11-thres.bin    0-5-weights.bin   1-14-thres.bin    1-22-weights.bin  1-31-thres.bin    2-10-weights.bin  2-5-thres.bin     3-13-weights.bin  3-8-thres.bin    8-0-weights.bin
0-11-weights.bin  0-6-thres.bin     1-14-weights.bin  1-23-thres.bin    1-31-weights.bin  2-11-thres.bin    2-5-weights.bin   3-14-thres.bin    3-8-weights.bin  8-1-thres.bin
0-12-thres.bin    0-6-weights.bin   1-15-thres.bin    1-23-weights.bin  1-3-thres.bin     2-11-weights.bin  2-6-thres.bin     3-14-weights.bin  3-9-thres.bin    8-1-weights.bin
0-12-weights.bin  0-7-thres.bin     1-15-weights.bin  1-24-thres.bin    1-3-weights.bin   2-12-thres.bin    2-6-weights.bin   3-15-thres.bin    3-9-weights.bin  8-2-thres.bin
0-13-thres.bin    0-7-weights.bin   1-16-thres.bin    1-24-weights.bin  1-4-thres.bin     2-12-weights.bin  2-7-thres.bin     3-15-weights.bin  4-0-thres.bin    8-2-weights.bin
0-13-weights.bin  0-8-thres.bin     1-16-weights.bin  1-25-thres.bin    1-4-weights.bin   2-13-thres.bin    2-7-weights.bin   3-1-thres.bin     4-0-weights.bin  8-3-thres.bin
0-14-thres.bin    0-8-weights.bin   1-17-thres.bin    1-25-weights.bin  1-5-thres.bin     2-13-weights.bin  2-8-thres.bin     3-1-weights.bin   4-1-thres.bin    8-3-weights.bin
0-14-weights.bin  0-9-thres.bin     1-17-weights.bin  1-26-thres.bin    1-5-weights.bin   2-14-thres.bin    2-8-weights.bin   3-2-thres.bin     4-1-weights.bin  classes.txt
0-15-thres.bin    0-9-weights.bin   1-18-thres.bin    1-26-weights.bin  1-6-thres.bin     2-14-weights.bin  2-9-thres.bin     3-2-weights.bin   4-2-thres.bin
0-15-weights.bin  1-0-thres.bin     1-18-weights.bin  1-27-thres.bin    1-6-weights.bin   2-15-thres.bin    2-9-weights.bin   3-3-thres.bin     4-2-weights.bin
0-1-thres.bin     1-0-weights.bin   1-19-thres.bin    1-27-weights.bin  1-7-thres.bin     2-15-weights.bin  3-0-thres.bin     3-3-weights.bin   4-3-thres.bin
0-1-weights.bin   1-10-thres.bin    1-19-weights.bin  1-28-thres.bin    1-7-weights.bin   2-1-thres.bin     3-0-weights.bin   3-4-thres.bin     4-3-weights.bin
0-2-thres.bin     1-10-weights.bin  1-1-thres.bin     1-28-weights.bin  1-8-thres.bin     2-1-weights.bin   3-10-thres.bin    3-4-weights.bin   5-0-thres.bin
0-2-weights.bin   1-11-thres.bin    1-1-weights.bin   1-29-thres.bin    1-8-weights.bin   2-2-thres.bin     3-10-weights.bin  3-5-thres.bin     5-0-weights.bin
0-3-thres.bin     1-11-weights.bin  1-20-thres.bin    1-29-weights.bin  1-9-thres.bin     2-2-weights.bin   3-11-thres.bin    3-5-weights.bin   6-0-thres.bin
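
The commands above show only the directory creation and the final listing. One way to do the copy in between is sketched below, assuming the binparam-cnv-pynq directory has already been brought onto the board; install_params is a hypothetical helper, not part of BNN-PYNQ, and writing under /opt needs root privileges.

```python
import os
import shutil


def install_params(src_dir, dest_dir):
    """Copy *.bin parameter files and classes.txt from src_dir into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        if name.endswith(".bin") or name == "classes.txt":
            shutil.copy(os.path.join(src_dir, name), os.path.join(dest_dir, name))
            copied.append(name)
    return copied


# Example call (run as root on the board; the source path is an assumption):
# install_params("binparam-cnv-pynq",
#                "/opt/python3.6/lib/python3.6/site-packages/bnn/params/cucumber9")
```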

Placing the test data

Download the test data used for inference onto PYNQ.

$ git clone https://github.com/workpiles/CUCUMBER-9.git
$ cd CUCUMBER-9/prototype_1/
$ tar -zxvf cucumber-9-python.tar.gz

Running inference

Let's run it from Jupyter, as in the previous demo. When running CUCUMBER9, specify 'cucumber9' as the parameter set to load, as shown below.

classifier = bnn.CnvClassifier('cucumber9')

The result of the run is shown in the capture below.

screencapture-192-168-0-15-9090-nbconvert-html-bnn-Cucumber9-ipynb-1492315054198-min.png

It classifies correctly! The execution times are as follows. The PYNQ's CPU is admittedly weak, but the FPGA is still about 360 times faster.

`FPGA`


Inference took 2240.00 microseconds
Classification rate: 446.43 images per second

`CPU`


Inference took 816809.00 microseconds
Classification rate: 1.22 images per second
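
The figures above can be cross-checked with a few lines of arithmetic. This sketch assumes each reported time is for a single image, which matches the printed classification rates.

```python
# Inference times per image, in microseconds, as reported above.
FPGA_US = 2240.00
CPU_US = 816809.00


def images_per_second(microseconds_per_image):
    """Convert a per-image latency in microseconds to a throughput."""
    return 1e6 / microseconds_per_image


speedup = CPU_US / FPGA_US

print(round(images_per_second(FPGA_US), 2))  # FPGA throughput
print(round(images_per_second(CPU_US), 2))   # CPU throughput
print(round(speedup, 1))                     # FPGA speedup over CPU
```

The ratio comes out a little above 360, in line with the "about 360 times" stated in the text.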

References

When writing the program, I referred to the following blog.

Conclusion

This time, the PYNQ was powered by a mobile battery. I was surprised by how little power it needs.

Try Deep Learning with FPGA: Sorting Cucumbers