I am also interested in acceleration with FPGAs such as the Zynq. As a software shop, I want an approach that delivers speedups quickly without too much low-level effort. Even when using an FPGA, I would like to start from a setup where an OpenCV-equivalent implementation is already provided. However, I have not yet had a chance to try it.
"Accelerate OpenCV Applications on Zynq-7000 All Programmable SoCs with Vivado HLS Video Library"
http://japan.xilinx.com/support/documentation/application_notes/j_xapp1167.pdf
Evaluating Vivado High-Level Synthesis on OpenCV Functions http://www.idt.mdh.se/utbildning/exjobb/files/TR1803.pdf
FPGA Room: I tried using OpenCV with Vivado HLS 2015.4, part 1
In the approach to implementing OpenCV with Vivado HLS used here, the input/output of the function to be turned into hardware is handled over AXI4-Stream. In the test bench, IplImage2AXIvideo() converts an IplImage into an AXI4-Stream, which is fed into image_filter(); after processing, AXIvideo2IplImage() converts the AXI4-Stream back into an IplImage.
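For reference, a minimal sketch of what such a test bench might look like. The 24-bit stream width, the file names, and the exact image_filter() signature are assumptions modeled on the application note's pattern, not code taken from the quoted article.

```cpp
// Test bench sketch (not synthesized): converts IplImage <-> AXI4-Stream
// around the hardware function, using the Vivado HLS video library's
// hls_opencv.h helpers. Names and widths are assumptions, not the article's code.
#include <hls_video.h>
#include <hls_opencv.h>   // IplImage2AXIvideo / AXIvideo2IplImage

typedef hls::stream<ap_axiu<24, 1, 1, 1> > AXI_STREAM;  // assumed 24-bit RGB pixels

// Prototype of the function to be synthesized (assumed signature).
void image_filter(AXI_STREAM& src, AXI_STREAM& dst, int rows, int cols);

int main() {
    IplImage* src = cvLoadImage("test.bmp");                         // input image (assumed file name)
    IplImage* dst = cvCreateImage(cvGetSize(src), IPL_DEPTH_8U, 3);  // result buffer

    AXI_STREAM src_axi, dst_axi;
    IplImage2AXIvideo(src, src_axi);                          // IplImage -> AXI4-Stream
    image_filter(src_axi, dst_axi, src->height, src->width);  // design under test
    AXIvideo2IplImage(dst_axi, dst);                          // AXI4-Stream -> IplImage

    cvSaveImage("result.bmp", dst);
    cvReleaseImage(&src);
    cvReleaseImage(&dst);
    return 0;
}
```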
FPGA Room: I tried using OpenCV with Vivado HLS 2015.4, part 7 (FAST Corners Detection 1)
I am using the code provided by Xilinx for high-level synthesis, as follows:

```cpp
hls::AXIvideo2Mat(INPUT_STREAM, img_0);
hls::Duplicate(img_0, img_1, img_1_);
hls::CvtColor<HLS_BGR2GRAY>(img_1, img_1g);
hls::FASTX(img_1g, mask, 20, true);
hls::Dilate(mask, dmask);
hls::PaintMask(img_1_, dmask, img_3, color);
hls::Mat2AXIvideo(img_3, OUTPUT_STREAM);
```
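To show how such a chain becomes synthesizable hardware, here is a sketch of what the surrounding top-level function might look like. The maximum frame size (1080p), the stream/Mat typedefs, the color value, and the interface pragmas are assumptions I added for illustration; they are not taken from the quoted article.

```cpp
// Sketch of an assumed top-level function wrapping the processing chain above.
// Control ports (rows/cols) are left with default interfaces for brevity.
#include <hls_video.h>

typedef hls::stream<ap_axiu<24, 1, 1, 1> > AXI_STREAM;      // assumed 24-bit RGB stream
typedef hls::Mat<1080, 1920, HLS_8UC3> RGB_IMAGE;           // assumed max frame size
typedef hls::Mat<1080, 1920, HLS_8UC1> GRAY_IMAGE;

void image_filter(AXI_STREAM& INPUT_STREAM, AXI_STREAM& OUTPUT_STREAM,
                  int rows, int cols) {
#pragma HLS INTERFACE axis port=INPUT_STREAM
#pragma HLS INTERFACE axis port=OUTPUT_STREAM
#pragma HLS DATAFLOW

    RGB_IMAGE  img_0(rows, cols), img_1(rows, cols), img_1_(rows, cols), img_3(rows, cols);
    GRAY_IMAGE img_1g(rows, cols), mask(rows, cols), dmask(rows, cols);
    hls::Scalar<3, unsigned char> color(255, 0, 0);  // assumed color used to mark corners

    hls::AXIvideo2Mat(INPUT_STREAM, img_0);          // AXI4-Stream -> hls::Mat
    hls::Duplicate(img_0, img_1, img_1_);            // split into two parallel streams
    hls::CvtColor<HLS_BGR2GRAY>(img_1, img_1g);      // grayscale for corner detection
    hls::FASTX(img_1g, mask, 20, true);              // threshold 20, non-max suppression
    hls::Dilate(mask, dmask);                        // thicken the corner mask
    hls::PaintMask(img_1_, dmask, img_3, color);     // overlay corners on the original
    hls::Mat2AXIvideo(img_3, OUTPUT_STREAM);         // hls::Mat -> AXI4-Stream
}
```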
As long as it is used this way, you can use the FPGA without knowing Verilog HDL. Given the goal of getting results quickly, it is best for software shops to avoid writing Verilog HDL.
We are investigating PYNQ.
--FPGA Room PYNQ Board 1
--FPGA Room PYNQ Board 2 (Starting Linux)
At this point, anyone familiar with the Raspberry Pi should be able to boot it in much the same way.
--FPGA Room PYNQ Board 3 (Jupyter Notebook): This appears to be an example of running the examples prepared for the PYNQ board from the Jupyter Notebook environment.
The author of the FPGA Room site continues to write many other useful articles.
Building a Convolutional Neural Network Circuit for Handwritten Digit Recognition 1 (Overview)
An example of using BNN-PYNQ on the PYNQ board: a pre-trained CIFAR-10 example is run from the Jupyter Notebook environment.
--Run Zynq in Python! PYNQ = Python + Zynq
--Xilinx PYNQ development background and future direction
--Implementation of deep learning on FPGA
--Qiita: Try Deep Learning with FPGA - Selecting cucumbers: This is an implementation on the PYNQ board using BNN-PYNQ, based on "Sorting 'cucumbers' by deep learning with TensorFlow".
connpass FPGA Extreme Computing
connpass "PYNQ Festival" Overtime: FPGA Deep Learning Practice Social gathering
Interface June 2017 Table of Contents World of Hard Computing ... Ultra-fast Python with GPU & FPGA
Recently I have been busy at the university and have not had time to write HDL, so I have been using high-level synthesis since around 2015. With Xilinx's high-level synthesis tool "Vivado HLS" (High-Level Synthesis), synthesis can be done in one shot. I thought, "It's that easy." At the university I do not use Verilog HDL (the hardware description language normally used with FPGAs) and teach only high-level synthesis. All the students are already using FPGAs as if they were writing software code.
──What can I use for the deep learning library?
Nakahara: You can use TensorFlow, Caffe, Chainer, and so on. Since different frameworks are used depending on the company or department, multiple frameworks are supported. If you write the code in Python, with a little ingenuity it can be converted to C++ and put into the Xilinx tool, and it will work.
──So is writing Python enough to do deep learning on an FPGA?
Nakahara: Yes, that's fine. You don't even have to write C.
--SlideShare: I was only supposed to get a gentle introduction to machine learning... and before I knew it I was touching an FPGA
--qiita It's time for programmers to touch FPGA!
A Jupyter notebook is also used in this article's example of handwritten digit recognition (MNIST) with deep learning (CNN). This makes it easy to confirm what is happening, since one side of the input/output is an image.
--SlideShare: Professor Nakahara's TensorFlow high-level synthesis demo: I tried running a binarized CNN DQN on an FPGA
--The detector implementation developed by Professor Nakahara of Tokyo Institute of Technology is published on GitHub under the name GUINNESS.
github https://github.com/HirokiNakahara/GUINNESS
Note: You can find videos by searching for PYNQ on YouTube.
Deep learning does not use sigmoid activation functions the way older neural networks did, so it is less prone to losing sensitivity through saturation. Taking advantage of this, the weights that represent connections can be expressed with data types of small bit width. Using the INT8 type makes the data much smaller than floating point and the operations simpler, which makes them easier to implement as circuits.
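As a rough illustration of the idea, here is a minimal sketch of quantizing weights to INT8 with a shared per-tensor scale factor. This is my own simplified example, not the scheme used by any particular tool.

```cpp
// Minimal sketch of symmetric per-tensor INT8 quantization of weights.
// Illustrates the size/complexity trade-off only; not any vendor's scheme.
#include <cstdint>
#include <cmath>
#include <vector>
#include <algorithm>
#include <cstdio>

int main() {
    std::vector<float> w = {0.42f, -0.91f, 0.07f, -0.33f, 0.58f};  // example weights

    // Choose a scale so the largest magnitude maps to 127.
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    float scale = max_abs / 127.0f;

    // Quantize: each weight becomes one int8 plus a single shared float scale.
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(w[i] / scale));

    // Dequantize to inspect the approximation error.
    for (size_t i = 0; i < w.size(); ++i)
        std::printf("w=%+.3f  q=%+4d  back=%+.3f\n", w[i], q[i], q[i] * scale);
    return 0;
}
```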
-Implementation of INT8-optimized deep learning on Xilinx devices
A BLAS linear algebra library is also provided, which makes it easier to reuse algorithms written against standard libraries. It, too, is optimized for 16-bit and 8-bit integers.
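To see the basic pattern such integer-optimized routines provide, here is a minimal sketch (my own illustration, not the vendor library's API) of an 8-bit matrix-vector multiply that accumulates into 32-bit integers.

```cpp
// Sketch of an INT8 matrix-vector multiply with INT32 accumulation,
// the core pattern behind integer-optimized BLAS-like routines.
#include <cstdint>
#include <cstdio>

void gemv_int8(const int8_t* A, const int8_t* x, int32_t* y,
               int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        int32_t acc = 0;                                    // wide accumulator avoids overflow
        for (int c = 0; c < cols; ++c)
            acc += static_cast<int32_t>(A[r * cols + c]) * x[c];
        y[r] = acc;
    }
}

int main() {
    const int8_t A[2 * 3] = {1, -2, 3, 4, 5, -6};
    const int8_t x[3]     = {10, 20, 30};
    int32_t y[2];
    gemv_int8(A, x, y, 2, 3);
    std::printf("y = [%d, %d]\n", y[0], y[1]);  // prints [60, -40]
    return 0;
}
```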
There seems to be information available for generating circuits for networks such as GoogLeNet, SSD, FCN-AlexNet, AlexNet, and VGG.
--Altera's article: Why FPGAs are best for CNN implementation
--Intel News Release: [Intel Realizes "Real-Time AI" for Microsoft's New High-Speed Deep Learning Platform](https://newsroom.intel.co.jp/news/intel-delivers-real-time-ai-microsofts-accelerated-deep-learning-platform/)
--Intel article 2017: Deep learning with a midrange FPGA, achieving efficiency higher than NVIDIA's "Tesla M4"
The following author has also written many FPGA-related articles, and recently articles for magazines published by CQ as well: Hidemi's Idea Note / GitHub: Hidemi Ishihara