Research On Parallel Computing Architecture Of Siamese Network Algorithm

Posted on:2021-06-09

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Lu

GTID:2518306104999749

Subject:Control Engineering

Abstract/Summary:

Target tracking in complex environments on embedded platforms plays an important role in intelligent video surveillance devices, drone guidance, unmanned aerial vehicles, missile guidance, and other areas. In recent years, with the development of deep learning, applying convolutional neural networks to tracking problems has brought high accuracy and robustness. However, the high computational complexity of tracking algorithms based on convolutional features, together with the area and power constraints of embedded platforms, makes real-time requirements difficult to meet. To address this problem, the paper designs a hardware deployment scheme for convolutional networks that suits embedded platforms and meets their power constraints. It uses Xilinx’s ZYNQ series FPGAs as the algorithm and functional verification platform, and provides hardware acceleration for the convolutional network of Siamese-FC, a tracking algorithm based on convolutional features.

To ease the deployment of convolutional feature-based tracking algorithms on FPGA platforms, the paper analyzes the computational process of Siamese-FC, a lightweight fully convolutional neural network, and studies its computational and storage resource requirements in light of the computational characteristics of each type of layer in the network. Taking into account the resource limitations of FPGA platforms, the paper proposes a quantization scheme for convolutional networks. The scheme is based on NVIDIA’s TensorRT solution: it changes the forward computation structure of the network while quantizing the network’s computation to 8-bit integers, and it is 1.5 times faster than before quantization while maintaining network accuracy.

To address the difficulty of achieving real-time processing on FPGA platforms with tracking algorithms based on deep features, the paper also analyzes parallelism, using the Siamese-FC algorithm as an example. It divides the deployment of the Siamese-FC convolutional network between the processing system part and the programmable logic part of the ZYNQ chip, according to the two emphases of the hardware implementation: data scheduling and the forward inference structure of the network. This division takes full advantage of the high throughput and high parallelism of the FPGA platform. In the programmable logic part, each type of layer in the convolutional network is accelerated separately. The hardware architecture in the paper is 6.19 times faster than an implementation of the same algorithm on a CPU.

The paper concludes with an analysis of the resource consumption of the hardware deployment: on-chip storage resources account for 55%, LUT logic resources for 49.7%, DSP48 multiplier resources for 34%, and FF logic resources for 16.7%. The paper also compares the power consumption of each platform: the power consumption of the entire development board at runtime is only 13.7% of that of the CPU. The solution in the paper has practical value in embedded applications. Finally, the paper identifies its shortcomings and directions for improvement.
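To make the 8-bit quantization idea in the abstract concrete, the following is a minimal sketch of symmetric integer quantization applied to one dot product, the basic operation inside a convolution. It is not the scheme from the paper: the function names are hypothetical, and the scale is chosen from the largest absolute value, which is a simplifying assumption (calibration in TensorRT selects the clipping range in a more elaborate way).

```python
def compute_scale(values, num_bits=8):
    """Symmetric scale: the largest magnitude maps to the top of the signed range.

    Assumption: max-abs calibration, used here only for illustration.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer step and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in q_values]


def int8_dot(activations, weights):
    """Dot product computed on integer codes, rescaled to a float at the end."""
    a_scale = compute_scale(activations)
    w_scale = compute_scale(weights)
    q_a = quantize(activations, a_scale)
    q_w = quantize(weights, w_scale)
    # Integer multiply-accumulate; hardware would hold this in a wider accumulator.
    acc = sum(a * w for a, w in zip(q_a, q_w))
    return acc * a_scale * w_scale


activations = [0.5, -1.0, 0.25, 2.0]
weights = [0.1, 0.2, -0.3, 0.4]
exact = sum(a * w for a, w in zip(activations, weights))
approx = int8_dot(activations, weights)
print(f"float result: {exact:.4f}, int8 result: {approx:.4f}")
```

The multiply-accumulate loop works only on small integers, which is why this style of quantization tends to suit FPGA multiplier resources; the two floating-point scales are applied once per output rather than once per product.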

Keywords/Search Tags:

Siamese Neural Networks, Convolutional Neural Networks, Fixed-Point Quantization, FPGA Hardware Acceleration, High-Level Synthesis

Related items

1	Research On Hardware Parallel Acceleration For Novel Convolutional Neural Networks
2	Research On Forward Propagation Acceleration Technology Of Recurrent Neural Network Based On FPGA
3	Model Compression And Hardware Acceleration Of Convolutional Neural Networks
4	Research On Algorithms Of Implementing Convolutional Neural Networks By Hardware
5	Research On Dynamic Quantization Algorithm Of Convolutional Neural Networks And Its Parallel Computing Structure
6	Research On Neural Network Accelerator Based On PYNQ
7	Research On Key Techniques Of Deep Convolutional Neural Network Accelerators Based On FPGA Bus Framework
8	Research On Key Technologies Of Hardware Implementation Of Convolutional Neural Networks
9	Research On Acceleration Of Low-Precision Convolutional Neural Networks On FPGA
10	Research On Lightweight Convolutional Neural Network Algorithm And Hardware Collaborative Acceleration Technology