
Study On FPGA-based Visual Semantic Segmentation Network Acceleration

Posted on: 2021-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Huang
Full Text: PDF
GTID: 2428330614472158
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of deep learning, visual semantic segmentation, represented by semantic segmentation networks, has been widely used in fields such as intelligent robotics, security, and autonomous driving. However, semantic segmentation networks place high demands on the computing resources and programmability of hardware platforms. FPGA-based hardware systems offer flexible programmability and high embeddability and can also meet low power consumption requirements, which makes them a good solution for semantic segmentation on terminal devices. In terms of hardware acceleration, existing neural network FPGA accelerators focus mainly on implementing and optimizing classic CNN-based classification networks, while comparatively little work addresses semantic segmentation networks with an Encoder-Decoder structure. Given the architectural differences between the two kinds of network, research on FPGA acceleration technologies for semantic segmentation networks is important for deploying them on edge devices. Taking the semantic segmentation network SegNet as an example, this thesis studies FPGA-based acceleration technologies for semantic segmentation networks and designs an FPGA hardware accelerator that implements the network inference process. The main contributions are:

(1) Algorithm flow optimization technologies for SegNet, including merging the convolution and BN operations and using relative pooling indices, are studied and designed; they reduce the complexity of the algorithm flow and cut the storage consumption of pooling indices by 75%. A fixed-point quantization strategy combining static and dynamic fixed-point numbers is designed. Compared with the Caffe network inference process using 32-bit floating-point numbers, the losses in global accuracy, class accuracy, and mIoU under the 8-bit static & dynamic fixed-point quantization strategy are 3.82%, 6.30%, and 4.78%, respectively. A corresponding parallel data structure is designed so that system throughput increases in multiples with the design parallelism.

(2) Using the OpenCL flow, the convolution kernel is designed based on parallel computing, and the pooling and unpooling kernels are designed based on efficient pipelines. Different modes of layer connection are implemented through configurable data pipes to complete on-chip deployment and acceleration of the network. Based on the parallel data structure, a grouping-based off-chip/on-chip data access and storage scheme is designed, and the design space is explored to balance system throughput against hardware resource utilization.

(3) Hardware deployment and evaluation are carried out on the Intel Arria-10 GX1150 FPGA development platform. With an RGB input image at a resolution of 224 × 224, the designed accelerator achieves a throughput higher than 432.8 GOP/s and an energy efficiency ratio of 21.64 GOP/s/W.
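The two algorithm flow optimizations named in contribution (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `fold_bn`, `max_pool_2x2`, and `unpool_2x2`, the flattened per-channel weight layout, and the 2 × 2, stride-2 pooling window are all assumptions made for illustration. Folding BN rescales each output channel's weights by gamma / sqrt(var + eps) and adjusts the bias, so inference needs only one multiply-accumulate pass. Relative pooling indices store the position of the maximum inside its window (0 to 3) instead of an absolute position in the feature map.

```python
import math


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into convolution weights and bias.

    weights: one flattened kernel (list of floats) per output channel.
    The remaining arguments are per-output-channel lists of floats.
    """
    folded_w, folded_b = [], []
    for w, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([wi * scale for wi in w])
        folded_b.append((b - m) * scale + bt)
    return folded_w, folded_b


def max_pool_2x2(fmap):
    """2x2, stride-2 max pooling (assumed window size).

    Returns the pooled map and, per window, a relative index 0..3
    (row-major position of the maximum inside the window).
    """
    pooled, rel_idx = [], []
    for r in range(0, len(fmap) - 1, 2):
        row_vals, row_idx = [], []
        for c in range(0, len(fmap[0]) - 1, 2):
            window = [fmap[r][c], fmap[r][c + 1],
                      fmap[r + 1][c], fmap[r + 1][c + 1]]
            k = max(range(4), key=window.__getitem__)
            row_vals.append(window[k])
            row_idx.append(k)
        pooled.append(row_vals)
        rel_idx.append(row_idx)
    return pooled, rel_idx


def unpool_2x2(pooled, rel_idx):
    """Place each pooled value back at its recorded window position."""
    out = [[0] * (2 * len(pooled[0])) for _ in range(2 * len(pooled))]
    for r, (vals, idxs) in enumerate(zip(pooled, rel_idx)):
        for c, (v, k) in enumerate(zip(vals, idxs)):
            out[2 * r + k // 2][2 * c + k % 2] = v
    return out
```

With a 2 × 2 window, a relative index fits in 2 bits. Whether that is the baseline behind the 75% storage reduction reported above is not stated in the abstract, so the sketch should be read only as an illustration of the idea.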
Keywords/Search Tags: Semantic segmentation network, FPGA, OpenCL, SegNet, Acceleration platform