| In recent years, convolutional neural networks (CNNs) have been widely used in medical image detection, remote sensing detection, crop pest diagnosis, and other fields. To perform better on more complex tasks, conventional CNN architectures have been made progressively deeper. As networks deepen, the number of parameters and the amount of computation increase significantly, which makes it difficult to deploy CNNs on embedded devices, mobile terminals, and other edge devices. Achieving high acceleration performance on edge platforms with limited hardware resources is therefore a major challenge. To address these problems, this thesis designs a new depthwise separable convolutional neural network, together with a hardware accelerator that is implemented and verified on the Ultra96-V2 FPGA platform. On the algorithm side, the advantages and disadvantages of depthwise convolution (DWC) and pointwise convolution (PWC) are analyzed, and a new network architecture, ResMobileNet, is designed based on the ideas of MobileNetV2 and residual connections. On the CIFAR dataset, its parameter count and computation are 63% and 52% of those of MobileNetV2, respectively, and its classification accuracy is 0.58% higher than that of MobileNetV2. On the ImageNet dataset, its accuracy is similar to that of MobileNetV2, but its parameter count is only 60% of that of MobileNetV2. On the hardware side, a general shared convolution processing element (PE) is designed that supports multiple convolution modes on shared hardware resources: standard 3×3 convolution (conv3×3), depthwise convolution (DWC), pointwise convolution (PWC), and depthwise convolution with channel expansion (DWC2). Sharing the general PE reduces DSP resource usage by a factor of 2.1. In addition, the internal data flow of the ResMobileNet block is optimized, and two complementary in-block design schemes, pre-fusion and post-fusion, are proposed. Depending on the depth of the network, the data flow mode used to pass data between layers can be selected flexibly. The hardware accelerator for ResMobileNet is implemented on the Ultra96-V2 FPGA. With hardware resource constraints fully taken into account, the accelerator uses 71% of the available DSP resources, reaches a peak performance of 51.2 GOPS at a clock frequency of 100 MHz, achieves a frame rate of 11.93 FPS, and consumes 3.89 W, which meets the design requirements. This work provides a useful reference for future efficient hardware implementations of depthwise separable convolution. |
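
As a rough illustration of why depthwise separable convolution reduces parameter counts, and of what the reported peak performance implies about parallelism, the following Python sketch works through the arithmetic. It is not the thesis's implementation: the layer shape (3×3 kernel, 32 input channels, 64 output channels) is a hypothetical example, biases are ignored, and counting one multiply-accumulate (MAC) as two operations is an assumption about how GOPS is measured.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dwc_params(k, c_in):
    """Weights in a depthwise convolution: one k x k filter per input channel."""
    return k * k * c_in


def pwc_params(c_in, c_out):
    """Weights in a pointwise (1 x 1) convolution."""
    return c_in * c_out


def separable_params(k, c_in, c_out):
    """Depthwise separable convolution = depthwise stage + pointwise stage."""
    return dwc_params(k, c_in) + pwc_params(c_in, c_out)


def macs_per_cycle(peak_gops, clock_mhz, ops_per_mac=2):
    """MACs that must complete each cycle to reach a given peak throughput.

    ops_per_mac=2 is an assumption (one multiply plus one add per MAC).
    """
    ops_per_cycle = peak_gops * 1000.0 / clock_mhz  # GOPS -> operations per cycle
    return ops_per_cycle / ops_per_mac


if __name__ == "__main__":
    k, c_in, c_out = 3, 32, 64  # hypothetical layer shape, not from the thesis
    standard = conv_params(k, c_in, c_out)
    separable = separable_params(k, c_in, c_out)
    print(f"standard conv weights:  {standard}")
    print(f"separable conv weights: {separable}")
    print(f"ratio: {separable / standard:.3f}")
    # Figures from the abstract: 51.2 GOPS peak at 100 MHz.
    print(f"MACs per cycle at peak: {macs_per_cycle(51.2, 100):.0f}")
```

Under these assumptions, the separable layer needs roughly an eighth of the weights of the standard layer, and the reported peak corresponds to about 256 MACs per clock cycle.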