| In recent years, breakthroughs in deep learning for computer vision have become a major research focus, and models based on convolutional neural networks (CNNs) in particular continue to evolve. However, the computational cost of these models has grown, which makes algorithm acceleration an urgent problem. The current mainstream deployment platforms for deep learning algorithms each have shortcomings: CPUs compute slowly, GPUs consume a lot of power, and application-specific integrated circuits (ASICs) cannot easily be adapted to a changed network model, so it is difficult for these hardware platforms to balance real-time performance and cost. By contrast, a field-programmable gate array (FPGA) offers good parallel computing capability, low power consumption, and abundant resources, and is better suited to CNN acceleration. At present, in the later stages of fruit processing, traditional mechanical classifiers are used to sort picked fruit, and grading by external quality in this way suffers from low accuracy and slow speed. This thesis proposes a deep-learning-based algorithm for grading the external quality of fruit and deploys it on an FPGA to accelerate it, achieving a higher recognition rate and work efficiency without human involvement. The main work of the thesis is as follows: (1) To improve fruit grading, the thesis uses the lightweight convolutional neural network MobileNetV2 to improve recognition speed and accuracy. MobileNetV2 is adapted to the task by modifying the fully connected layer and adding an attention mechanism. The improved algorithm was validated on a self-made dataset. Experiments show that the accuracy of the improved MobileNetV2 algorithm increased by 1.95%. (2) To improve the operation
speed of the convolutional neural network, the thesis uses Xilinx’s ZCU104 as the development platform to hardware-accelerate the image pre-processing and the improved MobileNetV2 algorithm. Following the board's ARM+FPGA architecture, the implementation is divided into two parts: programmable logic (PL) and processing system (PS). On the PL side, mean filtering and bilinear interpolation scaling are implemented first, and then the convolution function of MobileNetV2 is optimized for parallel execution. Dual-channel convolution, together with parallel computation over data taken along the channel direction of the input feature map, improves the execution efficiency of the module. A parallel batch normalization (BN) module is designed inside it, and a highly parallel streaming pooling module is designed as well. In addition, the Softmax activation function is improved by using multiple table lookups. On the PS side, the network reconstruction of MobileNetV2 and the data-flow scheduling of the inverted residual layers are implemented. Experiments show that, with accuracy nearly the same across the CPU, GPU, and FPGA platforms, the FPGA parallelized algorithm runs 6.8 times as fast as the CPU and slightly faster than the GPU. |
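
The abstract does not describe how the table-lookup Softmax is organized, so the following is only a minimal Python sketch of one common way such a scheme can work, not the thesis's design. It assumes the exponent is split into an integer part and a fractional part, each read from its own small table and then multiplied. The table sizes (`FRAC_BITS`, `INT_RANGE`) and the function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical table sizes, chosen only for illustration.
FRAC_BITS = 6    # 64 steps per unit for the fractional part
INT_RANGE = 16   # differences larger than 16 saturate at 16

# Table 1: exp(-i) for the integer part i = 0 .. INT_RANGE.
INT_TABLE = [math.exp(-i) for i in range(INT_RANGE + 1)]
# Table 2: exp(-k / 2**FRAC_BITS) for the fractional part.
FRAC_TABLE = [math.exp(-k / (1 << FRAC_BITS)) for k in range(1 << FRAC_BITS)]


def lut_exp_neg(d):
    """Approximate exp(-d) for d >= 0 with two lookups and one multiply."""
    q = int(round(d * (1 << FRAC_BITS)))       # quantize to fixed point
    q = min(q, INT_RANGE << FRAC_BITS)         # saturate very small outputs
    int_part = q >> FRAC_BITS                  # index into INT_TABLE
    frac_part = q & ((1 << FRAC_BITS) - 1)     # index into FRAC_TABLE
    return INT_TABLE[int_part] * FRAC_TABLE[frac_part]


def lut_softmax(logits):
    """Softmax that replaces exp() with the table-based approximation."""
    m = max(logits)                            # subtract max so exponents are <= 0
    exps = [lut_exp_neg(m - x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_softmax(logits):
    """Reference implementation used only to compare against."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Splitting the exponent this way keeps both tables small (17 and 64 entries here), which is the usual reason for using more than one lookup on hardware with limited on-chip memory. The precision is set by `FRAC_BITS`, so a real design would pick it to match the fixed-point format of the network outputs.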