Deep neural networks have achieved great success in computer vision tasks and have become the preferred solution for image inspection applications. As convolutional neural networks grow in size and depth, their parameter counts and computation time also increase, and traditional general-purpose processor platforms are increasingly unable to handle real-time detection tasks. The urgent need to accelerate neural networks has drawn widespread attention to high-performance processors at home and abroad, and designing new hardware structures for complex problems has become a research focus. The YOLOv4 network is a target detection network that combines a large number of advanced techniques and achieves excellent performance in both speed and accuracy. Its model is composed mainly of convolutional layers, which makes it a typical high-performing deep neural network with high research value.

Based on the current state of research on convolutional neural network acceleration methods at home and abroad, this paper analyzes the advantages and disadvantages of software acceleration and hardware acceleration, and adopts a multi-core vector accelerator as the mapping platform for the YOLO network. Convolutional layer computation consists of multiplications and additions; the vector accelerator supports efficient vector calculation and is therefore well suited to convolution mapping. The main contributions are as follows:

· This paper analyzes the YOLOv4-tiny network algorithm and, taking the M-DSP architecture into account, proposes a mapping scheme for the algorithm. Based on the characteristics of the multi-core vector accelerator architecture, a data storage scheme for convolutional neural network computation is designed, and parallel computation of the YOLOv4-tiny network on multiple DSP cores is realized.

· This paper designs and implements mapping methods for the convolutional layer, pooling layer, and sampling layer of the algorithm, and realizes the mapping and multi-core parallel computation of the YOLOv4-tiny network algorithm on M-DSP. Starting from the overall network model, a fusion strategy for multiple network layers is proposed, which reduces data input and output and shortens the total execution time.

· The design scheme is verified in the M-DSP test environment. The experimental results show that the scheme can effectively map the convolutional neural network onto the vector accelerator platform and achieves a measurable parallel speedup. On an eight-core vector accelerator running at 1.8 GHz, the mapping scheme reaches a computational efficiency of 29.83%. Compared with TensorRT, the convolution acceleration library for the graphics processor platform, it achieves a performance improvement of 31.75%.
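
The multiply-add structure of the convolutional layer and the idea of fusing adjacent layers can be illustrated with a small NumPy sketch. It is not the M-DSP implementation described in this paper: the function names (`conv2d_mac`, `maxpool2x2`, `conv_pool_fused`), the stride-1, no-padding convolution, and the 2x2 max pooling are all assumptions chosen for illustration. Each accumulation step updates every output channel at once, loosely mimicking one vector lane per channel, and the fused variant pools each pair of output rows as soon as they are computed, so the full-resolution feature map is never stored.

```python
import numpy as np


def conv2d_mac(x, w):
    """Direct convolution (stride 1, no padding) as multiply-accumulate steps.

    x: input of shape (C_in, H, W); w: weights of shape (C_out, C_in, K, K).
    Every step updates all C_out accumulators together, the way a vector
    unit could assign one lane to each output channel (an assumption here).
    """
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    for c in range(c_in):
        for i in range(k):
            for j in range(k):
                patch = x[c, i:i + h_out, j:j + w_out]      # shifted input window
                y += w[:, c, i, j][:, None, None] * patch   # multiply, then add
    return y


def maxpool2x2(y):
    """2x2 max pooling with stride 2; odd trailing rows/columns are dropped."""
    c, h, w = y.shape
    h2, w2 = h // 2, w // 2
    return y[:, :2 * h2, :2 * w2].reshape(c, h2, 2, w2, 2).max(axis=(2, 4))


def conv_pool_fused(x, w):
    """Hypothetical fused layer: convolve two output rows, pool them at once."""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((c_out, h_out // 2, w_out // 2), dtype=x.dtype)
    for r in range(h_out // 2):
        # k + 1 input rows are enough to produce two convolution output rows.
        rows = conv2d_mac(x[:, 2 * r:2 * r + k + 1, :], w)
        out[:, r] = maxpool2x2(rows)[:, 0]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 8, 8))
    w = rng.standard_normal((4, 3, 3, 3))
    unfused = maxpool2x2(conv2d_mac(x, w))
    fused = conv_pool_fused(x, w)
    print(fused.shape, np.allclose(fused, unfused))
```

Both paths perform the same arithmetic; the fused one only changes when intermediate results are consumed, which is the kind of saving in data input and output that a layer-fusion strategy aims for.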