| The Convolutional Neural Network (CNN), characterized by large numbers of parameters and intensive computation, has become one of the most representative research topics in machine learning. To meet the needs of real-world production and daily life, real-time inference has become a research hot spot. Network inference involves hundreds of billions of computations, and many existing CNN optimization schemes require the network to be retrained. Most approaches that do not require retraining are based on difference values, but existing work trades space for time by adding extra cache space, which imposes limitations on resource-constrained devices. To address these problems, this thesis selects two scenarios of practical significance, flame recognition and surveillance video detection, and uses difference values to optimize and accelerate the YOLO-v5 network in both without adding significant memory overhead. The main work of this thesis is as follows: (1) For single-object flame detection, this work analyzes the color characteristics of flame and derives constraints to build a model of the suspected flame feature area; the coordinates of that area are then used to confine convolution and other operations to the coordinate-locked region during recognition. Experiments show that adding the suspected feature area model to the trained YOLO-v5 network reduces the average inference time per image by 0.087 s and shortens the total inference time by 17.5%. Relative to the total FLOPs before optimization, the total FLOPs of the network with the suspected feature area model are reduced by 10.4% and 6.7% in the daytime and nighttime environments, respectively. The model has little impact on the false positive rate, precision, and accuracy. (2) For the surveillance video detection scene, this work exploits the temporal sparsity of
adjacent video frames to build a dynamic background update model and integrates the model into the trained neural network. Within the model, the background is searched for synonymous frames to speed up neural network inference. Finally, the number of frames, the inference time, and the accuracy of video inference are used as evaluation metrics. The video dataset used in this experiment contains about 269,200 frames. Inference with the YOLO-v5 network without the dynamic background update model takes 27.86 hours. With the dynamic background update model, at the cost of 1.9% accuracy, the inference time is reduced by 9.47 hours, shortening the time required to process the whole video by 34%. |
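
The suspected feature area idea in contribution (1) can be illustrated with a minimal sketch. The abstract does not state the color constraints the thesis uses, so the rule below (a bright red channel with R >= G > B) and the threshold `red_min` are assumptions chosen only for illustration, and the function names are hypothetical. The sketch locates the bounding box of flame-colored pixels and crops the image to it, so that later operations would only need to touch that region.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each in 0..255
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def is_flame_colored(pixel: Pixel, red_min: int = 180) -> bool:
    """Assumed colour rule (not the thesis's): bright red with R >= G > B."""
    r, g, b = pixel
    return r >= red_min and r >= g > b


def suspected_region(image: List[List[Pixel]], red_min: int = 180) -> Optional[Box]:
    """Return the bounding box of all flame-coloured pixels, or None if there are none."""
    coords = [
        (y, x)
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
        if is_flame_colored(pixel, red_min)
    ]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)


def crop(image: List[List[Pixel]], box: Box) -> List[List[Pixel]]:
    """Keep only the pixels inside the (inclusive) bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4x6 image: dark everywhere except three flame-coloured pixels.
DARK, FLAME = (20, 20, 20), (230, 140, 40)
image = [[DARK] * 6 for _ in range(4)]
image[1][2] = image[2][3] = image[2][4] = FLAME

box = suspected_region(image)
region = crop(image, box)
# Fraction of the original pixels that remain after cropping.
kept_fraction = (len(region) * len(region[0])) / (len(image) * len(image[0]))
```

In this toy example the cropped region covers a quarter of the image. In a real pipeline the saving depends on how much of each frame the suspected area occupies, and on how the detector is restricted to it.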
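
The frame-skipping idea behind the dynamic background update model in contribution (2) can be sketched as follows. This is a simplified illustration, not the thesis's model: the similarity measure (mean absolute difference), the `threshold`, the blending rate `alpha`, and the function names are all assumptions. A frame that is close enough to the current background is treated as synonymous and reuses the previous detection result; otherwise the detector runs and the background is reset to that frame.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]  # flattened grayscale frame


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def infer_video(
    frames: List[Frame],
    detect: Callable[[Frame], Any],
    threshold: float = 5.0,   # assumed similarity threshold
    alpha: float = 0.5,       # assumed background blending rate
) -> Tuple[List[Any], int]:
    """Run `detect` only on frames that differ from the background."""
    background: Optional[List[float]] = None
    last_result: Any = None
    results: List[Any] = []
    detector_calls = 0
    for frame in frames:
        if background is not None and mean_abs_diff(frame, background) < threshold:
            # Synonymous frame: reuse the last result and let the background
            # drift slowly towards the frame (absorbs gradual lighting change).
            background = [(1 - alpha) * b + alpha * f for b, f in zip(background, frame)]
        else:
            # Changed frame: run the detector and reset the background.
            last_result = detect(frame)
            detector_calls += 1
            background = list(frame)
        results.append(last_result)
    return results, detector_calls


# Toy video: two near-identical frames, then a change that persists.
frames = [
    [10, 10, 10, 10],
    [11, 10, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
# Stand-in "detector" that simply sums the pixel values.
results, detector_calls = infer_video(frames, detect=sum)
```

Here the stand-in detector runs on two of the four frames. The trade-off reported in the abstract (less inference time at some cost in accuracy) corresponds to how aggressively frames are treated as synonymous, which in this sketch is governed by `threshold`.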