Road target detection is an important research topic in computer vision. Detecting people and vehicles on the road requires a model both to locate each target accurately and to identify its type correctly. In unmanned vehicles, this task informs the decisions to turn, decelerate, and park. In target tracking, it serves as a preprocessing step: people or vehicles must be identified and located before they can be tracked. Target detection also has an important impact on reconnaissance tasks.

This paper first introduces the development and related technologies of target detection, and then improves models of the YOLO (You Only Look Once) series to raise detection accuracy, particularly for small targets. The main work of this paper is as follows:

(1) To improve the accuracy of the YOLOv3 model while keeping its efficiency essentially unchanged: first, a pyramid segmentation module strengthens the connections between different feature channels, enhancing the feature extraction capability of the backbone network. Second, a spatial pyramid module is added before feature fusion; it expands the receptive field with large-scale convolutions to avoid losing much information. Finally, a balance loss function adjusts the confidence and classification losses, and Complete IoU (CIoU) loss is used as the bounding-box regression loss.

(2) To improve the accuracy of the YOLOv5 model while keeping its efficiency essentially unchanged: first, receptive field blocks enlarge the receptive field. Next, an adaptive spatial feature fusion (ASFF) strategy re-integrates the features. A simplified decoupled head then separates the classification and localization tasks, reducing the conflict that arises between the two tasks because of their different focuses. Finally, the SIoU loss function, which introduces a direction factor, improves the regression efficiency of the bounding box.

The improved YOLOv3 and YOLOv5 models are both trained and tested on suitable splits selected from the same VOC2007 and VOC2012 datasets, with 11,503 images involved in total. Compared with the original models, they show clear gains in accuracy: their mAP values reach 78.40% and 80.32%, respectively, improvements of 1.24% and 1.05%.

Figures: 18. Tables: 6. References: 80.
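
As an illustration of the CIoU bounding-box regression loss named above, the following is a minimal sketch in plain Python. It assumes the commonly published definition of CIoU (an IoU term, a normalized center-distance term, and an aspect-ratio consistency term). The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are choices made for this example and are not taken from the paper's implementation.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss between a predicted box and a ground-truth box.

    Both boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed format).
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((x1 + x2) - (gx1 + gx2)) ** 2 / 4 + ((y1 + y2) - (gy1 + gy2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(x2, gx2) - min(x1, gx1)
    enc_h = max(y2, gy2) - min(y1, gy1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, this quantity keeps growing as two non-overlapping boxes move apart, because the center-distance term remains informative when the IoU is zero.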