| With the rapid growth in the number of vehicles in China and the high frequency of traffic accidents, autonomous driving technology has become a key approach to addressing the problem. Vehicle target detection is an important research direction in autonomous driving, and detection accuracy and detection speed are the main indicators of model performance. Current target detection algorithms fall into two major types: region-based algorithms, such as R-CNN, Fast R-CNN, and Faster R-CNN, and regression-based algorithms, such as SSD and YOLO. Among them, YOLOv3 not only detects quickly compared with the other algorithms, but its feature extraction network also adopts residual structures to deepen the network, which greatly strengthens its representational capability and improves feature extraction. However, current vehicle detection still performs unsatisfactorily on small targets. This paper aims to enhance the detection of small vehicle targets while improving the overall detection performance of the model. The main work is as follows: (1) The YOLOv3 network has a large number of redundant parameters, which wastes computational resources and slows detection. To address this deficiency, the MobileNetV2 network is integrated into the YOLOv3 model, replacing Darknet-53 as the backbone feature extraction network, to reduce the redundant computation of the original network. Experiments show that lightweighting with MobileNetV2 reduces the number of network parameters by about 74%. In addition, the K-means++ clustering algorithm is used in place of K-means to remedy its incomplete clustering, and the resulting anchor boxes are better suited to training the network. (2) YOLOv3 usually uses a
convolution with a stride of 2 for down-sampling, which greatly reduces the model's extraction of shallow features. To address this problem, we propose replacing the down-sampling operation with dilated convolution, which expands the receptive field, retains more global and detail information, and enhances the network's extraction of small-target contour features. On this basis, to solve the gradient interference in back-propagation caused by the different scales of the YOLOv3 detection layers, we propose using the ASFF structure to fuse the features of the last three detection layers, further enhancing the model's ability to acquire features. Experiments show that the feature enhancement network based on ASFF and dilated convolution obtains better feature details and classifies targets more accurately. (3) To address the weak generalization ability of the YOLOv3 bounding-box loss function, this paper proposes using the CIoU loss function instead of the MSE loss function. CIoU takes into account three important factors, namely overlap area, center-point distance, and aspect ratio, and offers scale invariance and better generalization ability. The lightweight network is then fused with the feature enhancement network, and the fused architecture is applied to YOLOv3 so that the model achieves faster detection speed and higher detection accuracy. Experiments show that the YOLOv3 model based on CIoU loss and the fused network converges faster during training, classifies targets and regresses bounding-box positions more accurately, and detects significantly faster. Based on these three improvements, we conduct comparison experiments against the original YOLOv3 network on the KITTI and VEDAI datasets, and then extract part of the VEDAI dataset and add it to the KITTI dataset to form a combined dataset in which the proportion of small targets is balanced. The comparison
experiments are then repeated on the combined dataset with the same number of training iterations. The results show that when all three improvements are applied to the original network together, the model achieves a lower missed-detection rate for small vehicle targets, more accurate position regression, and a higher detection speed. |
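
The abstract names the CIoU loss and its three factors (overlap area, center-point distance, aspect ratio) but does not state the formula. As an illustration only, the following minimal sketch follows the commonly published definition, loss = 1 - IoU + ρ²/c² + αv, for a single pair of boxes. The function name `ciou_loss`, the corner format `(x1, y1, x2, y2)`, and the `eps` stabilizer are assumptions made for this example and are not taken from the paper's implementation.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for one predicted box and one ground-truth box.

    Both boxes use the (x1, y1, x2, y2) corner format (an assumption
    of this sketch). Returns 1 - IoU + rho^2 / c^2 + alpha * v.
    """
    bx1, by1, bx2, by2 = box
    gx1, gy1, gx2, gy2 = gt
    bw, bh = bx2 - bx1, by2 - by1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Factor 1: overlap area, expressed as IoU.
    iw = max(0.0, min(bx2, gx2) - max(bx1, gx1))
    ih = max(0.0, min(by2, gy2) - max(by1, gy1))
    inter = iw * ih
    union = bw * bh + gw * gh - inter
    iou = inter / (union + eps)

    # Factor 2: squared distance between box centers (rho^2), normalized
    # by the squared diagonal of the smallest enclosing box (c^2).
    rho2 = (((bx1 + bx2) - (gx1 + gx2)) ** 2
            + ((by1 + by2) - (gy1 + gy2)) ** 2) / 4.0
    cw = max(bx2, gx2) - min(bx1, gx1)
    ch = max(by2, gy2) - min(by1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Factor 3: aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(bw / (bh + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v
```

Because every term is a ratio of lengths or areas, scaling both boxes by the same factor leaves the loss essentially unchanged, which is the scale invariance mentioned above. Unlike an MSE loss on coordinates, the center-distance term still provides a training signal when the two boxes do not overlap at all.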