Object detection in autonomous driving scenarios, a core technology for ensuring the safe operation of autonomous vehicles, aims to analyze the image information acquired by the sensors and infer the category and location of every object in the image. However, autonomous driving scenes contain many complex phenomena, such as uneven illumination, background interference, partial occlusion, and object distortion, which pose great challenges to the recognition and localization abilities of current detection algorithms. To deal with these phenomena, this paper takes YOLOv5s from the YOLOv5 family as the baseline and improves its feature extraction, feature fusion, and detection head, constructing a feature extraction network based on dilated convolution, a feature fusion network based on multi-scale information enhancement, and a detection head with an integrated attention mechanism. The specific research work is as follows:

1) To address false detections caused by background interference and object distortion, and missed detections caused by occlusion, this paper improves YOLOv5s with an attention mechanism and a multi-scale information enhancement strategy. In the attention design, a channel attention branch gathers semantic information and a spatial attention branch captures location information; the two branches are then combined with a parallel fusion strategy to avoid mutual interference. For multi-scale information enhancement, cross-scale connections improve the flow of feature information through the feature fusion network, a feature-aware recombination operator replaces interpolation upsampling, which cannot make full use of semantic information, and additional learnable weights are assigned to feature maps of different resolutions so that the network can select key features for weighted fusion on its own. Evaluation on the KITTI dataset shows that, compared with YOLOv5s, the proposed algorithm lowers the average missed detection ratio by 3.7%, lowers the false detection ratio by 2%, and improves mAP by 2.5%. On the BSTLD dataset, the proposed algorithm improves mAP by 3.9% over YOLOv5s.

2) To address localization errors caused by insufficient extraction of object structure information and the bloated network caused by feature redundancy, this paper further improves the above algorithm in two respects: feature extraction and network slimming. In the feature extraction network, mixed dilated convolutions are first used to fully extract structural information such as object edges and contours; a residual structure is then introduced to strengthen the network's ability to learn deep semantic information; finally, multiple dilated convolutions are cascaded to obtain effective receptive fields of different scales and thereby capture multi-scale object information. In the network slimming stage, L1 regularization is first applied to the scaling factors of the BN layers; the scaling factors and the network weights are then trained jointly so that the weights of redundant feature channels approach zero; finally, these channels are removed according to a preset global threshold. The experimental results show that, compared with YOLOv5s, the mAP of the final algorithm is increased by 3.1%, while the number of parameters and the model size are reduced by 1.4 million and 2.2 MB, respectively. Illustrative sketches of the attention, weighted fusion, dilated-convolution, and network-slimming components are given below.
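The parallel attention fusion described in point 1) can be illustrated with a short PyTorch sketch. This is a minimal interpretation, not the thesis's exact module: the SE-style channel branch, the 7x7 spatial-attention convolution, and the element-wise combination of the two branch outputs are all assumptions made here purely for illustration.

```python
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    """Channel and spatial attention applied in parallel (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel branch: squeeze spatial dimensions, produce per-channel weights
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: compress channels, highlight informative locations
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention weights, shape (N, C, 1, 1)
        ca = self.channel_fc(x)
        # Spatial attention map from channel-wise mean and max, shape (N, 1, H, W)
        sa = self.spatial_conv(
            torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        )
        # Both branches read the same input; outputs are merged only at the end
        return x * ca + x * sa


if __name__ == "__main__":
    feat = torch.randn(1, 64, 40, 40)
    print(ParallelAttention(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```

Because neither branch rescales the other's input and the two outputs are merged only after both have been computed, this is one way of realizing the "parallel fusion to avoid mutual interference" described above.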
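The learnable per-scale weights used in the multi-scale enhancement can likewise be sketched. The normalized weighting below is an assumption inspired by BiFPN-style fast fusion, and nearest-neighbor resizing stands in for the feature-aware recombination operator mentioned above, only to keep the example short.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuse feature maps of different resolutions with learnable scalar weights."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # One learnable, non-negative weight per input feature map
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        # Resize every input to the spatial size of the first one
        target = feats[0].shape[-2:]
        feats = [f if f.shape[-2:] == target
                 else F.interpolate(f, size=target, mode="nearest") for f in feats]
        # Normalized, non-negative weights let the network emphasize key scales
        w = F.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(w[i] * f for i, f in enumerate(feats))


if __name__ == "__main__":
    p3 = torch.randn(1, 128, 80, 80)   # higher-resolution feature map
    p4 = torch.randn(1, 128, 40, 40)   # lower-resolution feature map
    fused = WeightedFusion(num_inputs=2)([p3, p4])
    print(fused.shape)  # torch.Size([1, 128, 80, 80])
```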
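The dilated-convolution feature extraction block in point 2) might take the following form: cascaded 3x3 convolutions with mixed dilation rates enlarge the effective receptive field stage by stage, and a residual connection strengthens the learning of deep semantic information. The specific dilation rates (1, 2, 3), channel layout, and SiLU activation are illustrative assumptions, not details taken from the thesis.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Cascaded dilated convolutions with a residual connection (illustrative)."""
    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        layers = []
        for d in dilations:
            # padding == dilation keeps the spatial size for a 3x3 kernel,
            # while each stage enlarges the effective receptive field
            layers += [
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.SiLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves shallow structural information
        return x + self.body(x)


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(DilatedResidualBlock(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```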
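The network-slimming step adds an L1 penalty on the BN scaling factors during training and then removes channels whose factors fall below a global threshold. The sketch below shows the penalty term and one way to derive the global threshold as a quantile of all scaling factors; the penalty coefficient and pruning ratio are placeholders, not values from the thesis.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, coeff: float = 1e-4) -> torch.Tensor:
    """L1 regularization on BN scaling factors, added to the training loss."""
    penalty = sum(bn.weight.abs().sum()
                  for bn in model.modules() if isinstance(bn, nn.BatchNorm2d))
    return coeff * penalty

def global_prune_threshold(model: nn.Module, prune_ratio: float = 0.3) -> float:
    """Global threshold: channels whose |gamma| falls below it are removed."""
    gammas = torch.cat([bn.weight.detach().abs().flatten()
                        for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)])
    return torch.quantile(gammas, prune_ratio).item()


if __name__ == "__main__":
    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
    # During joint training: loss = detection_loss + bn_l1_penalty(model)
    print(bn_l1_penalty(model))
    print(global_prune_threshold(model))
```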
3) To verify the stability of the algorithm in autonomous driving scenes, brightness, rotation, occlusion, and noise tests are designed based on real road scene images. The test results show that the proposed algorithm maintains stable detection performance under uneven illumination, object distortion, occlusion, and noise interference.
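The robustness tests in point 3) perturb real road images with brightness changes, rotation, occlusion, and noise. A minimal sketch of such perturbations using torchvision is given below; the specific parameter values (brightness factor, rotation angle, occluded region, noise level) are assumptions chosen only for illustration.

```python
import torch
import torchvision.transforms.functional as TF

def perturb(image: torch.Tensor, mode: str) -> torch.Tensor:
    """Apply one robustness-test perturbation to a (C, H, W) image in [0, 1]."""
    if mode == "brightness":
        return TF.adjust_brightness(image, brightness_factor=0.5)  # darken the scene
    if mode == "rotation":
        return TF.rotate(image, angle=10.0)  # simulate viewpoint/object distortion
    if mode == "occlusion":
        out = image.clone()
        _, h, w = out.shape
        out[:, h // 3: 2 * h // 3, w // 3: 2 * w // 3] = 0.0  # mask a central patch
        return out
    if mode == "noise":
        return (image + 0.05 * torch.randn_like(image)).clamp(0.0, 1.0)  # Gaussian noise
    raise ValueError(f"unknown perturbation: {mode}")


if __name__ == "__main__":
    img = torch.rand(3, 375, 1242)  # dummy image at a KITTI-like resolution
    for m in ("brightness", "rotation", "occlusion", "noise"):
        print(m, perturb(img, m).shape)
```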