Abstract

In recent years, with the development of artificial intelligence technology and the relaxation of relevant policies at home and abroad, autonomous driving has moved steadily toward commercialization and has become one of the fastest-growing artificial intelligence industries. In autonomous driving, environment perception is the prerequisite for accurate positioning, decision-making, control, and other downstream technologies. Among environment perception methods, visual perception has become the most widely used and most mature approach because of its low cost, simple operation, and rich information content. A visual perception system must provide pixel-level perception of background categories such as road, sky, and buildings to determine the drivable area, and at the same time instance-level perception of target categories such as vehicles and pedestrians to obtain accurate coordinate information. To address these problems, this thesis designs, on the basis of convolutional neural networks, a semantic segmentation algorithm for perceiving background categories and an object detection algorithm for perceiving target categories. The main contributions are as follows:

First, semantic segmentation is studied, and MJPUNet, a real-time semantic segmentation algorithm based on joint upsampling of multi-level feature maps, is proposed to meet the pixel-level visual perception requirements of autonomous driving. To achieve real-time segmentation speed, MJPUNet adopts a lightweight convolutional neural network as the encoder and replaces the time- and memory-consuming atrous convolutions used in current mainstream semantic segmentation networks. A multi-scale feature-map Joint Pyramid Upsampling module is designed to generate high-resolution feature maps with rich semantic information by fusing multiple feature maps from the encoder. Experimental results on the Cityscapes dataset show that MJPUNet achieves 91.85%
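The abstract only names the Joint Pyramid Upsampling module without giving its internals; the sketch below illustrates the general fuse-by-upsampling idea it describes (bring every encoder feature map to the highest resolution, then concatenate along channels). The real module presumably uses learned convolutions and bilinear interpolation; the nearest-neighbour resampling and all function names here are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour upsampling of a (C, H, W) feature map (illustrative)."""
    c, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h   # source row index for each output row
    cols = np.arange(out_w) * w // out_w   # source column index for each output column
    return fmap[:, rows[:, None], cols[None, :]]

def joint_pyramid_upsample(feature_maps):
    """Upsample every encoder feature map to the largest spatial size
    present in the pyramid and concatenate along the channel axis."""
    out_h = max(f.shape[1] for f in feature_maps)
    out_w = max(f.shape[2] for f in feature_maps)
    return np.concatenate(
        [upsample_nearest(f, out_h, out_w) for f in feature_maps], axis=0)

# Three hypothetical encoder stages at decreasing resolution, shape (C, H, W)
f1 = np.random.rand(64, 32, 32)
f2 = np.random.rand(128, 16, 16)
f3 = np.random.rand(256, 8, 8)

fused = joint_pyramid_upsample([f1, f2, f3])
print(fused.shape)  # (448, 32, 32): 64+128+256 channels at the finest resolution
```

A decoder head operating on `fused` thus sees both the fine spatial detail of the shallow stage and the semantics of the deep stages, which is the stated goal of the module.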
pixel accuracy, 43.78% mIoU, and 32.3 FPS.

Second, object detection is studied. To meet the requirements of instance-level visual perception in autonomous driving, vehicle and pedestrian detection datasets are constructed: 7,000 images from KITTI, an open-source autonomous driving dataset, are manually annotated with LabelImg. Building on a study of the YOLOv5 algorithm, a new instance-level visual perception algorithm for autonomous driving, NGM-YOLOv5, is designed. It follows the ideas and architecture of YOLOv5 and, to increase instance-level perception speed, incorporates GhostNet into YOLOv5. In addition, a Normalization Block Attention Module (NBAM) is added, which improves the network's detection accuracy by adjusting channel attention and spatial attention so as to further suppress uninformative features. To make NGM-YOLOv5 more suitable for practical applications, a Network Adaptation Architecture (NAA) is proposed, which selects the corresponding network according to the number of targets identified in each frame; this improves feature-extraction efficiency and hardware utilization without reducing accuracy. Experiments on the KITTI dataset show that NGM-YOLOv5 achieves up to 95.5% mAP and 114.47 FPS.
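The abstract says the Network Adaptation Architecture selects a network per frame from the number of identified targets, but gives no selection rule. The dispatcher below is a minimal hypothetical sketch of that idea, assuming a simple target-count threshold between a light and a full variant; the variant names, threshold value, and dispatch logic are all assumptions for illustration.

```python
# Hypothetical per-frame dispatcher for the Network Adaptation Architecture (NAA).
# Variant names and the capacity threshold are illustrative, not from the thesis.
VARIANTS = [
    {"name": "NGM-YOLOv5-light", "max_targets": 5},            # fast, for sparse scenes
    {"name": "NGM-YOLOv5-full",  "max_targets": float("inf")}, # accurate, for dense scenes
]

def select_variant(num_targets_last_frame):
    """Return the lightest variant whose assumed capacity covers the
    number of targets detected in the previous frame."""
    for variant in VARIANTS:
        if num_targets_last_frame <= variant["max_targets"]:
            return variant["name"]

print(select_variant(3))   # NGM-YOLOv5-light
print(select_variant(12))  # NGM-YOLOv5-full
```

The intended effect matches the abstract's claim: frames with few targets are served by a cheaper network, raising throughput and hardware utilization without changing the accuracy on crowded frames.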