Image semantic segmentation enables computer analysis of complex scenes by classifying images at the pixel level, and it has long been both an active research topic and a difficult problem in computer vision. In recent years, it has been widely applied in scenarios such as autonomous driving, indoor navigation, and computer-assisted medical care, and has attracted growing attention from researchers. For autonomous driving in particular, semantic segmentation of street scenes is of significant practical and research value. Current end-to-end semantic segmentation methods are mainly based on deep convolutional neural networks, but most of these algorithms require a very large number of parameters and computations, which limits their use in practical applications. To meet the real-time and accuracy requirements of autonomous driving, this work aims to design an efficient, accurate, and lightweight semantic segmentation algorithm for street scene images. The research is divided into the following three parts:

(1) A lightweight semantic segmentation network, EPANet, with hierarchical feature pyramid attention is designed. The existing pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP) techniques are improved, and lightweight implementations of both are designed. An attention refinement branch (ARB) is proposed to provide global feature guidance to the feature pyramid; combining the branch with each pyramid yields two feature pyramid attention modules, PWA and ASWA. A hierarchical deep feature fusion strategy is designed to retain and fuse, as far as possible, the category and location information contained in feature maps at different resolutions. The two proposed network models, EPANet-PWA and EPANet-ASWA, were evaluated on the Cityscapes dataset at a resolution of 1024×2048. Their mean Intersection over Union (mIoU) was 74.57% and 75.38%, respectively. On a TITAN Xp graphics card, the inference speed reached 21.7 FPS and 20.5 FPS, and the model size was only 10 MB and 14 MB.

(2) Building on the network framework of the first part, a segmentation network with cascaded mixed attention, EPANet-MAM, is constructed. After multi-scale context information has been aggregated, the feature maps still lack an expression of the correlation between distant pixels. Semantic correlation is therefore modeled along both the spatial and the channel dimensions of the feature maps, and a lightweight mixed attention module (MAM) is designed to further improve the feature representation. In addition, a dual auxiliary supervision layer is designed to optimize the feature representation at both the high and low levels of the network. The two proposed network models, EPANet-PWA-MAM and EPANet-ASWA-MAM, were evaluated: their mIoU was 76.51% and 76.67%, respectively, the inference speed was 18.0 FPS and 17.3 FPS, and the model size was 15 MB and 19 MB.

(3) A lightweight semantic segmentation network based on high-resolution representation, HRPAnet, is designed. Maintaining a high-resolution representation during feature encoding captures more location information about targets and strengthens the segmentation network's ability to localize them. Based on the high-resolution network architecture, a multi-path parallel feature pyramid attention structure is designed to improve the feature representation, together with a feature aggregation layer capable of sparse representation. Experiments demonstrate the superiority of the network.
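The accuracy figures above are reported as mean Intersection over Union. As a reminder of what that metric measures, the following is a minimal sketch of the standard definition: the per-class ratio of true positives to the union of predicted and ground-truth pixels, averaged over classes. It is not the evaluation code used in this work. The function names, the flat label lists, and the ignore label of 255 (a common Cityscapes convention) are assumptions made for illustration.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs over flat lists of pixel labels.

    ignore_index=255 is an assumed "void" label, not taken from the source.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels without a valid ground-truth class
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average IoU = TP / (TP + FP + FN) over classes that occur at all."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # class c pixels missed
        fp = sum(matrix[r][c] for r in range(n)) - tp  # pixels wrongly labeled c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {100 * score:.2f}%")
```

In this toy case the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark evaluations accumulate the confusion matrix over every image in the validation set before taking the ratio, rather than averaging per-image scores.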