In recent years, deep convolutional neural networks (DCNNs) have made remarkable progress in semantic segmentation, greatly improving both its accuracy and its efficiency. For images captured by autonomous driving or driver assistance systems, semantic segmentation labels every pixel and thereby recovers the spatial layout of the road and the contours of obstacles promptly and accurately. It has become a mainstream solution for automatic path planning and obstacle avoidance in autonomous driving. For applications with such strict real-time requirements, improving segmentation accuracy while maintaining efficient inference speed is very challenging. Many existing methods that emphasize high-speed inference cannot produce high-accuracy results, so some real-time semantic segmentation research attempts to find a balance between speed and accuracy. Although this trade-off has been studied, how to optimize both aspects to improve the performance of real-time semantic segmentation remains an open research problem.

The main work of this paper is as follows:

1. This paper proposes the Attentional Residual Dense Factorized Network (AttRDFNet) for real-time semantic segmentation. Specifically, a residual dense factorized block (RDFB) is designed to obtain both low-level and high-level image features through the dense connections of the factorized block, improving the network's ability to extract target features. To reduce the computational overhead caused by dense connections, the conventional convolution kernel is factorized into two smaller convolution kernels, which accelerates the optimization of high-dimensional features in different convolution layers. In addition, this paper explores the role of graininess-aware channel and spatial attention modules in focusing on salient features at different levels, and designs two effective strategies to make full use of the hierarchical features of the input image. Experimental results on the Cityscapes benchmark dataset show that AttRDFNet obtains richer hierarchical features. Its segmentation accuracy is comparable to that of mainstream methods, and its inference is considerably more efficient.

2. This paper proposes a Fast Asymmetric Encoder-Decoder Network Based on Context Information Aggregation for real-time semantic segmentation, which seeks to optimize both accuracy and speed. Specifically, in the encoder, a one-shot residual concatenation module is designed to capture the diverse features of all intermediate layers, improving the model's ability to perceive targets of different sizes. Extracting diverse features incurs a high memory access cost (MAC), which lowers efficiency. The proposed one-shot residual concatenation module therefore integrates the features of all intermediate residual layers only once, in the last feature map, which reduces MAC and improves GPU computing efficiency. In the decoder, a context information fusion module is used to further reduce the complexity of the entire network. Experimental results show that the proposed encoder-decoder network achieves a better trade-off between segmentation speed and accuracy.
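
The kernel factorization mentioned in the first contribution can be illustrated with a minimal sketch. The abstract does not give kernel sizes or channel widths, so the 3×3 kernel split into a 3×1 and a 1×3 kernel, the 64-channel layers, and all function names below are assumptions chosen for illustration; this is not the RDFB implementation. The sketch shows two things: the factorized pair needs fewer weights, and for a kernel that is an outer product of a column and a row vector the two forms produce identical outputs. For a general 3×3 kernel the factorized pair is only an approximation.

```python
def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation without padding, on lists of lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out


def factorized_conv_params(k, c_in, c_out):
    """Weights in a k x 1 layer (c_in -> c_out) followed by a 1 x k layer
    (c_out -> c_out); the intermediate width is an assumption."""
    return k * c_in * c_out + k * c_out * c_out


# A separable 3x3 kernel built as the outer product of a column and a row.
col = [1, 2, 1]
row = [1, 0, -1]
full_kernel = [[c * r for r in row] for c in col]  # 3 x 3
col_kernel = [[c] for c in col]                    # 3 x 1
row_kernel = [row]                                 # 1 x 3

# Small integer test image (5 rows, 6 columns), values chosen arbitrarily.
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

direct = conv2d_valid(image, full_kernel)
factorized = conv2d_valid(conv2d_valid(image, col_kernel), row_kernel)

print(direct == factorized)                # both paths give the same feature map
print(conv_params(3, 64, 64))              # weights in the 3x3 layer
print(factorized_conv_params(3, 64, 64))   # weights in the 3x1 + 1x3 pair
```

With equal input and output widths, the factorized pair uses 6C² weights instead of 9C², a one-third reduction per layer, which is the kind of saving that makes dense connections affordable in a real-time setting.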