Semantic segmentation aims to classify each pixel or point in a scene, dividing the objects into regions with specific semantic categories. It is one of the important research topics in computer vision. In recent years, the emergence of new technologies such as deep learning and convolutional neural networks has greatly advanced semantic segmentation. Semantic segmentation has a wide range of applications, such as medical image diagnosis, remote sensing image analysis, and autonomous driving. The field of autonomous driving in particular has gained popularity recently, and semantic segmentation plays a key role in it: it can segment road scenes for unmanned vehicles in real time, and reliable segmentation algorithms enable cars to understand their surroundings, thus ensuring the safe driving of autonomous vehicles. Although current semantic segmentation algorithms have shown superior performance, there are still many shortcomings. First, most semantic segmentation algorithms use only two-dimensional image information, which contains limited information, and the common encoder-decoder structure fails to capture complete contextual information, which limits the model's ability to generate more accurate predictions. Second, in order to obtain better performance, the training of these segmentation networks relies on a large amount of pixel-level annotated data; however, it is difficult to perform dense semantic annotation on images. Using a synthetic dataset is an alternative, since its annotations can be generated with computer graphics techniques. However, due to the domain gap between real and synthetic datasets, a semantic segmentation network trained on a synthetic dataset will suffer performance degradation when applied to a real dataset. Based on this, the major work and contributions are as follows:

(1) Semantic segmentation and depth estimation both play important roles in the field of
autonomous driving. In recent years, the advantages of Convolutional Neural Networks (CNNs) have allowed both topics to flourish. However, these two tasks are usually solved separately and rarely within a unified model. In this work, we propose a Mutual Encouragement Network (MENet), which includes a semantic segmentation branch and a disparity regression branch and simultaneously generates a semantic map and a visual disparity map. In the cost-volume construction phase, depth information is embedded in the semantic segmentation branch to increase contextual understanding; likewise, semantic information is incorporated into the disparity regression branch to generate more accurate disparity. The two branches mutually promote each other during both the training and inference phases.

(2) Semantic segmentation is often realized by supervised learning on a large number of well-labeled images. However, labeled images are hard to collect in most circumstances, so unsupervised semantic segmentation is commonly implemented by transferring knowledge from a supervised source domain to an unsupervised target domain. To bridge the domain gap between synthetic and real datasets, we propose a target-targeted domain adaptation approach that focuses on the target domain. Our model consists of two components: an Image-to-Image Translation (IIT) module, which translates source images into the target domain, and a Target-targeted Segmentation Adaptation (TSA) module, which focuses the semantic segmentation on the target domain. The IIT module handles appearance alignment, while the TSA module bridges the domain gap at the segmentation-map level. In addition, we design a closed-loop learning scheme in which feedback from TSA to IIT allows the two modules to promote each other. Extensive experiments on the two public benchmarks "GTA5-to-Cityscapes" and "SYNTHIA-to-Cityscapes" demonstrate the effectiveness of our method for domain adaptation in unsupervised semantic segmentation.

(3) Unsupervised domain adaptation (UDA) for
semantic segmentation has recently gained increasing research attention; it aims at alleviating the domain discrepancy. Existing methods in this scope either simply align features or outputs across the source and target domains, or must deal with complex image processing and post-processing problems. In this work, we propose a novel multi-level UDA model named the Confidence-and-Refinement Adaptation Model (CRAM), which contains a confidence-aware entropy alignment (CEA) module and a style feature alignment (SFA) module. Through CEA, adaptation is performed locally via adversarial learning in the output space, making the segmentation model pay attention to high-confidence predictions. Furthermore, to enhance model transfer in the shallow feature space, the SFA module is applied to minimize the appearance gap across domains. Experiments on two challenging UDA benchmarks, "GTA5-to-Cityscapes" and "SYNTHIA-to-Cityscapes", demonstrate the effectiveness of CRAM.
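The per-pixel entropy on which a confidence-aware alignment like CEA relies can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the shapes, and the entropy threshold of 0.5 are assumptions made purely for the example, which shows how low-entropy (high-confidence) pixels can be separated from uncertain ones.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_entropy(logits):
    """Per-pixel normalized Shannon entropy of the class distribution.

    logits: array of shape (C, H, W). Returns values in [0, 1],
    where 0 means a fully confident prediction and 1 a uniform one.
    """
    p = softmax(logits, axis=0)
    num_classes = logits.shape[0]
    # Entropy per pixel, normalized by log(C) so the range is [0, 1].
    return -(p * np.log(p + 1e-12)).sum(axis=0) / np.log(num_classes)

def confidence_mask(logits, threshold=0.5):
    """Binary mask of pixels whose normalized entropy is below the
    threshold, i.e. the high-confidence predictions an alignment loss
    could choose to emphasize (threshold is a hypothetical value)."""
    return pixel_entropy(logits) < threshold

# Example: a 3-class prediction map on a 2x2 image.
logits = np.zeros((3, 2, 2))
logits[0, 0, 0] = 10.0          # one very confident pixel
mask = confidence_mask(logits)  # True only at the confident pixel
```

In an adversarial setup, such an entropy map (rather than the raw segmentation output) is what the discriminator sees, so the segmentation network is pushed to produce confident, source-like predictions on target images.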