| Semantic segmentation is a fundamental task in computer vision, a field of artificial intelligence, in which neural networks learn segmentation end to end and thereby classify objects at the pixel level. Semantic segmentation of urban scenes is a core technology for applications such as unmanned vehicles and robots; networks combined with attention mechanisms can select the information most critical to the current task and obtain more accurate segmentation results. However, traditional attention mechanisms are computationally intensive and time-consuming, so they cannot reach their full performance or meet application requirements on the limited hardware of mobile devices. Meanwhile, with the rapid development of sensors such as cameras, the resolution of acquired images keeps increasing, and some semantic segmentation models adapt poorly to high-resolution segmentation: they ignore detailed information in the image, which leads to insufficient final segmentation accuracy. Starting from these problems, this thesis studies attention mechanisms and high-resolution image extraction in depth. The main research work and innovations are as follows. 1. To address the imbalance between computational cost and accuracy of attention mechanisms, this thesis proposes the Multidimensional Attention Network (MANet). First, based on the shape characteristics of objects in urban scenes, it designs Strip Partitioned Dimensional Attention (SPDA), which uses strip pooling instead of traditional square convolution and combines it with dimensionality reduction to extract long-range contextual semantic information dimension by dimension, reducing model computation. On this basis, it fuses attention in the channel and spatial domains to form a lightweight Multi-attention Fusion Module (MAFM) that can be 
stacked and disassembled to extract feature information in all directions and further improve model accuracy. Finally, it inserts the module into an encoder-decoder network with a ResNet-101 backbone to build MANet, which guides the fusion of high-level and low-level semantics, corrects edge information in the feature map, and complements prediction details. Experiments show that the module is flexible and generalizes well: compared with attention mechanisms of the same type, it cuts about 90% of the parameters and 80% of the computation, while segmentation accuracy still improves steadily. 2. To address the problem that models ignore detailed information in high-resolution input, this thesis proposes an Attention-weighting Codec Network (AW_CNet). First, it compresses the feature map using average pooling and max pooling to extract channel-wise features. Then, it constructs the Attention-weighting Module (AWM) in the decoder to calculate the weights of pixels along the channel dimension and assign them to the high-level and low-level semantics, so that the low-level semantics can guide the recovery of the high-level semantics in the decoder and further repair detailed information during decoding. Finally, it replaces the decoder of a traditional semantic segmentation network with AWM to construct AW_CNet, which strengthens attention to low-level semantic details during decoding. Experiments show that the overall segmentation accuracy of AW_CNet improves by 0.73%, and that it performs better on targets with complex edge details without any significant drop in accuracy for other categories. 3. This thesis proposes an attention-enhanced model based on a codec architecture that combines the MAFM and AWM modules. The model places MAFM in parallel with the Atrous Spatial Pyramid Pooling (ASPP) module in the encoder to alleviate the ASPP local focus 
problem. Meanwhile, this thesis uses MAFM and AWM in the decoder so that the detailed information in the high-resolution feature map guides image recovery in the decoder. Experiments show that the model achieves significant improvement in segmenting striped objects, fine objects, and objects with complex edges, and that its modular structure makes it more flexible than traditional models. |
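
The strip pooling idea behind SPDA can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `strip_pool` and `strip_attention` are hypothetical, the input is a single-channel map stored as a list of rows, and fusing the two strip contexts by addition followed by a sigmoid gate is an assumption made only for illustration. The point is that pooling along whole rows and whole columns yields H + W context values rather than a dense H × W attention map, which is one plausible source of the computational savings described above.

```python
import math


def strip_pool(feature_map):
    """Average-pool a 2D map (list of rows) along horizontal and vertical strips."""
    height = len(feature_map)
    width = len(feature_map[0])
    # Horizontal strips (1 x W): one average per row, H values in total.
    row_context = [sum(row) / width for row in feature_map]
    # Vertical strips (H x 1): one average per column, W values in total.
    col_context = [
        sum(feature_map[r][c] for r in range(height)) / height
        for c in range(width)
    ]
    return row_context, col_context


def strip_attention(feature_map):
    """Reweight each pixel with a gate built from its row and column context.

    Assumption: the two contexts are fused by addition and squashed with a
    sigmoid; the thesis may fuse them differently.
    """
    row_context, col_context = strip_pool(feature_map)
    gated = []
    for r, row in enumerate(feature_map):
        gated_row = []
        for c, value in enumerate(row):
            gate = 1.0 / (1.0 + math.exp(-(row_context[r] + col_context[c])))
            gated_row.append(value * gate)
        gated.append(gated_row)
    return gated


if __name__ == "__main__":
    demo = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
    print(strip_pool(demo))
    print(strip_attention(demo))
```

In a real network, the same pooling would run per channel on tensors, and learned 1D convolutions would typically sit between the pooling and the gate; both are omitted here to keep the sketch self-contained.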