Remote sensing semantic segmentation has long been an important research direction; its goal is to assign a class label to every pixel in an image. The task plays an important role in applications such as man-made feature extraction, smart cities, intelligent agriculture, and unmanned aerial vehicles. In recent years, advances in deep learning have driven a new round of progress: deep convolutional networks are used to extract high-level semantic information and to sharpen target edges by learning local spatial features. However, current methods still suffer from detail loss, blurring, and inaccurate localization. Spatial detail is mainly carried by low-level features, while high-level semantic features often lack it, which easily leads to misclassified targets. To let a model exploit both high-level semantics and low-level detail, one class of solutions adopts a dual-branch design that learns the two kinds of information separately and fuses them into the output feature map. Although this dual-branch approach helps with detail extraction, the lack of high-resolution localization and of effective interaction between the branches leaves the low-level detail features mixed with too much noise, so the gains are limited. This paper therefore focuses on how to learn the detail and edge information of images efficiently. Based on the problems above, this paper studies detail and local relationships in remote sensing images, and its main contributions are as follows:

1. A two-branch model, BiDNet, is proposed to learn detail information. A high-resolution retention branch is added to an existing single-branch segmentation network; by keeping its feature map at high resolution, this branch extracts spatial details accurately and locates targets precisely (a minimal sketch of this two-branch design is given after contribution 2). The two branches share features and parameters in the early layers, which reduces the number of model parameters. Experiments on several remote sensing image datasets show that BiDNet outperforms similar network models under the same experimental setup and achieves a clear improvement over the base model.

2. The model BMANet is proposed for multi-level detail enhancement in semantic segmentation. We design a Transformer module for multi-scale detail enhancement that learns multi-level detail information from the different low-level stages, so as to enrich local spatial details and effectively guide the encoder in learning the local branch. The final part of the model uses a fusion module to combine the high-level semantic information with the multi-level detail information efficiently, reducing the noise that low-level details introduce into high-level features. Experiments on remote sensing image datasets show that BMANet performs better and demonstrate the effectiveness of the multi-level detail enhancement module and the feature fusion module.
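The following PyTorch sketch illustrates the two-branch design of contribution 1 together with the feature-fusion idea of contribution 2: a shared stem, a detail branch kept at high resolution, a downsampled semantic branch, and a simple fusion head. All module names, channel widths, and layer choices here are illustrative assumptions, not the actual BiDNet or BMANet configuration.

# Hypothetical sketch of a two-branch segmentation network.
# All names and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class TwoBranchSegNet(nn.Module):
    def __init__(self, num_classes, width=64):
        super().__init__()
        # Shared stem: both branches reuse these early features and
        # parameters, which keeps the extra branch cheap.
        self.stem = conv_bn_relu(3, width, stride=2)           # 1/2 resolution
        # Detail branch: stays at 1/2 resolution to preserve spatial detail.
        self.detail = nn.Sequential(
            conv_bn_relu(width, width),
            conv_bn_relu(width, width),
        )
        # Semantic branch: downsamples further to capture high-level context.
        self.semantic = nn.Sequential(
            conv_bn_relu(width, width * 2, stride=2),          # 1/4
            conv_bn_relu(width * 2, width * 4, stride=2),      # 1/8
            conv_bn_relu(width * 4, width * 4),
        )
        # Simple fusion head: upsample semantics and combine with details.
        self.fuse = conv_bn_relu(width + width * 4, width)
        self.classifier = nn.Conv2d(width, num_classes, 1)

    def forward(self, x):
        shared = self.stem(x)
        detail = self.detail(shared)                           # 1/2, low-level
        semantic = self.semantic(shared)                       # 1/8, high-level
        semantic = F.interpolate(semantic, size=detail.shape[2:],
                                 mode="bilinear", align_corners=False)
        out = self.classifier(self.fuse(torch.cat([detail, semantic], dim=1)))
        # Restore full input resolution for per-pixel prediction.
        return F.interpolate(out, size=x.shape[2:],
                             mode="bilinear", align_corners=False)

# Example: a 6-class prediction map for a 512x512 tile.
logits = TwoBranchSegNet(num_classes=6)(torch.randn(1, 3, 512, 512))
print(logits.shape)  # torch.Size([1, 6, 512, 512])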
3. An FLA module is proposed to mitigate large-scale local misclassification. Its main idea is to use an attention mechanism to induce the Transformer model to focus more on local features while still learning the global context, thereby reducing large-range local misclassification in images. Experiments on the remote sensing datasets Potsdam and Vaihingen show that TransUNet models equipped with the FLA module outperform other Transformer-structured models under the same experimental setup, and the effectiveness and generalizability of the module are demonstrated on several other base models.
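Since this summary does not specify the internal structure of the FLA module, the sketch below only illustrates the general idea of biasing a Transformer block toward local features: multi-head self-attention restricted to non-overlapping local windows, which could be combined with a global-attention path such as the one in TransUNet. The class name, window size, and head count are assumptions for illustration.

# Hypothetical sketch of window-restricted self-attention; the actual FLA
# module is not specified here, so this only illustrates the idea of
# attending within local neighbourhoods.
import torch
import torch.nn as nn

class LocalWindowAttention(nn.Module):
    """Multi-head self-attention computed independently inside
    non-overlapping windows, so each token attends only locally."""
    def __init__(self, dim, num_heads=4, window=8):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, H, W, C) feature map, H and W divisible by the window size.
        B, H, W, C = x.shape
        w = self.window
        # Partition the map into (B * num_windows, w*w, C) token groups.
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)
        y = self.norm(x)
        y, _ = self.attn(y, y, y)       # attention restricted to each window
        x = x + y                        # residual connection
        # Reverse the window partition back to (B, H, W, C).
        x = x.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, H, W, C)

# Example: local attention over a 32x32 feature map with 64 channels.
feats = torch.randn(2, 32, 32, 64)
print(LocalWindowAttention(dim=64)(feats).shape)  # torch.Size([2, 32, 32, 64])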