| Automatic man-made target extraction is one of the main tasks of remote sensing image analysis systems, aiming at pixel-level classification of ground targets of interest. Target extraction plays an important role in a wide range of applications such as urban planning, geographic information systems, intelligent transportation systems, building change detection, civil-military emergency response, precision agriculture and environmental monitoring. Extracting complete objects from complex backgrounds is challenging because of variation in target scale, similar appearance among neighbouring objects, diverse imaging directions and background complexity. With the development of deep learning techniques, deep convolutional neural networks have made great progress in traditional computer vision tasks such as image classification, object detection, and semantic segmentation. However, because natural images and remotely sensed images differ greatly, directly applying deep learning-based semantic segmentation methods to target extraction rarely achieves ideal results. Designing deep learning-based target extraction models tailored to the characteristics of remote sensing images therefore has both theoretical significance and practical value. This paper studies remote sensing image target extraction methods from the perspective of the network structure of deep learning models. The main work is as follows: (1) Deep learning-based methods typically use U-Net-like networks or feature pyramid networks as their basic structure, yielding good segmentation performance. However, these methods ignore two key issues when integrating multi-layer features: first, information transfer between different layers is not controlled; second, feature fusion does not account for the differing contributions of different feature maps. We propose a novel bi-directional gating network to address both problems simultaneously. The 
network contains a bi-directional information transfer module that transfers information between multiple layers and controls the transfer through a bi-directional attention mechanism. In addition, an adaptive gating module is proposed to generate contribution weights for feature maps at different scales and then fuse them adaptively according to these weights. Experiments show that the bi-directional gating network segments the complex structure of targets more accurately than other models. (2) To address the low accuracy of existing target extraction models in classifying regions near target boundaries, this paper proposes a target extraction neural network, named the discriminative context-aware network, which focuses on discriminative high-level contextual features while retaining spatial location information. First, a discriminative context-aware feature module is designed to generate a top-level feature map that not only captures rich image context information but also aggregates contrasting local information at multiple scales. Second, a refinement decoder module retains spatial information from the lower layers and enhances the feature representation to obtain accurate segmentation results. The experimental results show that the discriminative context-aware network achieves high segmentation performance on inconspicuous targets and regions near target boundaries. (3) A multi-scale differential fusion network is designed to address the problem that the discriminative context information extracted by the discriminative context-aware network is smoothed in the feature decoding stage. The network contains a differential fusion module that highlights high-frequency features in the boundary regions of objects and suppresses interfering information from low-level features. Experimental results demonstrate that differential fusion can improve target extraction performance to a certain 
extent. The paper conducts extensive experiments on building and road extraction benchmark datasets, including the WHU building dataset, the Inria Aerial Image Labeling dataset, the Massachusetts road dataset and the RoadTracer road dataset, as well as a self-built dataset for vehicle extraction from SAR images. The experimental results show that the three proposed methods achieve good performance on all benchmark datasets and generalize well. |
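
The adaptive fusion described in item (1), in which contribution weights are generated for feature maps at different scales and the maps are then fused according to those weights, can be illustrated with a minimal sketch. This is not the paper's adaptive gating module: the function names, the `gate_scale` parameter, and the use of a global average as the gate score are assumptions made purely for illustration, and a trained network would learn the scores instead. The feature maps are assumed to have already been resized to a common resolution.

```python
import math


def global_average(feature_map):
    """Mean of all values in a 2-D feature map given as a list of rows."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(feature_maps, gate_scale=1.0):
    """Fuse same-sized feature maps using softmax contribution weights.

    Assumption for illustration only: each map's gate score is its
    global average times gate_scale (a hypothetical stand-in for a
    learned gating layer).
    """
    scores = [gate_scale * global_average(fm) for fm in feature_maps]
    weights = softmax(scores)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Weighted sum of the maps, computed position by position.
    fused = [
        [sum(w * fm[i][j] for w, fm in zip(weights, feature_maps))
         for j in range(width)]
        for i in range(height)
    ]
    return fused, weights


# Three toy 2x2 maps standing in for features from three scales.
maps = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
fused, weights = gated_fusion(maps)
```

Because the weights come from a softmax, they are positive and sum to one, so the fused map is a convex combination of the inputs; maps with higher gate scores contribute more, which is the behaviour the contribution weights are meant to provide.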