| Synthetic aperture radar (SAR) is an active observation system that plays an important role in remote sensing. The imaging mechanism of a SAR image is fundamentally different from that of an optical remote sensing image, which leads to significant differences in texture, geometric, and radiometric features. As a result, directly applying algorithms designed for optical images to SAR images has not been as effective as expected. Given these characteristics, the core of this dissertation is to explore algorithms suited to SAR images. Ships in SAR images appear in arbitrary orientations and at multiple scales, and their surroundings are complex, which makes ship detection a very challenging task. Each pixel in a SAR image corresponds to a small area of the ground, so terrain classification must comprehensively account for varied terrain environments, complex noise, geometric distortion, and shadows. Therefore, this dissertation explores a series of models, starting from neural networks with different structures and analyzing the loss function according to the characteristics of each task. At the same time, considering the complexity of SAR data labeling and the scarcity of available SAR data, it uses unlabeled data for semi-supervised learning to improve model performance. The specific research contents and innovations are as follows:

1. In SAR ship detection, conventional detection networks often produce oversized detection boxes or boxes that fail to contain the complete ship. Essentially, these problems arise because the detection network cannot extract high-level semantic information. In SAR images, ships appear as bright spots against a dark background. Based on this image characteristic, a ship detection
network based on object-level and pixel-level information is proposed. The network maps ship features and background features to an interactive space through a graph convolution module to capture high-level semantic relationships. The detection network also has a dual-branch structure that extracts object-level and pixel-level information about the ship separately. A convolutional interaction module at the end of the network uses the pixel-level information to enhance the object-level information, thereby improving the overall accuracy of the detection boxes.

2. Most ship detection models are trained with supervised learning and require a large amount of labeled data. However, annotating SAR data is time-consuming and labor-intensive, so labeled data is scarce, while unlabeled data is easy to obtain. It is therefore necessary to improve SAR ship detection accuracy by making sensible use of unlabeled data within a semi-supervised learning framework. To this end, a semi-supervised detection network based on consistency learning and adversarial learning is proposed. The network is trained through two pretext tasks: noise-robust consistency learning and output encoding consistency learning. Noise-robust consistency learning adds appropriate noise to the input image multiple times and unifies the resulting predictions to achieve low-entropy prediction. Output encoding consistency learning maps the output result to an image and uses an encoder to obtain a representative feature. This representative feature, together with one layer of the main network, forms an autoencoder structure that is used to train on unlabeled data.

3. The next study addresses SAR terrain classification. Given the significant differences between SAR images and optical images, such as complex terrain, fuzzy boundaries, and shadows, pixel-based features alone are insufficient for accurate terrain classification. Therefore, a
multi-scale autoencoder regularization network based on attention mechanisms is designed. The network uses multi-scale inputs to obtain more comprehensive background information. It also applies an asymmetric autoencoder structure as a constraint on the classification result, balancing the classification and reconstruction processes. In addition, the network uses attention mechanisms to generate weight maps that assign different attention weights to the context, which helps it extract more accurate contextual relationships and improves its terrain classification performance.

4. With advances in technology, large-scale SAR images with pixel counts reaching the billions can now be acquired. Terrain classification in SAR images was previously performed pixel by pixel, which is time-consuming and wastes substantial computational resources. Researchers are therefore gradually turning to image segmentation to perform terrain classification. Accordingly, a noise-regularized network based on cross-regional context information is proposed. The network is more complex and contains three output heads, which produce segmentation results, a representation feature for each pixel, and reconstructed images. The network uses self-attention mechanisms to explore regional relationships between different images, and the self-attention module transmits semantic information across regions at different levels. Contrastive learning mines pixel-level semantic relationships, so that pixels of the same class from different regions are aggregated in the feature space. In addition, to improve the robustness of the network, noise regularization is applied to the encoder and the segmentation results as an additional constraint, which increases the accuracy of the terrain classification results.

5. Annotating large-scale SAR images is laborious and difficult. It is unrealistic to use a large number of high-quality
semantic labels to train SAR terrain classification models. It is more practical to train the model in a semi-supervised framework that combines a small amount of labeled data with a large amount of unlabeled data. Building on the classic Mean Teacher approach, an unreliable-pixel contrastive learning framework based on the von Mises-Fisher distribution is proposed to refine segmentation results. The improved model outputs both a classification result and a feature embedding for each pixel. By using a non-uniform sampling strategy for difficult samples and a memory bank, the framework achieves better optimization results at a lower computational cost per batch. Based on the model's output for unlabeled data, unreliable pixels are constrained by the von Mises-Fisher distribution in the feature space for contrastive learning. Overall, the framework improves the performance of semi-supervised segmentation models when only a small amount of labeled data is available, yielding smoother segmentation boundaries, faster convergence, and higher overall accuracy. |
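
The two standard ingredients named in contribution 5 can be illustrated with a minimal sketch, under stated assumptions. It shows the Mean Teacher exponential-moving-average (EMA) weight update and class assignment of unit-norm pixel embeddings under von Mises-Fisher (vMF) components, whose density is proportional to exp(kappa * <mu, z>). The function names, the decay value, the shared concentration `kappa`, equal class priors, and the confidence threshold used to flag "unreliable" pixels are all illustrative assumptions; the dissertation's actual sampling strategy, memory bank, and loss are not reproduced here.

```python
import math


def ema_update(teacher, student, decay=0.99):
    """Mean Teacher update: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def l2_normalize(v):
    """Project a non-zero vector onto the unit sphere, where vMF is defined."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def vmf_posterior(z, means, kappa=10.0):
    """Class posterior of embedding z under vMF components.

    Assumes one shared kappa and equal class priors, so the vMF normalising
    constant is the same for every class and cancels, leaving
    softmax(kappa * <mu_c, z>).
    """
    z = l2_normalize(z)
    logits = [
        kappa * sum(m * x for m, x in zip(l2_normalize(mu), z)) for mu in means
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_unreliable(posterior, threshold=0.9):
    """Hypothetical rule: flag a pixel whose top class probability is low."""
    return max(posterior) < threshold
```

With class mean directions `[[1.0, 0.0], [0.0, 1.0]]`, an embedding such as `[1.0, 0.05]` lies close to the first mean and is treated as reliable, while `[1.0, 1.0]` is equidistant from both means, receives a 0.5/0.5 posterior, and is flagged as unreliable under this threshold.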