| Multisource heterogeneous remote sensing data provide multi-dimensional, multi-level, and multi-angle spatial and attribute information about the observed scene. Collaborative classification of multisource remote sensing data processes and analyzes multiple types of remote sensing data to classify ground objects more accurately. Because different types of remote sensing data differ significantly in structure and content, heterogeneous feature extraction and information fusion pose great challenges that seriously restrict collaborative classification performance. By exploring neural network models and machine learning algorithms, this dissertation focuses on heterogeneous feature learning models and content-aware feature fusion methods to solve the feature representation and fusion problems. Self-supervised learning methods specific to multisource remote sensing images are also investigated to address the problem of scarce labeled samples in the land cover classification task, thus improving the performance of multisource remote sensing image collaborative classification. The main research contents and innovations of this dissertation can be summarized as follows. 1. To make full use of the multiscale features contained in multisource remote sensing data, a hierarchical residual network model with multiscale perception is designed. The hierarchical residual network can effectively extract multiscale characteristics. Considering the differences between spatial and spectral features, multiscale spectral and spatial feature extraction models are developed to extract multiscale spectral-spatial features from multisource remote sensing data. The proposed approach can thus take full advantage of shallow detail features as well as deep semantic information for more accurate land cover classification. 2. By exploring the advantages and potential of the attention mechanism in feature enhancement and information fusion, a collaborative classification method 
enhanced by the attention mechanism is proposed. The hierarchical residual network is employed to extract spatial and spectral characteristics at different receptive fields from multisource remote sensing data, and channel attention and spatial attention models are utilized to enhance the extracted multiscale features. The attention-based feature fusion approach performs content-aware fusion of the enhanced heterogeneous characteristics. The attention models are optimized within a unified network framework to obtain optimal attention coefficients, so that the heterogeneous features contained in multisource remote sensing data can be effectively enhanced and fused for classification. Compared with the hierarchical residual network, this attention-enhanced collaborative classification approach achieves better classification results on benchmark datasets. 3. In view of the limitations of a single type of neural network in heterogeneous feature extraction, heterogeneous neural networks composed of convolutional neural network (CNN) and transformer structures are investigated in this dissertation. In the proposed deep hierarchical transformer model, a spectral transformer structure is exploited to extract sequential spectral features from the hyperspectral image, and hierarchical spatial characteristics are extracted from multisource remote sensing data through convolution operations as well as transformer structures. Taking advantage of the unique feature representation capability of the transformer structure, the feature fusion model based on the cross-attention mechanism can adaptively and dynamically fuse heterogeneous characteristics. A heterogeneous feature learning network architecture is further developed, which extracts local spatial features and sequential spectral characteristics using CNN and transformer structures, respectively. The heterogeneous feature coupling module is utilized to transform and exchange information between feature maps 
and embedding features, and hierarchical feature fusion and collaborative classification are realized within the multi-stage network architecture. Compared with other deep neural networks, this heterogeneous neural network model has stronger feature representation ability and achieves higher classification accuracy, especially on more complex datasets. 4. A generative self-supervised pre-training and classification paradigm specific to multisource heterogeneous remote sensing data is investigated, which includes self-supervised feature learning in the designed pretext task as well as the downstream fine-tuning classification task. The self-supervised pre-training model has an asymmetric encoder-decoder structure, in which the deep encoder extracts high-level key features from multisource heterogeneous remote sensing data and task-specific decoders are employed to reconstruct the original remote sensing data. To further improve feature learning performance, cross-attention layers are utilized to exchange information between heterogeneous features, thus learning more complementary information from multisource remote sensing data. In the fine-tuning classification phase, the trained encoder and cross-attention layers are utilized as an unsupervised feature extractor, and a single-layer transformer classifier is designed for land cover classification. The learned characteristics are combined with the corresponding spectral information for classification, thus effectively utilizing unlabeled samples to improve feature learning and classification performance. Compared with typical supervised deep neural networks and feature learning methods, the proposed method using a single-layer transformer classifier achieves better classification performance. 5. To address the classification problem of deep neural networks under a limited number of labeled samples, a multitask contrastive learning model for multisource heterogeneous remote sensing data 
classification is proposed. Because multisource remote sensing data contain rich and complementary information about the observed scenes, multiple complementary views are constructed from the data, and each view is employed for contrastive learning to obtain the corresponding feature. To obtain more robust features, a weight-sharing multitask learning strategy is utilized to train the feature extraction network across different views, and the characteristics learned from multiple views are fused with the corresponding spectral information as the final feature representation of the samples. Since contrastive learning uses inherent attributes of the data as its self-supervised learning objective, a large number of unlabeled samples can be employed for feature learning during training, which effectively improves the discriminability of samples in the learned feature space. In the classification procedure, a small number of labeled samples are selected as training samples, and a support vector machine is employed as the classifier for land cover classification. The proposed approach can learn meaningful feature representations from a large number of unlabeled samples, and its small-sample classification results on benchmark remote sensing datasets are superior to those of typical supervised, semi-supervised, and feature learning approaches. |
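
The contrastive objective mentioned in the fifth contribution can be illustrated with a minimal sketch. The abstract does not name the loss function or the view construction, so the InfoNCE-style loss below, the function name `info_nce_loss`, the `temperature` value, and the synthetic feature arrays are all assumptions made for illustration, not the dissertation's actual method. The sketch only shows the general idea: features of the same sample seen through two views are pulled together, while features of different samples are pushed apart.

```python
import numpy as np


def info_nce_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss for two views (assumed objective, not from the source).

    Row i of view_a and row i of view_b are treated as a positive pair;
    every other row of view_b acts as a negative for row i of view_a.
    """
    # L2-normalise rows so that dot products are cosine similarities.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # shape (N, N)
    # Row-wise log-softmax, subtracting the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Synthetic stand-ins for features of the same 8 samples seen through two views.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))
other_view = features + 0.01 * rng.normal(size=(8, 16))

loss_aligned = info_nce_loss(features, other_view)
# Shifting the rows by one pairs every sample with a different sample.
loss_misaligned = info_nce_loss(features, np.roll(other_view, 1, axis=0))
print(loss_aligned, loss_misaligned)
```

With correctly paired views the loss is small, and it grows when the pairing is broken, which is the signal that lets unlabeled samples drive feature learning. In a pipeline like the one described, the learned features would then be passed to a separate classifier such as a support vector machine trained on the few labeled samples.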