| Remote sensing scene classification has important applications in the military, urban planning, and other fields, and most existing classification methods are based on deep learning. However, traditional deep learning methods rely heavily on labeled data: when labeled data is insufficient, the model cannot generalize well to test samples. With the development of satellite remote sensing technology, acquiring large amounts of data is no longer the key factor limiting model performance; the key problem is how to label that data. Annotating remote sensing scene images must be done by experts with professional knowledge and therefore incurs high labor costs. In addition, remote sensing scene images have two characteristics. First, some categories of remote sensing scene images exhibit low intra-class similarity and high inter-class similarity. Second, the target regions with discriminative characteristics differ greatly in scale, which makes it difficult to learn sufficiently good representations. Few-shot learning can effectively address image classification when only a small number of labeled samples are available: it uses prior knowledge to transfer the learned classification ability to new categories that have only a few labeled samples. Some researchers have recognized the advantages of few-shot learning and have studied few-shot classification tasks. However, they tend to focus on low intra-class similarity and high inter-class similarity while ignoring the importance of multi-scale feature fusion for improving the feature extraction model. Therefore, to solve the above problems, this paper combines contrastive learning with the first stage of the few-shot classification task to form a multi-task model that jointly learns a high-performance feature extraction model. Our main contributions are as follows: Inspired by self-supervised contrastive 
learning, we propose a few-shot remote sensing scene classification method. Self-supervised contrastive learning can alleviate the problem of intra-class diversity and inter-class similarity in some remote sensing scenes. It also improves the utilization of existing data without requiring any annotation, which helps the few-shot learning method learn image representations. To train this multi-task model, we propose an objective function that combines a contrastive loss with a cross-entropy loss, and we select a fixed weight factor through experiments to balance the losses of the contrastive learning task and the few-shot classification task. Furthermore, we propose a novel attention mechanism in which the spatial attention module applies convolution kernels of different sizes in parallel to fuse multi-scale image features. Compared with other classical few-shot learning methods, our method achieves the best performance in most cases, and the introduction of self-supervised contrastive learning and the attention mechanism significantly improves model performance. Compared with other common attention mechanisms, our attention mechanism is better suited to remote sensing scene classification. Since few-shot classification is a form of supervised learning and the labels of the training data are known, we make use of this label information by proposing a scene classification method, based on a siamese network, that uses class-specific contrastive learning. This model is trained by increasing the similarity between samples of the same category and reducing the similarity between samples of different categories. In addition, we balance the contrastive loss and the cross-entropy classification loss with a learnable weighting factor. To fuse multi-scale image features, a feature pyramid is combined with the original backbone network. The experimental results show that the feature pyramid network is not as effective 
as our proposed attention mechanism. Compared with a large number of advanced few-shot classification methods, our model achieves the best performance in most cases, and class-specific contrastive learning outperforms traditional self-supervised contrastive learning. |
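
The joint objective described in the abstract, a cross-entropy classification loss plus a weighted contrastive loss, can be illustrated with a minimal sketch. The abstract does not give the exact form of either loss, so everything below is an assumption made for illustration: the contrastive term is written in a common supervised-contrastive style (samples with the same label are treated as positives), and the function names, the cosine similarity, the `temperature` value, and the fixed `weight` of 0.5 are all hypothetical. Embeddings are assumed to be non-zero vectors.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_contrastive_loss(embeddings, labels, temperature=0.1):
    """Assumed class-specific contrastive loss: same-label samples are positives.

    For each anchor, the loss is low when its positives are more similar to it
    than the remaining samples in the batch.
    """
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {j: cosine(embeddings[i], embeddings[j]) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def joint_loss(logits_batch, embeddings, labels, weight=0.5):
    """Mean cross-entropy plus a weighted contrastive term (weight is hypothetical)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    return ce + weight * class_contrastive_loss(embeddings, labels)


# Toy batch: two classes, two samples each (made-up numbers).
toy_logits = [[2.0, 0.0], [1.5, 0.2], [0.1, 1.0], [0.0, 3.0]]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]
toy_total = joint_loss(toy_logits, toy_embeddings, toy_labels)
```

Here `weight` is a fixed constant, which corresponds to the experimentally selected weight factor of the first method. The learnable weighting factor of the second method would instead be a trainable parameter optimized together with the network, which this sketch does not attempt to model.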