At present, the application areas of remote sensing technology are steadily expanding, and many new technical objectives have emerged in the remote sensing field, such as image segmentation, target detection, target localization, and target classification. The development of all these technologies depends on the support of remote sensing image datasets. It is therefore necessary to study a method that automatically locates target regions and generates image annotations at scale, so that annotated datasets can be produced. Remote sensing image target localization is the process of using remote sensing technology to process and analyze acquired image data in order to accurately determine the position of a target in geospatial space. Target localization can provide a scientific basis and decision support for resource management, environmental protection, disaster early warning, and other fields. With the growing popularity of deep learning, the demand for region-localization datasets to train analysis models for remote sensing image tasks has also increased; however, there is no processing method that generates dataset annotations quickly and effectively, and manual annotation is still required in most cases. To address this problem, this paper investigates common methods for target localization in remote sensing images and, combining target localization with deep learning techniques (such as classification networks and saliency detection networks), proposes a two-stage approach that automatically generates target localization datasets for large-scale remote sensing imagery from only a partial target classification dataset.

The main work of this paper is as follows:

(1) A two-stage processing method is proposed that completes the training of a remote sensing image target localization model and the mass production of target datasets using only a partially classified and annotated dataset. In the first stage, the method obtains an initial target localization annotation by training a classification model and extracting its feature information; in the second stage, this annotation is further refined by an improved Transformer-based model to obtain the final annotation results.

(2) A method based on a classification network is proposed to locate target positions in remote sensing images, and annotated datasets can be generated automatically according to this method. The method is carried out within the deep learning classification framework while making adjustments for the characteristics of remote sensing images themselves, thus yielding a new framework. Because remote sensing images are complex, the method replaces the dual classifier with a triple classifier to improve accuracy. Trained and validated on an airport dataset, the method locates targets more accurately, as shown by a comparison of visual results.

(3) A method based on a Transformer saliency model is studied to locate target positions in remote sensing images, according to which the automatic generation of annotated datasets can finally be realized. To address the insufficient localization accuracy of the first method, a second stage is used to refine its results. Since targets have the property of saliency, a Transformer saliency model is introduced to localize them further, and an attention mechanism based on shifted windows is introduced to capture more global information. Experiments show that this method outperforms the first, both in visualization and in quantitative comparison metrics.

(4) The proposed method is implemented as a tool that ultimately allows annotated datasets to be generated automatically. This makes the actual execution (training/testing/evaluation) simpler and facilitates use by others.
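The abstract does not specify how the first stage extracts localization cues from the classification model; a common realization of this idea is class activation mapping, where the feature maps of the trained classifier are combined with the class weights to form a heatmap, which is then thresholded into a coarse box. The sketch below illustrates that technique only; the function names and the 0.4 threshold are illustrative assumptions, not the paper's method.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Weighted sum of conv feature maps -> coarse localization heatmap.
    features: (C, H, W) feature maps; class_weights: (C,) weights of the
    classifier for the predicted class."""
    cam = np.tensordot(class_weights, features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)        # keep only positive class evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize to [0, 1]
    return cam

def cam_to_bbox(cam, threshold=0.4):
    """Threshold the heatmap and return the bounding box (x0, y0, x1, y1)
    of the activated region, or None if nothing passes the threshold."""
    ys, xs = np.where(cam >= threshold)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: one feature channel fires on a small patch.
feats = np.zeros((3, 8, 8))
feats[0, 2:4, 3:5] = 1.0
weights = np.array([1.0, 0.1, 0.1])
cam = class_activation_map(feats, weights)
print(cam_to_bbox(cam))  # (3, 2, 4, 3)
```

Such a heatmap-derived box is exactly the kind of "initial annotation" that the second stage would then refine.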
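The "attention mechanism based on shifted windows" mentioned in contribution (3) computes attention inside fixed-size local windows and, on alternating layers, cyclically shifts the feature map by half a window so that tokens near window boundaries can interact. A minimal numpy sketch of the window partitioning and the shift (the attention computation itself is omitted, and the helper names are assumptions):

```python
import numpy as np

def window_partition(x, win):
    """Split a (H, W, C) feature map into non-overlapping (win, win, C) windows."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, C)

def shifted_windows(x, win):
    """Cyclically shift the map by win//2 before partitioning, so that
    attention computed inside each window mixes tokens that lay across
    the original window boundaries."""
    shifted = np.roll(x, shift=(-(win // 2), -(win // 2)), axis=(0, 1))
    return window_partition(shifted, win)

x = np.arange(8 * 8).reshape(8, 8, 1).astype(float)
plain = window_partition(x, 4)   # 4 windows of 4x4 tokens
shift = shifted_windows(x, 4)    # same shapes, different token grouping
print(plain.shape, shift.shape)
```

Alternating plain and shifted partitions is what lets window-local attention accumulate the "more global information" the abstract refers to, without the quadratic cost of full-image attention.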
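How the second stage consumes the first stage's output is not detailed in the abstract; one plausible reading is that the saliency model's output mask is used to tighten the coarse stage-1 box. The following is a hypothetical sketch of that hand-off under those assumptions (box format, threshold, and function name are all illustrative):

```python
import numpy as np

def refine_box(box, saliency, threshold=0.5):
    """Stage-2 sketch: shrink a coarse stage-1 box (x0, y0, x1, y1) to the
    pixels inside it that the saliency model marks as salient."""
    x0, y0, x1, y1 = box
    region = saliency[y0:y1 + 1, x0:x1 + 1]
    ys, xs = np.where(region >= threshold)
    if len(xs) == 0:
        return box                  # nothing salient: keep the coarse box
    return (x0 + int(xs.min()), y0 + int(ys.min()),
            x0 + int(xs.max()), y0 + int(ys.max()))

# Toy saliency map with a single salient pixel inside the coarse box.
sal = np.zeros((8, 8))
sal[3, 4] = 1.0
print(refine_box((2, 2, 6, 6), sal))  # (4, 3, 4, 3)
```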