Font Size: a A A

Remote Sensing Image Scene Semantic Understanding Based On Spatial Relationship Mining

Posted on:2022-12-05Degree:MasterType:Thesis
Country:ChinaCandidate:R X ChenFull Text:PDF
GTID:2480306767963439Subject:Computer Software and Application of Computer
Abstract/Summary:PDF Full Text Request
In the era of high-resolution remote sensing image,the development of accurate and intelligent remote sensing image scene semantic understanding technology is beneficial to efficiently extract and mine information of remote sensing image and provide high-quality knowledge services for various application fields.In recent years,with the support of deep learning technology,many remote sensing image scene semantic understanding tasks have been developed,such as the basic remote sensing image scene classification task and higher semantic level remote sensing image scene captioning task.The semantic understanding of remote sensing images scene requires a comprehensive analysis based on the visual and spatial information of the elements in the image.However,current deep learning-based methods only fuse spatial context information at the pixel level and neglect the modeling and mining of spatial relationship information of elements in image.In the field of computer vision,it has been a research foundation to represent image content with graph-structured data to mine the spatial relationship of elements in image.Scene graph is an important achievement in this kind of research,which aims to represent the objects and relationships of image through the nodes and edges of graph,and is a structured representation with more semantic and expressive ability.With the increasing demand for remote sensing image scene semantic understanding,remote sensing image scene graph generation,as a frontier remote sensing image scene semantic understanding task has gradually attracted the attention of scholars.However,due to the scarcity of relevant datasets and immature methods in remote sensing field,the task of remote sensing image scene graph generation is still in its infancy.In order to comprehensively study remote sensing image scene semantic understanding,this article specifically studies the tasks of remote sensing image scene multi-label classification,remote sensing image scene captioning,and remote sensing image scene graph generation.Considering the shortcomings of existing technologies in modeling and expressing the spatial relationships of elements in remote sensing image scenes,this article proposes solutions to the tasks of semantic understanding of remote sensing image scene at all levels by using the latest deep learning technologies,especially the graph-based spatial relationship mining methods.The main research contents and contributions of this article are summarized as follows:(1)For the task of remote sensing image scene multi-label classification,a multilabel image scene classification method combining convolution neural network and graph neural network is proposed.This method uses the graph-structured data to model the visual elements and relationships of the image scene,combining the perception ability of the convolutional neural network to the visual elements and the mining ability of the graph neural network to the spatial relationship information of the elements to comprehensively complete remote sensing image scene multi-label classification.Experimental results on public datasets show that the proposed method has higher performance.Compared with the AL-RN-CNN method,the F1 Score and F2 Score on UCM multi-label dataset are improved by 0.7% and 1.5%,respectively.The improvements on AID multi-label dataset are 0.5% and 0.9%,respectively.(2)For the task of remote sensing image scene captioning,an image captioning method combining graph neural network and long short-term memory network is proposed.Under the encoder-decoder structure,the fine-grained image coding features are formed based on graph-structured representation and spatial relationship learning by graph neural network,and the natural language description of remote sensing image scene is generated through the decoding of long short-term memory network.Experimental results on public datasets show that the encoder design based on spatial relationship information mining is effective.Compared with the Attention method,the mean metrics on UCM-captions,Sydney-Captions,and RSICD datasets are improved by 0.015,0.013,and 0.092,respectively.(3)For the task of remote sensing image scene graph generation,a knowledge graph-guided large-size remote sensing image scene graph generation method is proposed.Under the framework of object detection and relationship prediction,largesize remote sensing image scene graph is generated by prior knowledge in knowledge graph and multi-feature fusion learning.In addition,due to the lack of relevant datasets in remote sensing field,a large-size remote sensing image scene graph dataset is constructed to provide data support for this study.Experimental results on the newly constructed dataset show that this method can improve the effect of scene graph generation of large-size remote sensing images.Compared with FREQ method,the Recall@1500 of Predicate Classification,Scene Graph Classification and Scene Graph Generation tasks are improved by 5.2%,6.1% and 2.9%,respectively.The mean Recall@1500 are improved by 12%,11.1% and 4.4%,respectively.In this article,remote sensing image scenes semantic understanding is systematically studied based on spatial relationship mining to improve the level of semantic understanding of remote sensing image scenes.Innovative solutions are proposed for remote sensing image scenes multi-label classification and remote sensing image scene captioning,which have a certain research foundation,to further improve the performance.Moreover,this article also explores the frontier task of remote sensing image scene graph generation and proposes a solution for large-size remote sensing image scene graph generation.
Keywords/Search Tags:remote sensing image scene semantic understanding, spatial relationship mining, multi-label classification, image captioning, scene graph generation
PDF Full Text Request
Related items