
Semantic Understanding Of High Resolution Remote Sensing Images

Posted on: 2021-12-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Q Wang
Full Text: PDF
GTID: 1482306455963179
Subject: Signal and Information Processing
Abstract/Summary:
Semantic understanding of remote sensing images is an important research area. Its goal, as studied here, is to express the semantic information of an image in the form of human sentences: not only the objects, but also the attributes of those objects, can be revealed through sentences. The combination of computer vision and remote sensing image processing is a research direction with tremendous potential applications, such as remote sensing image retrieval, military intelligence generation, and scene understanding. Intelligent processing of remote sensing images aims to let computers generate semantic information about them. Different from traditional remote sensing tasks that extract word-level semantic information, such as image classification and object detection, the translation of a remote sensing image into descriptive sentences is called remote sensing image caption generation, one of the important parts of remote sensing image semantic understanding. Researchers have made earnest efforts to develop the semantic understanding of remote sensing images. Nevertheless, the sentences generated by previous captioning methods are relatively simple and usually syntactically fixed; sentence generation should be more flexible to adapt to various applications. The studies in this dissertation enrich the research on remote sensing image semantic understanding. The main contributions are as follows:

(1) Exploring a data set and models for remote sensing image caption generation. Considering the scale ambiguity, category ambiguity, and rotation ambiguity of remote sensing images, annotation rules are introduced to construct a novel remote sensing image captioning data set. Based on the proposed data set, an encoder-decoder framework is evaluated as a benchmark. The encoder encodes the remote sensing image into a feature vector, which the decoder decodes into a descriptive sentence. Two types of encoders are adopted: handcrafted features and deep learning features. The decoder is a recurrent neural network. Following the human intuition that the relevant part of the image differs as each word of a sentence is generated, an attention mechanism is introduced into the decoder to generate sentences of higher quality. Experiments are conducted to evaluate the reasonableness of the proposed data set.

(2) Multisentence caption generation for remote sensing images. To generate more comprehensive descriptions of the contents of remote sensing images, the multisentence caption generation task is proposed. To obtain comprehensive information from the descriptive sentences, a collective sentence representation is proposed. The distance between the collective sentence representation and the image representation is measured after projection by an embedding matrix, in a metric learning manner. During testing, the distances between the test image representation and all collective sentence representations are computed, and the collective sentence representation with the lowest distance is parsed into five descriptive sentences. Experiments show that the collective sentence representation captures more information than a single-sentence representation.

(3) A sound active attention framework for remote sensing image caption generation. Human-machine interaction is an ultimate goal of remote sensing image semantic understanding. To make communication between computer and human more convenient, sound is introduced to represent the observer's information. A sound active attention framework is proposed to generate a descriptive sentence from the input remote sensing image and the sound together. Three modules based on gated recurrent units are constructed to encode the sound feature, combine the sound feature with image features, and generate descriptive sentences, respectively. Experiments demonstrate that the sound provides prior information, yielding sentences in line with the observer's expectations.

(4) A retrieval topic recurrent memory network for remote sensing image caption generation. The five sentences per image in remote sensing image captioning data sets are annotated by different persons. To extract consistent information from the five sentences, topic words are proposed as a bridge when translating remote sensing images into descriptive sentences. To alleviate the gradient problems of recurrent neural networks, a memory network based on one-dimensional convolution is proposed as a novel decoder. Furthermore, the generated sentences can be changed by editing the topic words. Experiments illustrate the flexibility of sentence generation based on topic words.

(5) A mutual attention inception network for remote sensing image question answering. To generate needed information about a remote sensing image directly, the remote sensing image question answering task is proposed. A remote sensing image question answering data set is automatically generated from remote sensing image classification data sets and object detection data sets, and the task is formulated as multiclass classification. Following the intuition that a question is usually related to certain regions of a remote sensing image, an attention mechanism is introduced to generate compact image features. Similarly, attention is also applied to the question to focus on semantically meaningful words. Experiments show that the proposed network generates correct answers in most conditions.
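The attention mechanism described in contribution (1), where each decoding step weights the image regions differently, can be sketched in NumPy as an additive-attention step. All dimensions, weight matrices (W_r, W_h, v), and the random features below are illustrative assumptions, not the dissertation's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 49 spatial regions from a CNN encoder, 512-d region
# features, and a 256-d decoder hidden state at the current time step.
num_regions, feat_dim, hid_dim, att_dim = 49, 512, 256, 128
regions = rng.standard_normal((num_regions, feat_dim))   # encoder output
hidden = rng.standard_normal(hid_dim)                    # decoder state at step t

# Additive (Bahdanau-style) attention: score each region against the state.
W_r = rng.standard_normal((feat_dim, att_dim)) * 0.01    # region projection
W_h = rng.standard_normal((hid_dim, att_dim)) * 0.01     # state projection
v = rng.standard_normal(att_dim)                         # scoring vector

scores = np.tanh(regions @ W_r + hidden @ W_h) @ v       # (49,) one score per region
weights = np.exp(scores - scores.max())
weights /= weights.sum()                                 # softmax over regions
context = weights @ regions                              # (512,) attended image feature
```

At each word-generation step the decoder would recompute `weights` from its current hidden state, so different words attend to different image regions.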
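The retrieval step of contribution (2), matching a test image against all collective sentence representations in a shared embedding space, amounts to a nearest-neighbor search after linear embeddings. The sketch below uses random stand-ins for the learned embedding matrices and features; all names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: 100 training images, each paired with one collective
# representation built from its five sentences; a shared 64-d embedding space.
n_train, img_dim, sent_dim, embed_dim = 100, 512, 300, 64
collective = rng.standard_normal((n_train, sent_dim))     # collective sentence vectors
M_img = rng.standard_normal((img_dim, embed_dim)) * 0.05  # learned image embedding (assumed)
M_sent = rng.standard_normal((sent_dim, embed_dim)) * 0.05  # learned sentence embedding (assumed)

def retrieve(test_image_feat):
    """Index of the collective sentence representation nearest to the
    test image in the shared embedding space (lowest Euclidean distance)."""
    q = test_image_feat @ M_img                # embed the test image
    c = collective @ M_sent                    # embed every sentence group
    dists = np.linalg.norm(c - q, axis=1)      # distance to each group
    return int(np.argmin(dists))

idx = retrieve(rng.standard_normal(img_dim))   # winning group is then parsed
                                               # into five descriptive sentences
```

In the dissertation's setting the matrices would be trained with a metric-learning objective so that matching image/sentence pairs end up close; here they are random, so only the retrieval mechanics are shown.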
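Contribution (5) attends over image regions guided by the question and over question words guided by the image, then classifies over a fixed answer set. A minimal sketch of that mutual-attention-plus-multiclass pattern follows; the pooling, projections, and answer count are simplifying assumptions, not the network's actual design.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative inputs: 49 image regions (512-d) and a 10-word question (300-d
# word embeddings); 20 candidate answers for the multiclass formulation.
regions = rng.standard_normal((49, 512))
words = rng.standard_normal((10, 300))
n_answers = 20

Wq = rng.standard_normal((300, 64)) * 0.05  # question projection (assumed)
Wv = rng.standard_normal((512, 64)) * 0.05  # image projection (assumed)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Question-guided image attention: pool the question, score each region.
q_ctx = words.mean(axis=0) @ Wq             # (64,) question summary
img_w = softmax((regions @ Wv) @ q_ctx)     # weight per region
img_feat = img_w @ regions                  # (512,) attended image feature

# Image-guided question attention: pool the image, score each word.
v_ctx = regions.mean(axis=0) @ Wv           # (64,) image summary
word_w = softmax((words @ Wq) @ v_ctx)      # weight per word
q_feat = word_w @ words                     # (300,) attended question feature

# Fuse both attended features and classify over the fixed answer set.
W_out = rng.standard_normal((512 + 300, n_answers)) * 0.05
logits = np.concatenate([img_feat, q_feat]) @ W_out
answer = int(np.argmax(logits))
```

Mean-pooling each modality to guide the other is one simple way to make the attention "mutual"; trained projections would replace the random matrices.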
Keywords/Search Tags: High Resolution Remote Sensing Image Semantic Understanding, Remote Sensing Image Caption Generation, Remote Sensing Image Question Answering