Remote sensing image captioning aims to enable a computer to distinguish and comprehend the content of an image and automatically generate a corresponding description sentence, combining the two fields of computer vision and natural language processing. It plays a key role in many application scenarios of remote sensing technology, such as military intelligence generation, information retrieval, resource investigation, and disaster detection. Unlike image understanding tasks such as recognition and object detection, image captioning must not only identify the objects and attributes in an image, but also establish the relationships between them and generate natural language descriptions that conform to human norms. Benefiting from the vigorous development of artificial intelligence, the feature extraction ability of deep neural networks has greatly improved the quality of generated descriptions. However, remote sensing images suffer from large-scene imaging, complex and diverse backgrounds, multi-scale and rotation characteristics, and semantic ambiguity, which further increase the difficulty of image captioning.

In this thesis, a remote sensing image captioning model based on multi-level attention and visual adaptation, MLVA-Net, is proposed within the encoder-decoder framework to address the difficulty of semantic understanding and the multi-scale nature of remote sensing images. The main work includes:

(1) To address the multi-scale and category ambiguity of remote sensing images, the thesis employs a multi-level attention module in the encoder to refine the visual features extracted by the CNN and obtain more abstract deep image features. Spatial and channel attention mechanisms learn features at specific locations and at different scales of the image, improving the performance of the model.

(2) The loss of visual information in the convolutional layers during forward propagation makes it difficult for the network to learn the complete semantic information of the image. The thesis therefore designs a contextual attention module in the encoder that fuses multi-level features, integrating the low-level and high-level features of the CNN. This achieves semantic complementation between local and global features and increases the diversity of the generated descriptions.

(3) To address the semantic ambiguity between the visual features of remote sensing images and textual attribute information, the thesis proposes a visually adaptive LSTM decoder. It employs a visual sentinel mechanism to adaptively select between visual information and contextual information when generating each word, producing more discriminative and more accurate description sentences.

Finally, the thesis verifies the effectiveness of the proposed MLVA-Net both quantitatively and qualitatively through ablation experiments, comparative experiments, and visualization. Five commonly used captioning metrics are used to evaluate MLVA-Net on four datasets: UCM-Captions, Sydney-Captions, RSICD, and NWPU-Captions. The experimental results demonstrate that MLVA-Net has strong robustness and generalization, and that it can generate more discriminative descriptions for remote sensing images with complex backgrounds. In addition, the multi-level attention increases the attention paid to smaller regions, and the visual sentinel achieves semantic alignment between image and text, yielding more accurate and richer descriptions of remote sensing images.
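The spatial and channel attention used to refine CNN features in the encoder can be sketched as follows. This is a minimal NumPy illustration of the idea only: real modules of this kind (e.g. CBAM-style blocks) use learned projection layers and convolutions, which are replaced here by simple pooling and a fixed sigmoid gate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Reweight each channel by a gate derived from its global average.

    feat: (C, H, W) feature map from a CNN layer.
    """
    gate = sigmoid(feat.mean(axis=(1, 2)))     # (C,) one weight per channel
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """Reweight each spatial location by a gate from the channel-pooled map."""
    gate = sigmoid(feat.mean(axis=0))          # (H, W) one weight per location
    return feat * gate[None, :, :]

# Channel attention first, then spatial attention, as in CBAM-style designs.
feat = np.random.rand(256, 7, 7)               # illustrative shape
refined = spatial_attention(channel_attention(feat))
```

Because both gates lie in (0, 1), each step suppresses uninformative channels and locations while preserving the feature-map shape, so the refined features can be passed on to the decoder unchanged in layout.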
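The integration of low-level and high-level CNN features by the contextual attention module can be sketched in its simplest form as upsample-and-concatenate. This NumPy sketch shows only that fusion step under assumed shapes; the thesis's contextual attention additionally learns how to weight the two levels, which is omitted here.

```python
import numpy as np

def upsample_nn(feat, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_levels(low, high):
    """Concatenate a low-level map with an upsampled high-level map.

    low:  (C1, H, W)       early-layer features (fine spatial detail)
    high: (C2, H//f, W//f) deep-layer features (coarse semantics)
    """
    high_up = upsample_nn(high, low.shape[1] // high.shape[1])
    return np.concatenate([low, high_up], axis=0)  # (C1 + C2, H, W)

low = np.random.rand(64, 14, 14)    # illustrative shapes
high = np.random.rand(256, 7, 7)
fused = fuse_levels(low, high)      # channels stacked at the finer resolution
```

The fused map keeps the finer spatial resolution of the low-level features while carrying the semantics of the deep layer, which is what allows local and global information to complement each other.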
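The visual sentinel mechanism in the decoder can be sketched as an attention step that scores the sentinel alongside the visual regions, so the softmax itself decides how much to rely on visual versus contextual information. This is a schematic NumPy sketch in the spirit of adaptive attention: dot-product scoring stands in for the learned attention network, and all shapes are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_context(V, h, s):
    """Blend attended visual features with a visual sentinel.

    V: (k, d) visual region features from the encoder
    h: (d,)   current decoder hidden state
    s: (d,)   visual sentinel summarizing the language context
    Returns the context vector and the sentinel gate beta.
    """
    # Score every region and the sentinel against the hidden state.
    scores = np.append(V @ h, s @ h)   # (k + 1,)
    alpha = softmax(scores)
    beta = alpha[-1]                   # weight given to the sentinel, in (0, 1)
    # Extended-softmax mixture: attended visual features plus the gated sentinel.
    ctx = alpha[:-1] @ V + beta * s    # (d,)
    return ctx, beta

V = np.random.rand(49, 512)            # illustrative: 7x7 regions, 512-d features
h = np.random.rand(512)
s = np.random.rand(512)
ctx, beta = adaptive_context(V, h, s)
```

A beta near 1 means the next word is generated mostly from the language context (e.g. function words), while a beta near 0 means the decoder is grounding the word in the attended image regions; this is the adaptive selection that aligns image and text semantics.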