Research On Object Detection Algorithm Based On Natural Language Description

Posted on:2024-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:Q Huang

Full Text:PDF

GTID:2568307097957149

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:


Object detection based on natural language description is an important research direction in vision-language tasks. Its aim is to identify the target regions in an image that correspond to an input language description. The task involves both visual object detection and natural language understanding, and it plays a crucial role in understanding today's massive multimodal data and fully mining the useful information in it. Currently, mainstream algorithms for this task extract visual and textual features independently, which yields fixed visual features that cannot adapt to the language description. In reality, however, the same target can correspond to different descriptions. The key problem is how to use the language description to guide visual feature extraction so that the resulting visual features are consistent with the language features. This thesis therefore constructs a dynamic attention module based on language features to guide the extraction of visual features, ensuring consistency between visual features and language descriptions and enhancing the discriminability of target-region features. In addition, considering the importance of multi-scale features for detection, the dynamic attention module is also used to carry out text-guided interaction between multi-scale feature levels, selectively collecting the multimodal features that correspond to different scales in the image. Feature visualization results show that the proposed dynamic attention module extracts adaptive visual features, and accuracy improves on multiple standard datasets.

The detection performance of current algorithms is severely limited when the input is a long language description. A long sentence contains many words, so the effective information in it must be extracted accurately, and the complex relationships among the multiple words or objects it involves require accurate modeling of context information. This thesis therefore proposes a multimodal feature fusion method based on graph-convolution context modeling. The method constructs a graph structure to establish contextual relationships between and within modalities, so as to fully perceive the connections between targets and deeply understand the complex semantics of the image and the language description. The multimodal context information guides the multimodal feature fusion process, and finally multi-level dilated (atrous) convolution is used to enhance the multimodal features, perceiving semantic information over a wider range and yielding more discriminative multimodal features. The proposed algorithm achieves significant performance improvements on multiple standard datasets.
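The abstract does not give implementation details, so the following is only a minimal sketch of the two general ideas it names, not the thesis's actual architecture. It assumes that region features and a sentence embedding are plain vectors of equal dimension. It uses scaled dot-product scoring for the language-guided attention and a row-normalized adjacency matrix with self-loops for the graph-convolution step. The function names `language_guided_attention` and `graph_conv_step` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def language_guided_attention(regions, text):
    """Reweight region features by their similarity to a text feature.

    regions: list of N feature vectors (each a list of D floats).
    text:    one feature vector of D floats (assumed sentence embedding).
    Returns (weights, reweighted_regions).
    """
    d = len(text)
    # Scaled dot product between each region and the text vector.
    scores = [sum(r_k * t_k for r_k, t_k in zip(r, text)) / math.sqrt(d)
              for r in regions]
    weights = softmax(scores)
    # Regions that match the description keep more of their magnitude.
    reweighted = [[w * x for x in r] for w, r in zip(weights, regions)]
    return weights, reweighted


def graph_conv_step(adj, feats, weight):
    """One graph-convolution step: ReLU(A_hat @ feats @ weight).

    A_hat is the adjacency matrix with self-loops added, row-normalized.
    adj must be non-negative so that every row sum is at least 1.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    a_hat = [[v / sum(row) for v in row] for row in a_hat]
    # Aggregate neighbor features for every node.
    agg = [[sum(a_hat[i][k] * feats[k][c] for k in range(n))
            for c in range(len(feats[0]))] for i in range(n)]
    # Apply the linear map, then ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o]
                          for c in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    text = [2.0, 0.0]
    weights, _ = language_guided_attention(regions, text)
    print(weights)
```

In this toy setting, the regions whose features align with the text vector receive the larger attention weights, and the graph step averages each node with its neighbors before the linear map. A real system would operate on learned convolutional feature maps and learned graph weights rather than hand-written lists.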

Keywords/Search Tags:

Object detection, Natural language description, Dynamic attention module, Graph convolution, Context information modeling


Related items

1	Context-Aware Natural Language Semantic Representation Research
2	Research On Small Object Detection Algorithm Based On Second-order Dynamic Convolution Network
3	Research On Small Object Detection Based On ResNeSt Framework
4	Research On Natural Language Description Generation For Short Video In Self Media
5	Task-Based Dialogue Natural Language Modeling Based On Knowledge Graph
6	Research On Object Detection Method Based On Context Information Fusion And Attention Awareness
7	Research On Person Re-identification Algorithm Based On Natural Language Description
8	Human-Object Interaction Detection Based On Graph Convolution And Semantic Relationship
9	Image Saliency Detection Based On Multi-graph Prior And Multilevel Features Connection Network
10	Research On Multi-Object Detection Algorithm By Convolution Neural Network Based On Context Information