
Research On RGB-D Salient Object Detection Guided By Cross-modal Interaction

Posted on: 2023-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
Full Text: PDF
GTID: 2558306845991319
Subject: Electronic information
Abstract/Summary:
Inspired by the human visual attention mechanism, salient object detection aims to detect the most attractive and interesting object or region in a given scene. In recent years, the development and popularization of depth cameras have opened up new approaches to salient object detection. Introducing depth maps not only lets the computer simulate the human visual system more comprehensively, but also offers new solutions for difficult scenes, such as those with low contrast or complex backgrounds, by exploiting the structure and location information that the depth map carries. This thesis takes RGB-D salient object detection as its research topic and focuses on the role of cross-modal interaction in this task. The main research results are as follows:

(1) This thesis summarizes the existing methods, for the first time, from the perspective of the key problems this task faces, i.e., the cross-modal fusion problem and the depth quality perception problem. For the cross-modal fusion problem, unlike existing surveys that classify methods by fusion position, this thesis groups the existing methods by network structure. For the most widely used two-stream structure in particular, it further divides the methods into an equally important bidirectional interaction mode and a depth-assisted interaction mode, in order to prompt researchers to rethink the roles of RGB and depth information in cross-modal interaction. For the depth quality perception problem, this thesis groups the existing solutions into two categories, depth quality evaluation and depth quality optimization, and discusses future directions for this problem.

(2) Considering the different roles that RGB images and depth maps play in cross-modal interaction, this thesis proposes a cross-modality discrepant interaction network for RGB-D salient object detection, which models the dependence between the two modalities differently according to the feature representations of different layers. Specifically, two components are designed to implement effective cross-modality interaction: the RGB-induced Detail Enhancement module uses the RGB modality to enhance the details of the depth features in the low-level encoder stage, and the Depth-induced Semantic Enhancement module transfers the object positioning and internal consistency of the depth features to the RGB branch in the high-level encoder stage. Furthermore, to make full use of the enhanced multi-level encoding features, this thesis proposes a Dense Decoding Reconstruction structure that achieves a more efficient decoding process. Extensive experiments on five benchmark datasets demonstrate that the network outperforms 13 state-of-the-art methods both quantitatively and qualitatively.

(3) Existing Transformer-based RGB-D salient object detection networks usually model the relationship between all RGB image patches and all depth map patches through a cross-modal attention mechanism. However, the RGB and depth patches at the same location are often clearly correlated. This thesis therefore first proposes a cross-modal point-aware interaction module, based on multi-head attention, that fuses the cross-modal features of the same location. A Transformer-based decoder then explores the long-range dependencies of the fused cross-modal features and learns saliency-related features based on global contrast. Finally, to alleviate the blocking artifacts and low resolution caused by the Transformer, this thesis designs a CNN-induced Refinement unit that follows the Transformer-based decoder. Extensive experiments on five benchmark datasets show that the proposed network achieves competitive results.
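
The point-aware interaction described in (3) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function name `point_aware_fusion`, the random projection matrices standing in for learned weights, and the final averaging of the two attended tokens are all assumptions made for illustration. The only idea taken from the abstract is that multi-head attention is applied between the RGB feature and the depth feature of the same location, rather than between all pairs of patches.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def point_aware_fusion(rgb, depth, num_heads=4, seed=0):
    """Fuse RGB and depth patch features location by location.

    rgb, depth: arrays of shape (N, C), one C-dim feature per patch location.
    Returns the fused features (N, C) and the attention weights (N, H, 2, 2).
    Hypothetical sketch: projection weights are random, not learned.
    """
    n, c = rgb.shape
    assert depth.shape == (n, c), "modalities must share the same patch grid"
    assert c % num_heads == 0, "channels must divide evenly across heads"
    d = c // num_heads

    rng = np.random.default_rng(seed)
    # Random stand-ins for the learned query/key/value/output projections.
    wq, wk, wv, wo = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4))

    # Each location contributes exactly two tokens: its RGB and depth feature.
    tokens = np.stack([rgb, depth], axis=1)  # (N, 2, C)

    def split_heads(x):
        # (N, 2, C) -> (N, H, 2, d)
        return x.reshape(n, 2, num_heads, d).transpose(0, 2, 1, 3)

    q = split_heads(tokens @ wq)
    k = split_heads(tokens @ wk)
    v = split_heads(tokens @ wv)

    # Attention is computed only among the two tokens of one location,
    # so the weight tensor is (N, H, 2, 2) instead of (H, 2N, 2N).
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d), axis=-1)

    out = (attn @ v).transpose(0, 2, 1, 3).reshape(n, 2, c) @ wo  # (N, 2, C)

    # Assumed fusion rule: average the attended RGB and depth tokens.
    return out.mean(axis=1), attn
```

Because attention is restricted to the two tokens of each location, the number of attention weights grows linearly with the number of patches, and perturbing the features at one location leaves the fused output at every other location unchanged. In the pipeline the abstract describes, long-range dependencies are then left to the Transformer-based decoder that follows.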
Keywords/Search Tags: Salient object detection, RGB-D images, Cross-modality discrepant interaction, Transformer, Multi-dimensional feature fusion