
Research On Several Key Problems Of RGB-D Salient Object Detection

Posted on: 2023-03-29    Degree: Doctor    Type: Dissertation
Country: China    Candidate: N C Huang    Full Text: PDF
GTID: 1528306917979819    Subject: Control theory and control engineering
Abstract/Summary:
Salient Object Detection (SOD) is one of the fundamental tasks of computer vision, which aims to identify the most visually conspicuous objects or regions in a given image. It is an important pre-processing step for a variety of computer vision applications, including image classification, image segmentation, and so on. Until now, tremendous efforts have been made to detect salient objects in a given image. However, most existing SOD models are designed for visible-light images (i.e., RGB images). Despite having achieved profound progress, these RGB SOD models still cannot work well in some challenging scenarios. This may be due to the fact that, although RGB cameras can effectively capture the details, appearances and colors of a scene, they squeeze the 3D spatial information into 2D images, which inevitably loses abundant 3D spatial information about the real world. Different from RGB images, which mainly provide the spatial appearance of the scene, depth images provide affluent spatial structures and 3D layout information about the scene; they are robust to changes in illumination and color and can complement RGB images. Accordingly, benefiting from the complementary information between RGB and depth images, more desirable salient object detection results may be obtained. Therefore, RGB-D SOD has important research significance and many applications.

Focusing on the four key issues of RGB-D SOD, i.e., image quality, multi-modal information fusion, context information extraction and exploitation, and lightweight RGB-D SOD model design, this dissertation conducts a series of studies on RGB-D salient object detection with the aid of technologies such as convolutional neural networks, lightweight algorithms and attention mechanisms, drawing on the fields of deep learning, image processing, pattern recognition and machine learning. The main research contents are as follows:

(1) We propose an RGB-D salient object detection algorithm based on discriminative unimodal feature selection and fusion. When one of the input RGB and depth images is low-quality, existing models mainly use feature-selection-based strategies to handle the interfering information within low-quality images: by generating feature weights, they aim to preserve the highly discriminative unimodal (RGB/depth) features within high-quality image regions and discard the non-discriminative ones within low-quality image regions. However, existing feature-selection-based strategies tend to treat the image regions related to salient objects as high-quality ones and select discriminative features only from these regions for RGB-D salient object detection. This ignores the fact that features within background regions can also provide useful cues for identifying salient objects, and thus yields sub-optimal results. To address this issue, the proposed algorithm treats the image regions that can provide cues for identifying salient objects as high-quality ones. Accordingly, it selects the features that help to identify salient objects, regardless of whether the regions themselves are salient, thus achieving better results. To this end, it designs a semantic-guided modality-weight map generation module, which identifies the high-quality image regions and selects their corresponding discriminative features for fusion, thus better addressing the issue of image quality.
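For illustration, the following is a minimal PyTorch-style sketch of the modality-weighting idea in (1): spatial weight maps are predicted from the concatenated features of both streams and used to gate the RGB and depth features before fusion. The module name, layer sizes and gating form are assumptions for illustration, not the dissertation's exact design.

```python
import torch
import torch.nn as nn

class ModalityWeightGate(nn.Module):
    """Hypothetical sketch: predict per-modality spatial weight maps from the
    concatenated RGB/depth features and use them to gate each modality before
    fusion. Layer sizes and structure are assumptions, not the exact module."""

    def __init__(self, channels):
        super().__init__()
        # Two single-channel weight maps (one per modality).
        self.weight_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_rgb, f_depth):
        w = self.weight_head(torch.cat([f_rgb, f_depth], dim=1))  # B x 2 x H x W
        w_rgb, w_d = w[:, 0:1], w[:, 1:2]
        # Keep discriminative regions of each modality, then fuse.
        return w_rgb * f_rgb + w_d * f_depth

# Example: gate two 64-channel feature maps of size 44 x 44.
gate = ModalityWeightGate(64)
fused = gate(torch.randn(2, 64, 44, 44), torch.randn(2, 64, 44, 44))
print(fused.shape)  # torch.Size([2, 64, 44, 44])
```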
(2) We present an RGB-D salient object detection algorithm based on the coordination of cross-modal and unimodal features. Existing models mainly employ the fused cross-modal features for detecting salient objects. However, when one of the input RGB and depth images is low-quality, they cannot fully eliminate the interfering information from that image, so the discriminability of the fused features is inevitably degraded to some extent. Since existing RGB-D SOD models employ only the fused cross-modal features to deduce the final saliency maps, this leads to sub-optimal results. To address this issue, the proposed algorithm simultaneously uses the fused cross-modal RGB-D features and the unimodal (RGB and depth) features for RGB-D SOD. By doing so, when one of the RGB and depth images is low-quality, the unimodal features from the high-quality image can complement the fused features. To this end, the algorithm first designs a multi-branch feature fusion module to simultaneously preserve cross-modal and unimodal features. It then designs a feature selection module to select the discriminative features from the fused cross-modal RGB-D features and the unimodal (RGB and depth) features, thus better handling the noise information within low-quality images.

(3) We propose an RGB-D salient object detection algorithm based on modality interactions and cross-level redundancy reduction. First, for multi-modal information fusion, existing strategies mainly explore the local interactions between unimodal RGB and depth features and ignore their global interactions, thus leading to sub-optimal results. To address this issue, the proposed algorithm designs a modality-aware and scale-aware feature fusion module, which explores the global relations between RGB and depth features and thereby better captures their complementary information. Secondly, for the extraction and exploitation of context information within the fused cross-modal features, existing modules mainly focus on aggregating multi-level fused features to capture their complementarity while ignoring their abundant redundant information, again leading to sub-optimal results. To address this issue, the proposed algorithm progressively integrates multi-level complementary information and selectively reduces the redundant information by generating preliminary boundary maps. By doing so, it effectively captures the context information within multi-level features.

(4) We present an RGB-D salient object detection algorithm based on bilinear fusion and saliency prior information. First, for multi-modal information fusion, existing cross-modal feature fusion strategies are mainly based on linear functions, e.g., addition and concatenation, which capture the linear relations between RGB and depth features well but cannot effectively capture their non-linear relations, thus leading to ineffective cross-modal information fusion. To address this issue, the proposed algorithm designs a multi-modal feature interaction module which, on top of existing linear fusion strategies, further introduces a novel non-linear fusion strategy. By combining linear and non-linear fusion, it captures both the linear and non-linear relations across the unimodal RGB and depth features and thus better captures their complementary information. Secondly, existing saliency prediction modules ignore that the detail information from the background regions of low-level features can be interfering information, which further leads to ineffective context information extraction and exploitation. To address this issue, the algorithm presents a saliency-prior-guided fusion module, which employs saliency prior information to guide the fusion of cross-modal features at different levels and to eliminate the impact of the interfering information within background regions, thus effectively capturing the context information within multi-level features. Moreover, existing models employ only the simplest scheme, i.e., one convolutional layer, to deduce the saliency maps, which cannot fully exploit the captured information. Therefore, the algorithm further designs a saliency refinement and prediction module, which first refines the foreground and background features and then uses the refined features to predict the final saliency maps, thus achieving better results.
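As a concrete illustration of the combined linear and non-linear fusion described in (4), here is a minimal PyTorch-style sketch: a linear path (concatenation followed by a 1x1 convolution) is paired with a Hadamard-product path, a common low-rank stand-in for full bilinear pooling. The layer choices and the tanh squashing are illustrative assumptions rather than the dissertation's exact module.

```python
import torch
import torch.nn as nn

class LinearNonlinearFusion(nn.Module):
    """Hypothetical sketch: combine a linear fusion path (concatenation + 1x1
    conv) with a non-linear, bilinear-style path (element-wise product of
    projected features). Illustrative only; not the dissertation's module."""

    def __init__(self, channels):
        super().__init__()
        self.proj_rgb = nn.Conv2d(channels, channels, 1)
        self.proj_depth = nn.Conv2d(channels, channels, 1)
        self.linear_path = nn.Conv2d(2 * channels, channels, 1)  # concat -> conv
        self.merge = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_rgb, f_depth):
        linear = self.linear_path(torch.cat([f_rgb, f_depth], dim=1))
        # Hadamard product captures multiplicative (non-linear) interactions,
        # acting as a low-rank approximation of the full outer product.
        bilinear = torch.tanh(self.proj_rgb(f_rgb)) * torch.tanh(self.proj_depth(f_depth))
        return self.merge(torch.cat([linear, bilinear], dim=1))

fuse = LinearNonlinearFusion(64)
out = fuse(torch.randn(2, 64, 44, 44), torch.randn(2, 64, 44, 44))
print(out.shape)  # torch.Size([2, 64, 44, 44])
```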
(5) We present a middle-level-fusion-based lightweight RGB-D salient object detection algorithm. Existing lightweight RGB-D SOD models usually simplify each module of existing two-stream or single-stream structures to reduce parameters and computational costs. However, these two-stream and single-stream structures are not well suited to the design of lightweight RGB-D SOD models. To address this issue, the algorithm specially designs a novel lightweight middle-level fusion structure for RGB-D salient object detection, which performs cross-modal feature fusion only at one particular level of features rather than at all levels, thus effectively reducing parameters and computational costs. Moreover, a multi-modal feature fusion module and a cross-level feature fusion module are carefully designed for the proposed middle-level fusion structure to effectively capture the cross-modal and cross-level complementary information and to compensate for the performance degradation caused by parameter reduction. By doing so, the proposed algorithm has the smallest model size and runs at real-time speed, and thus has many potential applications for detecting salient objects on resource-limited devices.

Extensive experiments on four benchmark datasets demonstrate the superiority of the five proposed RGB-D SOD algorithms over the state-of-the-art.
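To make the middle-level fusion structure of (5) concrete, the toy PyTorch-style sketch below encodes RGB and depth separately with lightweight depthwise-separable blocks, fuses the two streams exactly once at a middle level, and decodes a single saliency map with a shared head. All channel sizes, the number of stages and the fusion operator are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=2):
    # Depthwise-separable style block to keep parameters low.
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride=stride, padding=1, groups=cin),
        nn.Conv2d(cin, cout, 1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class MiddleLevelFusionSOD(nn.Module):
    """Toy sketch of a middle-level fusion structure: RGB and depth are encoded
    separately, cross-modal fusion happens only once (at the middle level), and
    a shared decoder predicts the saliency map. Sizes are assumptions."""

    def __init__(self):
        super().__init__()
        self.rgb_low = nn.Sequential(conv_block(3, 16), conv_block(16, 32))
        self.depth_low = nn.Sequential(conv_block(1, 16), conv_block(16, 32))
        self.fuse_mid = nn.Conv2d(64, 32, 1)             # the single fusion point
        self.shared_high = nn.Sequential(conv_block(32, 64), conv_block(64, 64))
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=8, mode='bilinear', align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1),              # 1-channel saliency logits
        )

    def forward(self, rgb, depth):
        f = self.fuse_mid(torch.cat([self.rgb_low(rgb), self.depth_low(depth)], dim=1))
        return self.decoder(self.shared_high(f))

net = MiddleLevelFusionSOD()
sal = net(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
print(sal.shape)  # torch.Size([1, 1, 128, 128]); logits at half resolution in this toy setting
```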
Keywords/Search Tags: RGB-D salient object detection, Multi-modality image processing, Deep learning, Convolutional Neural Network