Research On Theory And Method Of Visual Object Segmentation

Posted on: 2024-10-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Shang
GTID: 1528307373969799
Subject: Information and Communication Engineering
Abstract/Summary:
Visual object segmentation is a fundamental task in computer vision, with wide applications in autonomous driving, target tracking, human-computer interaction, and safety protection, and it has important theoretical significance and application value. In recent years, with the widespread deployment of artificial intelligence in real-life scenarios and the massive growth of data in the Internet era, visual object segmentation is no longer limited to simple single-modal tasks with a fixed set of classes; it increasingly targets more complex and realistic multi-modal and class-incremental settings. However, in complex scenes, different objects often overlap and are easily confused; information from different modalities often differs significantly and is difficult to align; and new data and new classes keep appearing. Accurately identifying and segmenting objects across different modalities and data environments therefore remains a serious challenge, and research on effective methods that improve segmentation accuracy and adaptability in complex scenes is urgently needed to bring visual object segmentation into real-world applications. Based on these considerations, this dissertation studies the theory and methods of visual object segmentation. To improve segmentation accuracy, it conducts research on two fronts, single-modal object segmentation and multi-modal object segmentation, and it further discusses incremental object segmentation in different modal scenarios. The specific research content and main innovations are summarized as follows:

(1) To address the difficulty of accurately distinguishing overlapping instances in instance object segmentation, this dissertation proposes an instance feature discrimination method. It builds an instance embedding space based on the relationships between instance features and constrains the embedding features of the same instance to be more similar, thereby improving the model's ability to distinguish features of different instances. To identify the foreground instance more accurately, it further extracts discriminative features for the foreground instance from the instance embedding space, uses them to match the region corresponding to the foreground instance, and generates a foreground confidence map. Finally, the segmentation prediction is refined based on the foreground confidence map to improve the accuracy of instance object segmentation.

(2) Because existing instance object segmentation methods struggle to capture effective contextual information for different instances, which leads to instance confusion, this dissertation proposes an instance-level context attention method for instance object segmentation. It introduces the new concept of instance-level context and constructs an instance attention module that generates attention maps focusing on instance-level context, from which each instance obtains more discriminative features. Meanwhile, it constructs a spatial attention module to incorporate more spatial information and further enhance the feature representation. Moreover, to capture more effective contextual information, a weight clipping strategy is proposed to filter out noise and obtain clearer attention maps, further improving instance object segmentation performance.

(3) To deal with catastrophic forgetting in class-incremental semantic object segmentation, this dissertation proposes a transformer-based framework with knowledge distillation focused on old classes. It first proposes a new transformer framework for class-incremental semantic object segmentation that only needs to add new class tokens to the transformer decoder to learn new classes. Based on this framework, a new knowledge distillation scheme that concentrates distillation on old-class regions is proposed; it reduces the constraints the old model imposes on new-class learning and thus improves plasticity. Moreover, a class deconfusion strategy is proposed to alleviate overfitting to new classes and confusion between similar classes, significantly improving class-incremental semantic object segmentation performance.

(4) To address semantic misunderstanding, that is, the difficulty existing multi-modal object segmentation methods have in accurately understanding the global semantics expressed in text, this dissertation proposes a recurrent semantic comprehension network for multi-modal object segmentation. It designs a new recurrent network structure that obtains a more comprehensive global semantic understanding through iterative cross-modal semantic reasoning. In each iteration, it extracts relevant visual features under language guidance and applies a proposed language-attentional feature modulation to improve feature discriminability; a cross-modal semantic reasoning module then performs global semantic reasoning by capturing both linguistic and visual information; finally, the visual features of the predicted object are updated and corrected based on the semantic information. In addition, a cross-modal atrous spatial pyramid pooling module is proposed to capture richer visual information from larger receptive fields and obtain more accurate multi-modal object segmentation results.

(5) To address the mismatch that arises when large-scale pre-trained models are adapted to multi-modal object segmentation, this dissertation proposes a multi-modal object segmentation method based on prompt learning. It builds a prompt-driven framework that bridges a multi-modal pre-trained model and an image segmentation pre-trained model end to end and, through prompt learning, transfers their rich knowledge and strong capabilities to the multi-modal object segmentation task. To adapt the multi-modal pre-trained model to pixel-level tasks, it first proposes a cross-modal prompting method that performs bidirectional prompting to achieve more thorough vision-language interaction and fine-grained text-to-pixel alignment. It further proposes instance contrastive learning to improve the model's discrimination between different instances and its robustness to diverse texts describing the same instance, further improving multi-modal object segmentation performance.

(6) To address class-incremental object segmentation in multi-modal scenes, this dissertation is the first to explore the class-incremental multi-modal object segmentation task. It first builds a class-incremental multi-modal object segmentation framework and proposes a class-semantics-based prompt learning method, which learns an independent set of prompts for the classes of each learning step; this preserves class-specific semantic knowledge for classes learned in different steps and reduces forgetting of old knowledge. To preserve the cross-modal correspondences learned by the model, a multi-modal knowledge distillation method is proposed to further improve the model's retention of multi-modal knowledge. In addition, it constructs a class-based class-incremental multi-modal object segmentation dataset, which is used to evaluate the proposed method on this task.
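Contribution (1) pulls the embeddings of one instance together and then scores pixels against the foreground instance. The abstract gives neither the loss nor the matching rule, so the sketch below swaps in a well-known stand-in: a pull/push hinge loss in the style of the discriminative loss of De Brabandere et al., plus a foreground confidence map computed as cosine similarity to the mean embedding of the foreground pixels. The margins `delta_pull` and `delta_push` and every function name are hypothetical, and embeddings are plain Python lists rather than tensors.

```python
import math
from collections import defaultdict


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def instance_embedding_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Hypothetical pull/push loss over per-pixel embeddings and instance ids.

    Pull: pixels farther than delta_pull from their instance center are penalized.
    Push: pairs of instance centers closer than delta_push are penalized.
    """
    groups = defaultdict(list)
    for emb, lab in zip(embeddings, labels):
        groups[lab].append(emb)
    centers = {lab: mean(vs) for lab, vs in groups.items()}

    pull = sum(
        sum(max(0.0, dist(e, centers[lab]) - delta_pull) ** 2 for e in vs) / len(vs)
        for lab, vs in groups.items()
    ) / len(groups)

    ids = list(centers)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    push = 0.0
    if pairs:
        push = sum(
            max(0.0, delta_push - dist(centers[a], centers[b])) ** 2 for a, b in pairs
        ) / len(pairs)
    return pull + push


def foreground_confidence(embeddings, fg_mask):
    """Map each pixel's cosine similarity to the foreground prototype into [0, 1]."""
    prototype = mean([e for e, m in zip(embeddings, fg_mask) if m])
    return [(cosine(e, prototype) + 1.0) / 2.0 for e in embeddings]
```

With two tight, well-separated instances, `instance_embedding_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], [0, 0, 1, 1], delta_push=1.0)` is zero, and the confidence map for a foreground mask `[1, 1, 0, 0]` is `[1.0, 1.0, 0.5, 0.5]`.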
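Contribution (2) clips attention weights to filter out noise. The abstract does not say how the clipping works, so this sketch assumes the simplest reading: compute ordinary scaled dot-product attention, zero every weight below a threshold `tau`, and renormalize the surviving weights. The instance-level and spatial attention modules themselves are not reproduced, and `clipped_attention` and `tau` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_attention(query, keys, values, tau=0.1):
    """Scaled dot-product attention with hypothetical threshold-based weight clipping.

    Weights below tau are treated as noise and set to zero; the rest are
    renormalized to sum to 1. If no weight survives, the unclipped weights are kept.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    clipped = [w if w >= tau else 0.0 for w in weights]
    total = sum(clipped)
    if total > 0.0:
        weights = [w / total for w in clipped]
    # Weighted sum of the value vectors, one output channel at a time.
    output = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
    return weights, output
```

When one key dominates, clipping removes the small residual weights entirely, so the output equals that key's value vector instead of a slightly blurred mixture; this is one way to read "clearer attention maps".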
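Contribution (3) restricts knowledge distillation to old-class regions so that the old model does not constrain new-class learning elsewhere. A minimal per-pixel sketch, assuming that the old-class region is the set of pixels the old model assigns to a non-background old class, and that the distillation term is the KL divergence between the old model's softmax and the new model's softmax restricted to the old classes. The background index 0, the temperature, and the function names are assumptions; the transformer framework and its class tokens are not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def old_region_kd_loss(old_logits, new_logits, num_old, background=0, temperature=1.0):
    """Average KL(old || new) over pixels the old model labels as an old foreground class.

    old_logits: per-pixel lists with num_old entries (background + old classes).
    new_logits: per-pixel lists with num_old + num_new entries; only the first
    num_old entries (the old classes) take part in distillation.
    """
    total, count = 0.0, 0
    for old, new in zip(old_logits, new_logits):
        p_old = softmax(old, temperature)
        pred = max(range(num_old), key=lambda c: p_old[c])
        if pred == background:
            # Not an old-class region: the new model is free to learn new classes here.
            continue
        p_new = softmax(new[:num_old], temperature)
        total += sum(p * math.log(p / max(q, 1e-12)) for p, q in zip(p_old, p_new) if p > 0.0)
        count += 1
    return total / count if count else 0.0
```

In this sketch, a pixel the old model calls background contributes nothing, however much the new model's logits change there; only disagreement on old-class pixels is penalized.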
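The cross-modal atrous spatial pyramid pooling module in contribution (4) builds on atrous (dilated) convolution: a kernel of size k at rate r spans k + (k - 1)(r - 1) input positions without adding weights. The 1D sketch below shows that receptive-field growth and combines parallel branches at several rates. The abstract does not describe how language enters the module, so the per-branch `gates` are an assumption standing in for text-derived weights (for example from a learned projection of the sentence embedding, not shown); all function names are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """'Same'-size 1D dilated cross-correlation with zero padding (odd-length kernel)."""
    center = (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are spaced `rate` positions apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_span(kernel_size, rate):
    """Number of input positions covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def gated_aspp_1d(signal, kernel, rates, gates):
    """Hypothetical cross-modal ASPP: parallel atrous branches, each scaled by a
    text-derived gate, then summed position by position."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [sum(g * b[i] for g, b in zip(gates, branches)) for i in range(len(signal))]
```

Summing gated branches is a simplification; standard ASPP concatenates the branch outputs and fuses them with a 1x1 convolution.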
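Contribution (5) uses instance contrastive learning so that different texts describing the same instance land close together while other instances are pushed away. The abstract does not specify the loss; InfoNCE is a common choice for this kind of objective and is sketched below for one anchor, one positive, and a list of negatives, using cosine similarity and a temperature. The function name and the default temperature of 0.07 are assumptions, not values from the dissertation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss: cross-entropy of picking the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]  # equals -log(softmax(logits)[0])
```

Here the anchor could be an instance embedding and the positive the embedding of one of its referring texts, with texts about other instances as negatives; the loss is small when the positive is the most similar candidate.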
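Contribution (6) learns an independent set of prompts for the classes introduced at each learning step, so that training a new step does not overwrite the prompts of earlier steps. The bookkeeping can be sketched as a small pool that freezes earlier steps' prompts and allocates fresh ones for the new step. The `PromptPool` API, the prompt length, the zero initialization, and the example class names are all hypothetical, and the multi-modal knowledge distillation term is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class StepPrompts:
    """Prompts owned by one learning step: one list of prompt vectors per class."""
    step: int
    prompts: dict  # class name -> list of prompt vectors
    frozen: bool = False


@dataclass
class PromptPool:
    prompt_len: int = 4
    dim: int = 8
    steps: list = field(default_factory=list)

    def add_step(self, classes):
        """Freeze all earlier steps, then allocate fresh prompts for the new classes."""
        known = {c for s in self.steps for c in s.prompts}
        if known & set(classes):
            raise ValueError("a class may only be introduced in one step")
        for s in self.steps:
            s.frozen = True
        new = StepPrompts(
            step=len(self.steps),
            prompts={c: [[0.0] * self.dim for _ in range(self.prompt_len)] for c in classes},
        )
        self.steps.append(new)
        return new

    def trainable_classes(self):
        """Classes whose prompts would receive gradients in the current step."""
        return [c for s in self.steps if not s.frozen for c in s.prompts]

    def prompts_for(self, cls):
        """Look up the prompt set of a class, whichever step introduced it."""
        for s in self.steps:
            if cls in s.prompts:
                return s.prompts[cls]
        raise KeyError(cls)
```

For example, after `add_step(["cat", "dog"])` and `add_step(["car"])`, only the prompts for `"car"` remain trainable, while the first step's prompts stay available for inference but are no longer updated.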
Keywords/Search Tags: Visual Object Segmentation, Multi-Modal Learning, Incremental Learning, Knowledge Distillation, Prompt Learning