Font Size: a A A

Research On Several Problems In Weakly Supervised Visual Analysis And Understanding

Posted on:2019-12-08Degree:DoctorType:Dissertation
Country:ChinaCandidate:G Y ZhongFull Text:PDF
GTID:1368330548484739Subject:Computational Mathematics
Abstract/Summary:PDF Full Text Request
With the popularization of digital cameras and the development of the Internet,the quality of images and videos increases significantly.The visual analysis and understanding has become an impor'tant issue in computer vision.However,most methods require large-scale training sets with ground truth labels,-which limits the generalization ability.In fact,the weak supervision within the visual data is worth mining and exploiting.Also,the previous algorithms with clear structures or efficient solutions In image processing and machine learning are worth expanding.Therefore,we mainly focus on four important problems,i.e.,saliency detection,video object segmentation,video scene parsing,and video sequencing.Saliency detection can provide guid-ance for some weakly supervised tasks.In the remaining three problems,we use several weak supervision in our models,which increases the flexibility of related algorithms.Meanwhile,we extend the traditional techniques,i.e.,partial differential equations,non-local means,and sub-modular function optimization to solve the above problems.Specifically,our main works can be summarized as follows:(1)We propose a Learning-to-Diffuse framework.With the perspective of visual diffusion,we propose an anisotropic diffusion system with adaptive boundary conditions.Both the gov-erning equations and the boundary conditions of the proposed system can be learned from visual data.Meanwhile,we provide a combinational optimization model.We introduce a loss function and an information gain based regularization term to incorporate the perception priors and the discriminative information separately.In addition,we prove the submodularity of the system and provide a simple but efficient numerical scheme.Both saliency detection and object track-ing are addressed within our Learning-to-Diffuse framework.Experimental results demonstrate the effectiveness of our method in visual analysis tasks.(2)We propose a generalized non-local means framework for saliency detection.By intro-ducing the perception priors and using the structure of salient regions,we extend the traditional non-local means to saliency detection.By defining the non-local connections of superpixels,the proposed method can handle the irregular computing units.The proposed method also has a simple but efficient closed-form solution.Meanwhile,the proposed non-local graph reduces the interference of cluttered background.In addition,by fusing the low-level color cues and the high-level objectness cues,we propose an object-level prior which is more robust than the center prior.Experimental results demonstrate the effectiveness of our method in saliency detection.(3)We propose a semantic-based co-segmentation method.The proposed method does not need initial object masks which are manually labeled.By constructing a conditional random field on adjacent frames,we propagate semantic segments bi-directionally.This process generates multiple consistent tracklets and solves occlusion problem.To remove the non-object segments,we collect tracklets from multiple videos as nodes to build a graph.We introduce a submodular function to formulate the similarity between nodes,as well as the appearance,motion,and shape of each node.By maximizing the submodular function,we co-select the optimal tracklets and assign semantic labels.Experimental results show that our method performs well in weakly supervised and unsupervised video object segmentation tasks.(4)We propose a weakly supervised scene co-parsing framework.For the weakly-labeled videos,we use a clustering scheme to exploit semantic relationships between multiple videos.To maintain the spatial-temporal consistency,we use supervoxels as nodes to build a graph.Then we select the representatives of each category via a submodular based clustering process,which reduces the interference from irrelevant videos.To further distinguish scenes and objects,we provide a scene-object classifier as a constraint,which can be generalized to different datasets.In addition,we provide an efficient region-based matching strategy to propagate semantic labels from representatives to pixels.Experimental results show that our method performs well in weakly supervised scene parsing.(5)We propose a weakly supervised video sequencing method.We learn the semantic and motion coherence between clips via a t.wo-stream recurrent neural network,which is more robust than the previous method using human-designed features.As labeling for the entire sequence is subjective,we use the temporal order of real videos as ground truth.This strategy ensures the coherence between adjacent clips and enhances the generalization performance of our model.To improve the overall visual quality,we use a submodular cost function to model the re-ranking process.This strategy makes the NP-hard problem,i.e.,searching optimal orders,can be greed-ily solved.The submodular function incorporates the similarity between clips and dynamics,which makes the final r-esults contain content coherency and match the story plot structure.Ex-perimental results show that our method performs favorably against the state-of-the-art methods.
Keywords/Search Tags:Weakly Supervised, Saliency Detection, Video Object Segmentation, Video Scene Parsing, Video Sequencing
PDF Full Text Request
Related items