Research On High-Performance Instance Segmentation Based On Deep Learning

Posted on: 2023-09-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R F Zhang
GTID: 1528307316951059
Subject: Control Science and Engineering

Abstract/Summary:
Instance segmentation aims to identify and segment all potential objects in an image and is one of the most challenging tasks in computer vision. Here, instances are the individual objects that are distinct from one another. Unlike object detection, which uses rectangular bounding boxes to roughly represent object locations, or semantic segmentation, which typically identifies only the semantic class of each pixel, instance segmentation requires both pixel-by-pixel segmentation and instance differentiation. It enables visual applications such as autonomous driving, intelligent security, and computer-aided medicine.

In recent years, deep learning-based instance segmentation has improved significantly in both performance and efficiency. The mainstream multi-stage pipeline currently 1) proposes candidate regions using detection techniques, 2) performs pixel-wise prediction within each region, and 3) filters redundant outputs in a post-processing stage. However, given complex real-world environments, varying object sizes, and limited computational resources, multi-stage algorithms still have many problems: (1) Model accuracy and efficiency are difficult to balance. (2) When the first-stage detector cannot locate objects accurately, the subsequent model fails to segment them well. (3) A post-processing algorithm is needed to handle redundant predictions, which introduces heavy time overhead. (4) Moving objects are difficult to segment when they are occluded.

In response to these challenges, this thesis conducts research along three dimensions: representation design, algorithm modeling, and practical application. First, we study mask representation modeling, which lays the groundwork for high-precision instance segmentation frameworks. Then, we focus on constructing high-efficiency instance segmentation methods that need no pre-processing or post-processing algorithms. Finally, we study how to segment moving objects with suitable networks and optimization strategies. Overall, this thesis focuses on how to build a high-performance instance segmentation algorithm with high accuracy, high efficiency, and high versatility in complex real scenes, using a simpler and more efficient segmentation process. The main contributions are as follows:

(1) To address the problem that high-complexity mask modeling approaches can only predict coarse masks, this thesis proposes a flexible and effective algorithm termed Mask Encoding. The core idea is to encode the high-resolution structured mask into a compact representation that combines high quality with low complexity by mining pixel-to-pixel correlations. This mask representation can be easily integrated into most two-stage instance segmentation frameworks, achieving both performance and efficiency gains by converting the explicit pixel-wise segmentation task into an implicit vector regression problem.

(2) To address the problem that the “detect-then-segment” two-stage framework is limited by object detection quality and has unstable inference speed, this thesis proposes a simple single-stage segmentation framework termed mask encoding based instance segmentation (MEInst). The algorithm uses an end-to-end fully convolutional neural network to learn region-level and pixel-level features in parallel and predicts all object masks at once, so the inference speed stays stable regardless of the number of objects. In addition, the method introduces a simple enhancement module to close the performance gap between single-stage and two-stage segmentation algorithms while preserving computational efficiency.

(3) To address the problem that instance segmentation algorithms tend to generate redundant instance masks at inference and rely on complex post-processing techniques, this thesis proposes a sparse instance recognition framework called Sparse R-CNN. Sparse R-CNN uses learnable proposals to represent the distribution of object positions in a scene, leverages a dynamic instance interaction scheme to extract features, and directly segments potential objects without non-maximum suppression (NMS). At the same time, the method replaces the hand-designed “many-to-one” matching pipeline used in training with a “one-to-one” label assignment mechanism that the model learns by itself, avoiding tedious manual tuning. Moreover, the proposed sparse segmentation algorithm achieves satisfactory performance even in crowded scenes.

(4) To address the problem that moving objects cannot be accurately recognized and segmented under excessive lighting and obstacle occlusion, this thesis proposes an instance segmentation system based on inter-frame consistency. The framework fully exploits the similarity in shape and color of the same object between adjacent frames, and enhances instance-aware features in the relevant region of the current image by using the instance vector generated in the previous frame. This temporally correlated interaction design significantly improves the model's ability to segment and track objects in continuous motion in real scenes, with almost no increase in computational complexity.

In summary, this thesis aims to challenge the assumption that instance segmentation requires a complex multi-stage process, and proposes a new instance segmentation framework. It demonstrates that instance segmentation algorithms can achieve high accuracy with high efficiency without high-complexity spatial mask representations, without a high-performance object detector as a pre-processing step, and without highly engineered post-processing techniques. Our methods offer inspiration to the research community and have significant application prospects in industry.
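
The “one-to-one” label assignment mentioned in contribution (3) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract states neither the matching cost nor the solver, so the code below assumes a cost of 1 - mask IoU and solves the assignment by exhaustive search. The function names `mask_iou` and `one_to_one_assign` and the toy masks are hypothetical.

```python
from itertools import permutations


def mask_iou(a, b):
    """IoU of two binary masks given as equal-sized lists of 0/1 rows."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def one_to_one_assign(pred_masks, gt_masks):
    """Match every ground-truth mask to a distinct prediction.

    Assumed cost: 1 - IoU (the abstract does not specify one).
    Exhaustive search over permutations, so only suitable for a
    handful of objects. Returns (pairs, total_cost), where pairs is
    a list of (gt_index, pred_index).
    """
    n_pred, n_gt = len(pred_masks), len(gt_masks)
    if n_gt > n_pred:
        raise ValueError("need at least as many predictions as ground truths")
    # cost[g][p] is the cost of assigning prediction p to ground truth g
    cost = [[1.0 - mask_iou(p, g) for p in pred_masks] for g in gt_masks]
    best_pairs, best_cost = [], float("inf")
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost[g][p] for g, p in enumerate(perm))
        if total < best_cost:
            best_cost, best_pairs = total, list(enumerate(perm))
    return best_pairs, best_cost


# Toy example: two ground-truth objects, three predicted masks (2x4 pixels).
gts = [
    [[1, 1, 0, 0], [1, 1, 0, 0]],  # object on the left
    [[0, 0, 1, 1], [0, 0, 1, 1]],  # object on the right
]
preds = [
    [[0, 0, 1, 1], [0, 0, 0, 1]],  # mostly covers the right object
    [[1, 1, 0, 0], [1, 1, 0, 0]],  # exactly covers the left object
    [[1, 1, 0, 0], [0, 0, 0, 0]],  # partial duplicate of the left object
]
pairs, total = one_to_one_assign(preds, gts)
print(pairs, total)  # [(0, 1), (1, 0)] 0.25
```

In this toy case the duplicate prediction (index 2) is left unmatched, so during training it would be treated as background rather than as a second positive for the same object, which is why no NMS step is needed to remove it afterwards. For realistic numbers of objects, a polynomial-time assignment solver such as the Hungarian algorithm would replace the exhaustive search.
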
Keywords: instance segmentation, mask encoding, single-stage, sparse instance recognition, inter-frame object consistency