A Method For Multi-target Automatic Video Object Segmentation

Posted on:2023-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:S Sha

GTID:2568306827967519

Subject:Information and Communication Engineering

Abstract/Summary:

Automatic video object segmentation (AVOS) has played an increasingly important role in recent years. It can be applied not only to tasks that require specified categories, such as video conferencing and autonomous driving, but also to general video understanding of arbitrary objects. However, the task faces great challenges in instance segmentation and temporal continuity because of complex backgrounds and the changing appearance of objects. Existing AVOS methods share common problems, such as semantic ambiguity between similar objects and missed objects in complex scenes. To address these problems, this work improves spatio-temporal merging at the semantic level.

First, the work proposes an efficient end-to-end multi-target AVOS model: a flexible-learning positioning-and-modification model based on spatio-temporal bi-branches. The model builds on the idea of centroid location in SOLOv2 and aims to obtain the centroid position of each independent object more accurately. To account for the spatio-temporal interactions in video, the two branches of the network are designed to concentrate on temporal matching and on self-excavation of features, respectively. Using only video frames as input, the bi-branch network separates the learning of appearance features within a single frame from the learning of motion information between frames, and provides a more flexible input for the subsequent stages: spatio-temporal fusion, category prediction, and segmentation prediction. After the spatio-temporal context is merged, a semantic optimization module alleviates the accumulated error and semantic overlap produced by earlier stages, so that individual objects are located accurately, which in turn leads to more accurate segmentation masks.

Furthermore, the model replaces convolution layers with a Transformer module as the attention mechanism of the network, together with an extra input called the global embedding, which successively improves the self-inter-attention by fusing features across frames. The Transformer-based bi-branch model demonstrates its adaptability to occlusion, and a comparison between the two networks is presented. In addition, a supervised ID embedding prediction is added to the model to improve the temporal continuity of the AVOS task. A number of ablation experiments provide convincing support for the method. Compared with existing methods, this work achieves competitive performance on the J and F metrics for video object segmentation while maintaining real-time running speed.
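The abstract reports results on the J metric without defining it. In the video object segmentation literature, J usually denotes region similarity, that is, the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask. The following is a minimal sketch under that assumption, not the thesis's evaluation code. The names `region_similarity` and `mean_region_similarity`, the list-of-rows mask representation, and the convention for two empty masks are all hypothetical choices made for illustration.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (illustrative representation).
Mask = Sequence[Sequence[int]]


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Jaccard index (intersection over union) of two binary masks."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_region_similarity(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average J over the frames of one object track."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need the same non-zero number of frames")
    scores = [region_similarity(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)
```

For example, a prediction `[[1, 1], [0, 0]]` against a ground truth `[[1, 0], [1, 0]]` shares one foreground pixel out of three in the union, so J is 1/3. The F metric (boundary accuracy) compares mask contours and is not sketched here.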

Keywords/Search Tags:

Automatic Video Segmentation, Multi-target Video Object Segmentation, Semantic Feature Learning, Centroid Location, Mechanism of Attention

Related items

1	Research On Video Object Segmentation Based On Deep Learning
2	Research On Video Multi-object Segmentation Algorithm Based On Multi-temporal And Multi-level Attention Network
3	Research On Video Object Segmentation Algorithm Based On Learning Attention Modulation Network
4	Research On Unsupervised Video Multi-object Segmentation Algorithm
5	Research On Video Object Segmentation Method Inspired By Visual Salienc
6	Video Object Segmentation Based On Spatiotemporal Information Fusion And Attention Mechanism
7	Research On Optimization Algorithm Of Object Segmentation For Blurred Video Data
8	Study On Image/Video Object Segmentation
9	Research On Unsupervised Video Moving Object Segmentation Algorithm Based On Dual-stream Feature Fusio
10	Application Research On Video Object Segmentation With 3D CNN And Attention Mechanism