Research On Video Object Segmentation Algorithm Based On Temporal Frame Information Fusion And Mask Refinement

Posted on: 2024-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: F C Li
GTID: 2568307061481764
Subject: Computer Science and Technology
Abstract/Summary:
Video object segmentation is an important research direction in computer vision, with wide application in intelligent medical care, autonomous driving, and video coding. According to the degree of human participation, its application scenarios fall into two categories: unsupervised and semi-supervised. This thesis studies semi-supervised video object segmentation: given the target mask in the first frame of a video, segment the corresponding target in the subsequent frames. With the development of deep learning, many strong deep-learning-based algorithms have been proposed, advancing video object segmentation technology. However, these algorithms still face difficulties such as occlusion, interference from similar objects, changes in target appearance, and small targets that are hard to identify, so further research is needed.

Although some current algorithms deal effectively with occlusion and with similarity between instance objects, three problems remain:
1) Some algorithms fuse inter-frame information by mask propagation but do not fully use historical frame information, which makes it difficult for the network to cope with complex scenes.
2) Memory-network-based algorithms update frame information into the memory pool indiscriminately, leaving redundant frames in the pool and increasing the amount of computation.
3) The mask refinement methods of most algorithms are coarse, so the edges of the final mask are blurred.

To solve these problems, the main work of this thesis is as follows:

(1) Mask-propagation-based video object segmentation algorithms do not make full use of historical frame information, and their coarse mask refinement produces blurred mask edges. To address this, a video object segmentation algorithm based on temporal frame context information fusion and feature enhancement is proposed. First, to make full use of historical frame information, a temporal frame residual fusion module is proposed that adaptively fuses historical frame information. Second, a spatial cascade mask refinement module is established to enhance the spatial information in the shallow features of the backbone network and to refine the edge information of the fused features. Experimental results show that the proposed algorithm achieves a performance (J&F) of 87.4%, 76.6%, and 68.1% on DAVIS2016, DAVIS2017, and YouTube-VOS18, respectively. Its segmentation speed also meets real-time requirements, reaching 26 FPS on the DAVIS2016 validation set.

(2) Popular memory-network-based video object segmentation algorithms update frame information into the memory pool without selection, leaving redundant frame information in the pool. Their fusion of deep and shallow features is also coarse, which blurs the edges of the generated masks. To address this, a video object segmentation algorithm based on dynamic perception update and feature fusion is proposed. To make reasonable use of historical frame information, a dynamic perception update module is proposed that selectively updates the masks of segmented frames into the memory pool. In addition, a mask refinement module is established to enhance the detail information in the shallow features of the backbone network. This module uses a double-kernel fusion block to fuse feature information at different scales and finally applies the Laplacian operator to sharpen the mask edges. Experimental results show that on the public datasets DAVIS2016, DAVIS2017, and YouTube-VOS18, the overall performance of the proposed algorithm reaches 86.4%, 78.8%, and 71.0%, respectively, and the segmentation speed reaches 15 FPS on the DAVIS2016 validation set.

(3) Memory-network-based video object segmentation algorithms use standard convolution to reduce the number of channels, which loses target channels, and their coarse mask refinement yields blurred edges and an unclear final mask. To address this, a video object segmentation algorithm based on double-branch channel selection and edge sharpening is proposed. First, to reduce the loss of target channels, a two-branch channel attention fusion method is used to select the target channels adaptively. Second, an edge sharpening module is established that takes the difference between the coarse mask and the same mask after global average pooling, which reduces false foreground pixels and makes the final mask edges sharper. Experimental results show that on the public datasets DAVIS2016, DAVIS2017, and YouTube-VOS18, the overall performance of the proposed algorithm reaches 90.5%, 82.7%, and 80.1%, respectively, and the segmentation speed reaches 10 FPS on the DAVIS2016 validation set.
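The abstract names two edge-sharpening operations without giving their details: Laplacian sharpening of the mask in (2), and the difference between the coarse mask and its global-average-pooled version in (3). The following is only a minimal NumPy sketch of what such operations could look like, not the thesis implementation. The function names, the 4-neighbour Laplacian kernel, the `strength` parameter, the clipping to [0, 1], and the treatment of the mask as a 2-D array of foreground probabilities are all assumptions made for illustration.

```python
import numpy as np


def gap_difference_sharpen(coarse_mask: np.ndarray) -> np.ndarray:
    """Subtract the global average of a coarse soft mask, then clip.

    Assumption: coarse_mask is a 2-D array of foreground probabilities.
    Pixels at or below the global mean drop to zero, which suppresses
    weak (possibly false) foreground responses.
    """
    global_mean = coarse_mask.mean()  # global average pooling over H x W
    return np.clip(coarse_mask - global_mean, 0.0, 1.0)


def laplacian_sharpen(mask: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Sharpen a soft mask by subtracting its discrete Laplacian.

    Assumption: a 4-neighbour Laplacian with replicated borders;
    `strength` is a hypothetical scaling factor.
    """
    padded = np.pad(mask, 1, mode="edge")
    laplacian = (
        padded[:-2, 1:-1]    # neighbour above
        + padded[2:, 1:-1]   # neighbour below
        + padded[1:-1, :-2]  # neighbour to the left
        + padded[1:-1, 2:]   # neighbour to the right
        - 4.0 * mask
    )
    return np.clip(mask - strength * laplacian, 0.0, 1.0)


if __name__ == "__main__":
    # A soft vertical edge: low probabilities on the left, high on the right.
    soft_edge = np.tile(np.array([0.2, 0.2, 0.8, 0.8]), (3, 1))
    print(laplacian_sharpen(soft_edge))
    print(gap_difference_sharpen(soft_edge))
```

In this sketch the Laplacian step pushes values on either side of an edge further apart, while the global-average difference removes low-confidence responses; a learned module in a real network would operate on feature maps or logits rather than on a single probability map.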
Keywords/Search Tags: video object segmentation, history frames, mask refinement, memory network