Application Research On Video Object Segmentation With 3D CNN And Attention Mechanism

Posted on: 2023-02-27 | Degree: Master | Type: Thesis
Country: China | Candidate: L F Xia | Full Text: PDF
GTID: 2558306845991239 | Subject: Computer technology
Abstract/Summary:
Video object segmentation (VOS) is a binary labeling problem that aims to automatically segment foreground objects from the background region of a video. In recent years, with the rapid development of the Internet, the amount of video data has surged. Processing this data manually is time-consuming and labor-intensive, so processing it automatically with deep learning has become mainstream. Because video data contains rich temporal and spatial features, using them fully and effectively helps address the challenges of the VOS task.

To address the shortcomings of traditional methods in extracting the appearance and motion features of video objects simultaneously, this thesis studies a semi-supervised VOS algorithm based on 3D CNN. In addition, the first-frame ground truth mask is difficult to obtain in the semi-supervised VOS task, and some complex scenes are difficult to handle in the unsupervised VOS task. Starting from segmentation guidance information in the form of a Gaussian saliency map, this thesis therefore also studies a weakly supervised VOS algorithm based on the saliency map. Details are as follows:

(1) A semi-supervised video object segmentation algorithm based on 3D CNN is proposed. Existing semi-supervised VOS models use two different networks to process the appearance and motion features of the object separately, and information is lost when the two features are fused. As a result, these models are often structurally complex and difficult to train, which limits their accuracy. We propose a semi-supervised segmentation model based on 3D CNN that extracts appearance and motion features through a single 3D CNN and learns spatio-temporal information simultaneously, thus simplifying the network structure. We also use methods such as separated convolutions to reduce the number of parameters, which improves training efficiency. Experimental results on the DAVIS dataset show that the J&F index of the model's segmentation results reaches 83.9%, which is better than similar mainstream models.

(2) A weakly supervised video object segmentation algorithm based on saliency map is proposed. Semi-supervised VOS methods require a first-frame ground truth mask that is difficult to obtain, and unsupervised VOS methods struggle with some complex scenes because they lack segmentation guidance information. To address both problems, we propose a weakly supervised VOS algorithm based on the saliency map. Since the saliency map of an object is easier to obtain than the ground truth mask, the model is more practical. The algorithm includes a foreground attention module based on the saliency map information, so that the model pays more attention to the foreground target. Combined with channel attention and spatial attention mechanisms, the model can handle more complex video scenes. Experimental results on the DAVIS dataset verify the effectiveness of the algorithm: the J&F index of the segmentation results reaches 81.3%, which exceeds similar mainstream models.

This thesis contains 43 figures, 11 tables and 79 references.
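The abstract does not say which form of separated convolution the model uses. As a rough illustration only, the sketch below assumes one common factorization: a k x k x k 3D kernel is replaced by a 1 x k x k spatial convolution followed by a k x 1 x 1 temporal convolution. The function names, the channel sizes, and the choice of intermediate width are all hypothetical, and bias terms are ignored. It shows only the parameter-count arithmetic behind the claim that such a factorization reduces the number of parameters.

```python
def full_3d_conv_params(c_in, c_out, k):
    """Weight count of one k x k x k 3D convolution (bias omitted)."""
    return c_in * c_out * k * k * k


def factorized_conv_params(c_in, c_out, k, c_mid=None):
    """Weight count of a 1 x k x k spatial conv followed by a k x 1 x 1
    temporal conv (bias omitted).

    c_mid is the channel width between the two convolutions. Setting it
    to c_out by default is an assumption made for this sketch, not a
    detail taken from the thesis.
    """
    if c_mid is None:
        c_mid = c_out
    spatial = c_in * c_mid * k * k   # 1 x k x k kernel
    temporal = c_mid * c_out * k     # k x 1 x 1 kernel
    return spatial + temporal


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 64 output channels, k = 3.
    full = full_3d_conv_params(64, 64, 3)
    factorized = factorized_conv_params(64, 64, 3)
    print(f"full 3D conv weights:    {full}")
    print(f"factorized conv weights: {factorized}")
    print(f"ratio: {factorized / full:.3f}")
```

Under these assumptions the factorized layer needs (k^2 + k) / k^3 of the weights of the full 3D layer, which is 12/27 for k = 3. The actual saving in the thesis depends on the factorization and channel widths it uses.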
Keywords/Search Tags:Video Object Segmentation, Spatio-temporal features, 3D CNN, Saliency map, Attention Mechanism