Research On Zero-shot Semantic Segmentation Based On Visual Semantic Alignment And Episode Training Strategy

Posted on: 2024-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: B Xiong
Full Text: PDF
GTID: 2568307112976759
Subject: Electronic information
Abstract/Summary:
Existing supervised semantic segmentation models based on deep learning rely heavily on large numbers of labeled training samples, but pixel-level image labeling is extremely costly in time and labor. Moreover, samples of some classes, such as rare animals, are hard to obtain in real life, so the training requirements of deep models cannot be met for them. In addition, a supervised semantic segmentation model can only predict classes that appear in the training set and cannot predict classes that are absent from it. In view of these deficiencies, this thesis studies the highly challenging zero-shot semantic segmentation task, whose goal is to train only on samples of seen classes and to predict segmentation masks for samples of unseen classes at inference time.
Because only seen-class training samples are available in zero-shot semantic segmentation, models tend to misclassify unseen classes as seen classes at inference time. Most existing methods use generative models to synthesize a certain number of samples for the unseen classes, turning zero-shot semantic segmentation into a conventional supervised segmentation problem. However, the classifiers of such methods must be retrained whenever new unseen classes appear, and the quality of the synthesized samples is difficult to guarantee. To address these difficulties and deficiencies, this thesis makes the following three contributions:
(1) This thesis introduces an episode-based training strategy into the zero-shot semantic segmentation task for the first time, treating the training of each episode as a simulated zero-shot semantic segmentation sub-task. In each episode, the training set is divided into two disjoint sets, a support set and a revising set. The support set is used to train the basic visual semantic alignment network (VSAN), giving the model a preliminary zero-shot semantic segmentation capability. The revising set is then used to further update the model parameters. Over several episodes, the model gradually accumulates experience in predicting the simulated unseen classes, and it can finally generalize well to the real unseen classes in the test set.
(2) To suit this episode-based training strategy, a visual semantic alignment network (VSAN) is proposed as the basic model. The relationship between visual features and semantic prototypes is constrained by a distance metric in a common visual-semantic space, so that pixel-level visual features lie closer to the semantic prototypes of their corresponding classes. Zero-shot semantic segmentation is then performed by nearest-neighbor search in this common space, which effectively addresses the hubness problem of zero-shot learning.
(3) This thesis gives a detailed and systematic introduction to the concepts and types of zero-shot learning. The work belongs to inductive zero-shot learning: not even unlabeled images of unseen classes are available during training, which makes the setting more challenging than transductive zero-shot learning.
Detailed experimental verification and analysis on the Pascal VOC and Pascal Context datasets show that the proposed method outperforms existing classical zero-shot semantic segmentation algorithms and better handles the difficulties of the zero-shot semantic segmentation task.
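The episode split and the nearest-neighbor prediction described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that the support and revising sets are formed by partitioning the seen classes, so that the revising classes play the role of simulated unseen classes, and that the distance in the common space is Euclidean. The abstract specifies neither detail, and all function names here (`split_episode`, `nearest_prototype`, `segment`) are hypothetical.

```python
import math
import random


def split_episode(seen_classes, num_revising, rng):
    """Partition the seen classes into disjoint support and revising sets.

    Assumption: the split is by class, so the revising classes act as
    simulated unseen classes for this episode.
    """
    shuffled = list(seen_classes)
    rng.shuffle(shuffled)
    revising = sorted(shuffled[:num_revising])
    support = sorted(shuffled[num_revising:])
    return support, revising


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_prototype(pixel_embedding, prototypes):
    """Return the class whose semantic prototype is closest to the pixel."""
    return min(prototypes, key=lambda name: euclidean(pixel_embedding, prototypes[name]))


def segment(embedding_map, prototypes):
    """Label every pixel of an H x W map of embeddings by nearest prototype."""
    return [[nearest_prototype(px, prototypes) for px in row] for row in embedding_map]


# Toy usage: a new episode draws a new class split each time.
rng = random.Random(0)
support, revising = split_episode(["cat", "dog", "car", "bus", "sky"], 2, rng)

# Toy prototypes and a 2 x 2 map of pixel embeddings in a 2-D common space.
prototypes = {"cat": [1.0, 0.0], "car": [0.0, 1.0]}
embedding_map = [[[0.9, 0.1], [0.2, 0.8]],
                 [[0.1, 0.7], [0.8, 0.3]]]
mask = segment(embedding_map, prototypes)
print(mask)  # [['cat', 'car'], ['car', 'cat']]
```

Because prediction only compares pixel embeddings against a dictionary of prototypes, a new unseen class can be handled by adding its prototype to the dictionary, with no classifier to retrain. This is the property that the abstract contrasts with generation-based methods.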
Keywords/Search Tags: Zero-Shot Semantic Segmentation, Zero-Shot Learning, Visual Semantic Alignment, Training Mode