
Generalized Zero-Shot Learning Based On Visual And Semantic Relation

Posted on: 2024-01-03  Degree: Master  Type: Thesis
Country: China  Candidate: A Y Han  Full Text: PDF
GTID: 2568307106484194  Subject: Electronic information
Abstract/Summary:
In recent years, deep learning models have achieved remarkable success in a wide range of application scenarios. However, these successes depend heavily on large amounts of labeled data; without sufficient data, the performance of deep models can be unsatisfactory. In practice, collecting massive amounts of labeled data is very challenging, and for some categories, such as endangered bird species or rare diseases, it is almost impossible to collect samples at all, so these categories exist only as textual descriptions. Moreover, new categories emerge so quickly that models would constantly require new samples and retraining, at significant cost. Zero-shot learning offers an effective solution to this problem, enabling the recognition of unseen categories without any labeled data for them. The approach mirrors how humans classify in the absence of examples, giving deep models a human-like deductive reasoning ability and pointing toward a promising direction for artificial intelligence. It transfers knowledge learned on the seen classes to unseen classes, which have no training data, through class attribute information. Traditional zero-shot classification aims to recognize only unseen classes in the testing phase; generalized zero-shot classification is more realistic, since it must recognize both seen and unseen classes at test time. The main research contents of this thesis are as follows:

(1) Existing methods focus on semantic consistency only during the training phase and ignore semantic-consistency constraints during the classification and feature-synthesis phases. Without these constraints, synthesized visual features may fail to express their semantics accurately, and visual and semantic features may not be well aligned across modalities, producing deviations between the two types of features. To address this issue, this study improves the SDGZSL method and proposes a generalized zero-shot classification method with enhanced semantic consistency and discriminative feature transformation, which strengthens semantic consistency at every stage of the model. Specifically, semantic-consistency features are first extracted from the visual features with a disentanglement method. Second, a semantic decoder decodes visual features and reconstructs them in the semantic space, improving semantic consistency in the synthesis stage; this strengthens the alignment between semantics and vision and reduces semantic deviation. The decoder is trained with a cycle-consistency loss, and its output can itself be regarded as a second semantic-consistency feature. Finally, the classification module is improved: the two semantic-consistency features are concatenated and transformed into an enhanced semantic-consistency feature, which is used to train the Softmax classifier for zero-shot classification, thereby enforcing semantic consistency during the classification phase as well; a minimal sketch of this pipeline is given below. Experimental results on four datasets show that the proposed method outperforms the baseline SDGZSL model and achieves competitive results compared with other methods of the same type.
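The following PyTorch sketch illustrates the kind of structure described above: a semantic decoder that maps visual features back into the attribute space, a cycle-consistency-style reconstruction loss over real and synthesized features, and a classifier trained on the concatenation of two semantic-consistency features. All module names, dimensions, and loss terms are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch of the enhanced-semantic-consistency idea (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticDecoder(nn.Module):
    """Decodes a visual feature back into the attribute (semantic) space."""
    def __init__(self, vis_dim=2048, att_dim=85, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vis_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, att_dim))

    def forward(self, v):
        return self.net(v)

class EnhancedClassifier(nn.Module):
    """Softmax classifier over the concatenation of two semantic-consistency
    features, passed through a small transformation layer first."""
    def __init__(self, feat_dim, att_dim, num_classes, hidden=512):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Linear(feat_dim + att_dim, hidden), nn.ReLU())
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, h_s, a_hat):
        # h_s: disentangled semantic-consistency feature
        # a_hat: semantic decoder output, used as a second such feature
        z = self.transform(torch.cat([h_s, a_hat], dim=1))
        return self.fc(z)  # logits for the Softmax classifier

def reconstruction_loss(decoder, v_real, v_synth, attributes):
    """Encourage real and synthesized visual features to decode back to the
    ground-truth class attributes (one possible cycle-consistency term)."""
    return F.mse_loss(decoder(v_real), attributes) + \
           F.mse_loss(decoder(v_synth), attributes)
```

In this sketch the decoder loss constrains the feature-synthesis stage, while the concatenated features constrain the classification stage, matching the two places where the abstract says semantic consistency is enforced.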
(2) Most existing generalized zero-shot learning methods do not correlate the original features with the predefined semantic attributes across all dimensions during learning, so the visual features are semantically biased in some dimensions, leading to negative transfer on unseen classes. In addition, rich semantic information is often ignored during classification, and the semantic information itself contains attributes that are unrelated to vision, which also degrades classification results. To address these issues, this study proposes a generalized zero-shot classification method with visual-semantic dual disentanglement. Specifically, two disentanglement structures are designed, one to extract semantic information and one to reduce bias, and the extracted semantic information is fused to obtain a more comprehensive representation; a minimal sketch follows below. The proposed method better aligns visual and semantic features, reduces bias, and improves recognition performance on unseen categories. Experimental results on three datasets demonstrate that the proposed method outperforms state-of-the-art methods.
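The sketch below shows one way such a dual disentanglement could be wired: one branch splits the visual feature into a semantic-related part and a bias (residual) part, a second branch splits the attribute vector into vision-related and vision-unrelated parts, and the two semantic-related representations are fused. The two-branch split-and-fuse structure, dimensions, and names are assumptions for illustration rather than the thesis architecture.

```python
# Minimal sketch of visual-semantic dual disentanglement (assumed design).
import torch
import torch.nn as nn

class DualDisentangler(nn.Module):
    def __init__(self, vis_dim=2048, att_dim=85, latent=256):
        super().__init__()
        # Branch 1: disentangle the visual feature into a semantic-related
        # part and a residual (bias) part.
        self.vis_semantic = nn.Sequential(nn.Linear(vis_dim, latent), nn.ReLU())
        self.vis_bias = nn.Sequential(nn.Linear(vis_dim, latent), nn.ReLU())
        # Branch 2: disentangle the attribute vector into vision-related
        # and vision-unrelated parts.
        self.att_related = nn.Sequential(nn.Linear(att_dim, latent), nn.ReLU())
        self.att_unrelated = nn.Sequential(nn.Linear(att_dim, latent), nn.ReLU())
        # Fuse the two semantic-related representations.
        self.fuse = nn.Linear(2 * latent, latent)

    def forward(self, v, a):
        v_sem, v_bias = self.vis_semantic(v), self.vis_bias(v)
        a_rel, a_unrel = self.att_related(a), self.att_unrelated(a)
        fused = self.fuse(torch.cat([v_sem, a_rel], dim=1))
        # The fused representation would feed the classifier; the bias and
        # vision-unrelated parts would be suppressed by auxiliary losses.
        return fused, v_bias, a_unrel

# Usage example with random stand-in data.
v = torch.randn(8, 2048)   # batch of visual features
a = torch.randn(8, 85)     # corresponding class attribute vectors
fused, _, _ = DualDisentangler()(v, a)
```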
Keywords/Search Tags: Image Classification, Generalized Zero-Shot Learning, Disentangled Representation Learning, Feature Enhancement, Generative Model