| As computer vision continues to advance, image classification techniques are finding broader and deeper applications. Traditional image classification models often rely on large-scale, strongly labeled training data. However, as more and more new categories emerge, collecting large numbers of labeled samples in practice is both difficult and costly. Zero-shot image classification addresses this shortcoming of traditional image classification techniques. It trains on samples from seen classes and uses auxiliary information (such as semantic features or word vectors) to transfer knowledge from seen classes to unseen classes, thereby predicting the category labels of unseen-class samples. However, the test set in traditional zero-shot image classification contains only unseen classes, which does not reflect the complexity of real-world scenarios. Generalized zero-shot image classification poses an even greater challenge, as its test set comprises both seen and unseen classes. This dissertation therefore takes generalized zero-shot image classification as its research task. To address the problems of redundant information, cross-dataset bias, and domain shift in current generalized zero-shot image classification, this dissertation combines decoupled representation learning with generative models to study decoupled feature representations from both single-modality and cross-modality perspectives. Two generalized zero-shot image classification models based on decoupled representation learning are proposed, and the main research work is as follows: (1) To alleviate redundant information and cross-dataset bias, this dissertation considers the decoupled representation of features in the visual modality and proposes visual feature contrast decoupling for generalized zero-shot image classification. Specifically, the visual features are first input into a conditional
variational autoencoder to generate visual features of unseen classes. Two different decoupling encoders then encode the visual features as latent information and decouple them into semantic-related and semantic-unrelated latent representations. A total correlation penalty and a contrastive loss are applied to encourage the mutual independence of the two representations, and a semantic relationship matching model measures their semantic consistency, thereby guiding the model to learn semantic-related representations. The decoupled latent representations are then cross-fused and fed to the decoder to reconstruct the images. In addition, a feature refinement module is designed to remove redundant information from the original features, mitigating the impact of cross-dataset bias on classification. Finally, the features refined by the feature refinement module and the semantic-related representations are used jointly to learn a generalized zero-shot image classifier. (2) To alleviate redundant information and domain shift, this dissertation considers the decoupled representation of features in both the visual and semantic modalities and proposes cross-modal alignment decoupling for generalized zero-shot image classification. Specifically, the visual features are first input into a conditional variational autoencoder to generate visual features of unseen classes. The alignment decoupling module then decouples the visual and semantic features into categorization-related and categorization-unrelated features; a total correlation penalty is used to ensure the independence of the two representations, and an attribute comparator measures their semantic consistency. In this process, the latent representations are aligned explicitly by a cross-modal cross-reconstruction loss and implicitly by a visual-semantic distribution alignment loss. In addition, a latent representation alignment method is used to strengthen
the alignment of cross-modal latent representations guided by auxiliary classifiers, thus alleviating the domain shift problem. Finally, a generalized zero-shot image classifier is learned using the categorization-related features decoupled by the alignment decoupling module. This dissertation presents experimental results on four widely used public datasets, namely AWA2, CUB, SUN, and FLO, and compares them with advanced generalized zero-shot image classification models from recent years. The results show that the proposed models achieve better results on all four datasets, demonstrating their effectiveness. In addition, the effectiveness of the proposed decoupling method is further demonstrated through visualization analysis and zero-shot retrieval. |
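
The difference between the traditional and the generalized test setting described in the abstract can be illustrated with a toy nearest-prototype classifier. This is a minimal sketch and not either of the dissertation's models: the class names, the three-dimensional attribute vectors, and the helper functions `cosine`, `predict`, and `classify` are all hypothetical and chosen only for illustration. It shows one commonly cited reason the generalized setting is harder: once seen classes join the candidate label set, an unseen-class sample whose embedding leans toward a seen class can be misclassified.

```python
import math

# Hypothetical class prototypes in a 3-dimensional attribute space
# (illustrative values only, not taken from any dataset).
SEEN = {"horse": [1.0, 0.0, 0.0], "tiger": [0.0, 1.0, 1.0]}
UNSEEN = {"zebra": [1.0, 1.0, 0.0]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict(embedding, prototypes):
    """Return the class whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda name: cosine(embedding, prototypes[name]))


def classify(embedding, generalized):
    """Traditional setting: candidates are unseen classes only.
    Generalized setting: candidates are seen and unseen classes together."""
    candidates = dict(UNSEEN)
    if generalized:
        candidates.update(SEEN)
    return predict(embedding, candidates)


# Assumed embedding of a zebra image that is biased toward the seen class "horse".
sample = [0.9, 0.2, 0.0]

print(classify(sample, generalized=False))
print(classify(sample, generalized=True))
```

In this toy setup the traditional setting labels the sample `zebra`, while the generalized setting labels it `horse`, because the seen-class prototype is now a competing candidate.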