Since there is no overlap between the categories of the training set and the test set in zero-shot image classification, traditional classification methods are not applicable in this scenario. As a shared bridge between image features and category labels, auxiliary semantic information such as attributes makes it possible to transfer knowledge from seen classes to unseen classes. Therefore, the quality of the image features and of the auxiliary semantic information is crucial for correctly classifying the test samples. At present, the main factors limiting zero-shot image classification performance are low feature discrimination, inadequate attribute descriptions, and domain shift. To address these problems, two zero-shot image classification models based on self-supervised learning are proposed in this thesis. The main research contents are as follows:

(1) To address low feature discrimination and domain shift, a transductive zero-shot image classification model based on self-supervised enhanced features is proposed. First, a jigsaw-puzzle pretext task is constructed to obtain pseudo-labels, which are used as supervision to train a convolutional neural network; the output of the network's second fully connected layer is taken as the self-supervised feature of the image (a sketch of this pretext task is given below). Then, the self-supervised feature is fused with the original feature, and a semantic autoencoder is used to learn the visual-semantic mapping; the fused features are embedded in the semantic space to predict initial labels for the test samples. Finally, the predicted labels are treated as the true labels of the test samples, and the visual-semantic mapping is iteratively refined with the fused features and these labels to achieve more accurate classification.

(2) To address inadequate attribute descriptions and domain shift, a transductive zero-shot image classification model based on self-supervised augmented attributes is proposed. First, an image topic task is constructed to learn the topic probability distributions of the image-related text in a Wikipedia image-text corpus; the topic probabilities are used as pseudo-labels to train a convolutional neural network, and the trained model generates self-supervised semantic features for the images in the zero-shot datasets. Second, the self-supervised semantic features of unseen-class samples are used as auxiliary information, and the unseen-class data are added to the training set; a visual-semantic mapping is trained with a semantic autoencoder, and the attributes of unseen-class samples are predicted from their semantic embeddings. Then, augmented attributes are obtained by combining the attributes with the semantic features and are used to update the visual-semantic mapping. Finally, class-level semantic features are obtained through mean calculation and ridge regression coefficients and are combined with the attributes to form a prototype for each class; the predicted label of each sample is obtained by the nearest-neighbor method in the new semantic embedding space (a sketch of the mapping and nearest-neighbor step is given below).

Zero-shot image classification experiments are carried out on an animal dataset (AwA2), a bird dataset (CUB), and a scene dataset (SUN). The experimental results show that the proposed models achieve better classification performance, and the zero-shot classification accuracy on each dataset is improved. This thesis has 27 figures, 5 tables, and 86 references.
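As an illustration of the pretext task in model (1), the following is a minimal sketch, assuming a PyTorch implementation with a ResNet-18 backbone, a 3x3 patch grid, and a fixed set of 100 permutations; the names `JigsawNet` and `make_jigsaw_batch`, the layer dimensions, and the choice to reassemble the shuffled patches into a single image (rather than process them patch-wise) are illustrative assumptions, not the thesis's exact configuration. Each image is cut into patches, the patches are shuffled by a randomly chosen permutation, the permutation index serves as the pseudo-label, and the second fully connected layer is read out as the self-supervised feature.

```python
import itertools
import random
import torch
import torch.nn as nn
import torchvision.models as models

GRID = 3                                                        # assumed 3x3 jigsaw grid
PERMS = list(itertools.permutations(range(GRID * GRID)))[:100]  # assumed 100 permutation pseudo-classes

def make_jigsaw_batch(images):
    """Cut each image into GRID*GRID patches, shuffle them with a random
    permutation, and return (shuffled images, permutation pseudo-labels)."""
    b, c, h, w = images.shape
    ph, pw = h // GRID, w // GRID
    patches = images.unfold(2, ph, ph).unfold(3, pw, pw)        # (B, C, G, G, ph, pw)
    patches = patches.reshape(b, c, GRID * GRID, ph, pw)
    labels = [random.randrange(len(PERMS)) for _ in range(b)]
    shuffled = torch.stack([patches[i][:, list(PERMS[labels[i]])] for i in range(b)])
    # Reassemble the shuffled patches into full images for the backbone.
    shuffled = shuffled.reshape(b, c, GRID, GRID, ph, pw)
    shuffled = shuffled.permute(0, 1, 2, 4, 3, 5).reshape(b, c, GRID * ph, GRID * pw)
    return shuffled, torch.tensor(labels)

class JigsawNet(nn.Module):
    """CNN trained to predict the permutation index; the second fully
    connected layer (fc2) provides the self-supervised image feature."""
    def __init__(self, n_perms=len(PERMS), feat_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                 # expose 512-d pooled features
        self.backbone = backbone
        self.fc1 = nn.Linear(512, 1024)
        self.fc2 = nn.Linear(1024, feat_dim)        # self-supervised feature layer
        self.fc3 = nn.Linear(feat_dim, n_perms)     # pseudo-label (permutation) classifier

    def forward(self, x, return_feature=False):
        h = torch.relu(self.fc1(self.backbone(x)))
        f = torch.relu(self.fc2(h))
        return f if return_feature else self.fc3(f)
```

During training, `make_jigsaw_batch` supplies the inputs and pseudo-labels for a standard cross-entropy loss; at feature-extraction time, `forward(x, return_feature=True)` is called on the original (unshuffled) images to obtain the self-supervised features that are fused with the original features.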
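Both models rely on a semantic autoencoder to learn the visual-semantic mapping and on nearest-neighbor matching against class prototypes in the semantic space. The following is a minimal sketch, assuming the standard semantic autoencoder objective min_W ||X - W'S||^2 + lambda ||WX - S||^2 (W' denoting the transpose) with its closed-form solution via a Sylvester equation; the function names, the cosine-distance choice, and the regularization value are illustrative assumptions, and the transductive refinement loop of model (1) and the attribute-augmentation step of model (2) are omitted.

```python
import numpy as np
from scipy.linalg import solve_sylvester
from scipy.spatial.distance import cdist

def train_sae(X, S, lam=0.2):
    """Semantic autoencoder: learn the visual-semantic mapping W.
    X: (d, n) visual (fused) features of the training samples.
    S: (k, n) semantic vectors (attributes) of the same samples.
    Minimizing ||X - W.T @ S||^2 + lam * ||W @ X - S||^2 leads to the
    Sylvester equation A W + W B = C with A = S S^T, B = lam X X^T,
    and C = (1 + lam) S X^T, which is solved in closed form below.
    """
    A = S @ S.T
    B = lam * (X @ X.T)
    C = (1.0 + lam) * (S @ X.T)
    return solve_sylvester(A, B, C)                   # W: (k, d)

def predict(W, X_test, prototypes, proto_labels):
    """Embed test features into the semantic space and assign each sample
    the label of its nearest class prototype (cosine distance).
    prototypes: (n_classes, k); proto_labels: (n_classes,) array of class ids."""
    S_hat = (W @ X_test).T                            # (n_test, k) semantic embeddings
    dist = cdist(S_hat, prototypes, metric='cosine')
    return proto_labels[np.argmin(dist, axis=1)]
```

Under this reading, model (1) would feed the predicted labels of the test samples back, together with the fused features, to re-estimate W, while model (2) would build each prototype by combining the class attributes with the class-level semantic features obtained from mean calculation and ridge regression.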