With the increasing popularity of intelligent electronic communication devices, more and more multimedia data emerge from social platforms. Although these multimedia data may express the same subject, they exhibit a large semantic gap because of their different forms of expression. How to effectively reduce the semantic gap between different modalities and mine their joint semantics is the key problem in cross-modal retrieval. Hashing, which projects high-dimensional data into a low-dimensional Hamming space, is widely used in cross-modal retrieval because of its high retrieval efficiency and small storage footprint. As deep learning has demonstrated strong representation learning ability, more and more researchers have incorporated it into cross-modal retrieval. At present, deep cross-modal hashing mainly focuses on supervised learning with labels, while research on unsupervised learning is still limited. However, most multimodal data in real life are unlabeled and therefore difficult for supervised algorithms to exploit. This thesis therefore focuses on deep unsupervised hashing algorithms for cross-modal image-text retrieval. The main work is as follows:

(1) Existing unsupervised hashing algorithms capture only limited semantic information from hand-crafted or deep features, and relaxing the discrete constraint on hash codes to a continuous one introduces large quantization errors. To address these problems, this thesis proposes Deep Unsupervised Discrete Cross-modal Hashing (DUDCH) based on knowledge distillation. Following the idea of knowledge transfer in knowledge distillation, the model extracts pairwise correlation information from an existing unsupervised cross-modal teacher model and reconstructs a symmetric similarity matrix to guide the training of a supervised cross-modal student model (sketched below). In addition, to reduce the quantization loss of the hash codes, discrete cyclic coordinate descent is used to update the codes iteratively, bit by bit (also sketched below). Finally, to shorten the training time of the combined model, a lightweight end-to-end teacher network and an asymmetric student network are adopted.

(2) Deep unsupervised cross-modal hashing suffers from limited retrieval performance without label guidance, and pseudo labels introduced by clustering have low confidence. To address these issues, this thesis proposes Deep Noise Mitigation and Semantic Reconstruction Hashing (DNMSRH) for unsupervised cross-modal retrieval. The model generates pseudo labels by k-means clustering to supervise training; however, the discriminative ability of clustering is limited and the confidence of the pseudo labels is insufficient, which introduces noise and causes the model to make wrong judgments. Therefore, an equal number of training samples satisfying minimum intra-cluster variance and maximum inter-cluster variance is selected from each cluster, reducing the data noise caused by misjudged outliers (see the sketch below). At the same time, offline hard labels and online soft labels are introduced to reduce the label noise caused by pseudo labels; the hard labels are generated by clustering, while the soft labels are produced through collaborative training of heterogeneous image and text networks. In addition, a symmetric multi-metric similarity matrix is reconstructed and the semantic information of heterogeneous modalities is fused, which not only preserves the manifold distribution of the original features but also enriches the joint semantic information of heterogeneous features.
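The following is a minimal NumPy sketch of how a symmetric pairwise similarity matrix could be rebuilt from a frozen teacher's image and text representations to supervise the student. The fusion weight alpha and the function names are illustrative assumptions, not the exact formulation used by DUDCH.

import numpy as np

def teacher_similarity_matrix(img_feat, txt_feat, alpha=0.5):
    # img_feat, txt_feat: (n, d_img) and (n, d_txt) teacher representations
    # of the same n paired samples; alpha is an assumed balancing weight.
    def cosine(F):
        F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
        return F @ F.T                      # (n, n) cosine similarities
    S = alpha * cosine(img_feat) + (1.0 - alpha) * cosine(txt_feat)
    return (S + S.T) / 2.0                  # enforce symmetry explicitly

Averaging the image-image and text-text similarities is only one plausible way to combine the two modalities; the point of the sketch is that the student is trained against pairwise correlations distilled from the teacher rather than against ground-truth labels.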
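Next, a minimal sketch of the bit-wise discrete cyclic coordinate descent update, under the assumption that the code-learning step can be written as min_B ||Q - B P||_F^2 with B in {-1, +1}^(n x r). Q and P stand for a generic real-valued target and projection and are placeholders rather than the thesis's exact variables.

import numpy as np

def dcc_update(Q, P, B, n_sweeps=3):
    # Q: (n, d) real-valued target, P: (r, d) fixed projection,
    # B: (n, r) current binary codes in {-1, +1}, updated column by column.
    n, r = B.shape
    QPt = Q @ P.T                               # (n, r), precomputed once
    for _ in range(n_sweeps):
        for k in range(r):
            p_k = P[k]                          # (d,) row for the k-th bit
            # contribution of all bits except the k-th: B'P'p_k^T
            others = B @ (P @ p_k) - B[:, k] * (p_k @ p_k)
            b_k = np.sign(QPt[:, k] - others)   # closed-form update of bit k
            b_k[b_k == 0] = 1                   # avoid zero entries in the code
            B[:, k] = b_k
    return B

Because each column is solved in closed form while the others are held fixed, the codes stay strictly binary throughout training, which is how the relaxation-induced quantization error is avoided.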
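Finally, a hedged scikit-learn sketch of pseudo-label generation with per-cluster outlier filtering: each sample receives the label of its k-means cluster, and only an equal number of samples closest to each centroid is retained, which approximates the small-intra-cluster-variance selection described above. The names n_clusters and keep_per_cluster are assumed hyper-parameters, not values from the thesis.

import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels_with_outlier_filtering(features, n_clusters=10,
                                         keep_per_cluster=100, seed=0):
    # features: (n, d) fused representations of the unlabeled training set.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    labels = km.labels_
    # distance of every sample to its own cluster centroid
    dists = np.linalg.norm(features - km.cluster_centers_[labels], axis=1)
    kept = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # keep the members of cluster c that lie closest to the centroid
        idx = idx[np.argsort(dists[idx])][:keep_per_cluster]
        kept.append(idx)
    kept = np.concatenate(kept)
    return kept, labels[kept]   # retained sample indices and their hard pseudo labels

The retained assignments would serve as the offline hard labels, while the online soft labels described above would come from the collaborative image and text networks during training.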