| Image similarity measurement evaluates how similar the content of given images is by extracting and describing image features, and it is an important part of computer vision and pattern recognition. This research group is responsible for similarity detection of project applications to the National Natural Science Foundation of China and has developed a text detection system for these applications. However, a large amount of image plagiarism remains in the applications. To achieve full-coverage detection, similarity detection over massive numbers of images is urgently needed. The current mainstream approach to image similarity metrics is supervised learning based on contrast constraints, which requires a large amount of labeled data that is costly to obtain. In addition, these algorithms consider only the image itself and do not effectively use information from other related modalities when measuring similarity. Therefore, how to train unsupervised visual representation models and further exploit multimodal information is an active research topic. This thesis focuses on three problems: extreme dependence on labeled data, the inability to use text information effectively, and the ambiguity of text. The main research and applications are as follows: (1) To address the extreme dependence on labeled data, an unsupervised image similarity measurement algorithm based on contrastive learning is proposed. This thesis introduces contrastive learning into image similarity measurement, which removes the heavy reliance of previous methods on labeled images. Moreover, unlike general contrastive learning models, which require fine-tuning on downstream tasks to further improve performance, the proposed model maps the input data into a metric space in which distance directly represents similarity, with no downstream task required. Theoretical analysis and experimental results show that the proposed model can effectively utilize unlabeled data and matches the accuracy of supervised learning models. (2) To address the use of text information and the ambiguity of text, a multimodal image-text fusion structure based on a multi-mapping text processing module is constructed. The thesis builds a multimodal fusion structure for images and text so that text information related to image content, such as image titles, is used effectively. In addition, because introducing text information also introduces text ambiguity, the model adopts a multi-mapping module when processing text. By mapping one text input to multiple points in the semantic space, the module mines the various meanings implicit in the text and makes the model compatible with polysemy. Theoretical analysis and experimental results show that the proposed model can effectively use text information in image similarity measurement and achieves better results. (3) The above results are implemented and applied to image similarity detection for fund applications. Based on the proposed unsupervised image similarity measurement algorithm with text fusion, an image similarity detection system for fund project applications is designed and implemented. The system detects plagiarism of scientific research results more efficiently. Testing shows that the system performs stably and can quickly and accurately detect images with similar content in fund project applications, achieving the desired effect. |
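The two core ideas in contributions (1) and (2) can be illustrated with a minimal sketch. The abstract does not specify the loss function or how the multiple text points are combined, so the sketch assumes an InfoNCE-style contrastive loss over cosine similarity and a simple maximum over the text points. All function names (`l2_normalize`, `cosine`, `info_nce`, `multi_point_similarity`) and the temperature value are hypothetical and chosen for illustration only; they are not taken from the thesis.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed loss, not confirmed by the source).

    The loss is small when the anchor is closer to its positive (for example,
    another augmented view of the same image) than to every negative.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


def multi_point_similarity(image_vec, text_points):
    """Similarity between an image embedding and a text mapped to several points.

    Taking the maximum over the points is an assumed aggregation rule: the
    image only needs to agree with one of the text's possible meanings.
    """
    return max(cosine(image_vec, p) for p in text_points)


# Toy 2-D embeddings, invented for this example.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_aligned = info_nce(anchor, positive, negatives)
loss_misaligned = info_nce(anchor, negatives[0], [positive, negatives[1]])
best_text_match = multi_point_similarity(anchor, [[0.0, 1.0], [1.0, 0.0]])
```

Because the learned space is meant to be used directly as a metric space, `cosine` alone would serve as the similarity score at detection time in this sketch, with no downstream fine-tuning step.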