| With the deepening of research on Chinese culture and the development of digital acquisition technology for cultural relics, the volume of cultural resource data and digital cultural content has grown accordingly. Storing, managing, and retrieving this data has become an important task. This thesis takes cultural relic image data as its research object, constructs subgraph and sketch datasets, and proposes two models: the FMHPPA subgraph retrieval model, which combines folding multi-hole pyramid pooling with an attention mechanism, and the MFSR sketch retrieval model, which is based on multimodal fusion. A retrieval system built on these models provides sketch retrieval and subgraph retrieval functions. The main research work covers the following aspects: (1) Constructing subgraph and sketch datasets of cultural relic images. To meet the needs of algorithm training and testing, this thesis manually crops parts of cultural relic images as subgraphs and combines them with the original images to form a subgraph dataset. A total of 1986 images were collected as the training and validation sets, and 652 images as the test set. In addition, this thesis uses a sketch generation algorithm to generate sketches of cultural relic images and pairs each sketch with its corresponding text and image, yielding 1592 (sketch, text, image) triplets as the training set of the sketch dataset. The test set of the sketch dataset consists of 613 (sketch, text, image) triplets whose sketches were drawn by hand from cultural relic images. (2) Proposing the FMHPPA subgraph retrieval model, which combines folding multi-hole pyramid pooling with an attention mechanism. To handle scale changes in subgraph retrieval, this thesis applies an optimized folding multi-hole pyramid pooling in the image feature extraction module to extract multi-scale information from the image. To prevent dense local features and irrelevant features from degrading retrieval performance and accuracy, this thesis uses an attention mechanism to select the key local features. Ablation and comparative experiments on the constructed subgraph dataset show better results than the compared methods. (3) Proposing the MFSR sketch retrieval model based on multimodal fusion. To improve the accuracy of sketch retrieval, this thesis uses text to supply the color and texture information that sketches lack. To build a multimodal fusion space for text, sketches, and images, this thesis draws on the strong transfer and generalization capabilities of the image-text multimodal model CLIP, adds a sketch branch to it, and combines text features and sketch features through weighted fusion. The multimodal fusion space is trained with contrastive learning between the fused features and the image features. Comparative experiments on the constructed sketch dataset show better results than the compared methods. (4) Designing and implementing a sketch retrieval and subgraph retrieval system based on multimodal fusion. To manage cultural resources effectively and provide a data foundation for sketch retrieval, subgraph retrieval, and related research, this thesis implements a system for storing, managing, browsing, and retrieving cultural resources, with sketch retrieval and subgraph retrieval built in. In summary, this thesis proposes the FMHPPA subgraph retrieval model and the MFSR sketch retrieval model, achieves good results on the constructed datasets, and implements a sketch retrieval and subgraph retrieval system based on multimodal fusion. |
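
The weighted fusion and contrastive training described in item (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the fusion weight `alpha`, the `temperature` value, the symmetric CLIP-style form of the loss, and all function names are assumptions made for illustration, and short plain-Python vectors stand in for the features that the sketch, text, and image encoders would produce.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(u, v):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(a * b for a, b in zip(u, v))


def fuse(sketch_feat, text_feat, alpha=0.5):
    """Weighted fusion of sketch and text features.

    alpha is a hypothetical fusion weight: alpha * sketch + (1 - alpha) * text.
    The result is re-normalized so it can be compared with image features.
    """
    s = l2_normalize(sketch_feat)
    t = l2_normalize(text_feat)
    return l2_normalize([alpha * a + (1 - alpha) * b for a, b in zip(s, t)])


def contrastive_loss(queries, images, temperature=0.07):
    """Symmetric InfoNCE-style loss (an assumed form of the contrastive objective).

    queries[i] and images[i] are treated as the positive pair; every other
    combination in the batch acts as a negative. Inputs are expected to be
    unit-normalized.
    """
    n = len(queries)
    logits = [[dot(q, im) / temperature for im in images] for q in queries]

    def mean_cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / n

    columns = [list(c) for c in zip(*logits)]  # image-to-query direction
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(columns))


def rank_images(query, images):
    """Return image indices sorted by descending similarity to the query."""
    return sorted(range(len(images)), key=lambda i: dot(query, images[i]), reverse=True)
```

At retrieval time, under these assumptions, a fused query produced by `fuse` would be passed to `rank_images` together with precomputed image features. During training, `contrastive_loss` would be minimized over batches of matched (fused query, image) pairs so that each fused feature moves closer to its own image than to the other images in the batch.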