The Western Xia (Xixia) Dynasty was founded by the Dangxiang (Tangut) people in the northwest of China, and its unique national script is known as the Western Xia script. According to Professor Li Fanwen, a scholar of Western Xia studies, 5917 official characters were in use during the dynasty, of which only 5857 carry actual meanings. Documents of the period, from imperial edicts to the Buddhist scriptures used in temples to spread the faith, were all written in the Western Xia script. Ancient Western Xia books are therefore an important carrier for understanding the history and culture of the dynasty, and their writing has high research value. At present, scholars can interpret these texts only by cross-referencing Xia-Chinese dictionaries with Western Xia language materials, a process that is slow and error-prone, so an efficient text retrieval method is urgently needed. Because Western Xia ancient texts exhibit high visual similarity, image noise, and a long-tailed distribution, this paper carries out the following work to address the problems of this dataset and the shortcomings of traditional image retrieval algorithms:

(1) The dataset of Western Xia ancient texts suffers from a long-tailed distribution and from image noise. First, this paper uses variational autoencoder (VAE) data augmentation to expand the under-represented samples and alleviate the long-tail problem. Then, traditional denoising methods are compared experimentally with a convolutional denoising autoencoder. The results show that the denoising algorithm based on the convolutional denoising autoencoder performs better.

(2) To address the low retrieval accuracy of traditional image retrieval algorithms, this paper proposes the DC-CBAM-ResNet50 retrieval algorithm. First, ResNet50 is used as the backbone network, and its standard convolutions are replaced with dilated convolutions, which mitigates, to some extent, the low resolution of Western Xia ancient text images. Second, contrastive deep supervision is added to the network, which strengthens the model's ability to distinguish highly similar Western Xia characters. Third, a fused attention mechanism (CBAM) is added to enhance the extraction of high-level semantic features. The DC-CBAM-ResNet50 model is then trained with transfer learning on the CTW-1500 Chinese dataset. Finally, similarity between image feature vectors is computed with the Hamming distance, and the top 10 text images and their text information are returned, completing the retrieval. The experimental results show that the DC-CBAM-ResNet50 network model achieves good retrieval accuracy.

(3) This paper designs and implements a text and image retrieval system for Western Xia ancient books. The system consists of three key parts: the client, the server, and the database. The client provides registration, login, and retrieval functions. The server provides retrieval services using the DC-CBAM-ResNet50 algorithm proposed in this paper. The database comprises a hash image index database, a text information database, and an image database. Users perform text and image retrieval of Western Xia ancient books through the client.
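
The final retrieval step, ranking indexed images by Hamming distance to the query and returning the top 10, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each feature vector has already been extracted by the network and is turned into a binary hash code by simple thresholding, which the abstract does not specify. The names `binarize`, `hamming_distance`, and `retrieve_top_k` are hypothetical.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float], threshold: float = 0.0) -> int:
    """Pack a real-valued feature vector into an integer hash code.

    Assumption: one bit per dimension, set when the value exceeds the threshold.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def retrieve_top_k(
    query: int, index: Sequence[Tuple[str, int]], k: int = 10
) -> List[Tuple[str, int]]:
    """Return the k indexed entries closest to the query as (label, distance) pairs."""
    scored = [(label, hamming_distance(query, code)) for label, code in index]
    scored.sort(key=lambda item: item[1])  # stable sort: ties keep index order
    return scored[:k]


if __name__ == "__main__":
    # Toy index of hypothetical image identifiers with 4-bit hash codes.
    index = [
        ("img_a", binarize([0.9, -0.2, 0.4, -0.7])),
        ("img_b", binarize([-0.5, 0.3, -0.1, 0.8])),
        ("img_c", binarize([0.6, -0.4, -0.3, -0.2])),
    ]
    query = binarize([0.8, -0.1, 0.5, -0.9])
    for label, distance in retrieve_top_k(query, index, k=10):
        print(label, distance)
```

In a full system, the labels returned here would be used as keys into the text information and image databases. For a large index, a linear scan like this one would normally be replaced by a dedicated hash index.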