Ancient texts, typically the classical works, books, and documents of ancient China, are a crucial part of China’s traditional culture. They record thousands of years of Chinese history, cultural heritage, philosophical thought, and scientific achievement, and hold immense value for future generations. Scholars have long focused on the collection, preservation, study, and transmission of ancient texts. With the advent of the digital age, the digitization of ancient texts, that is, converting them from physical copies into computer-readable data, has become one of the major research efforts in China. Digitization enables permanent preservation and convenient sharing of ancient texts, facilitating their study and dissemination. Text detection and localization are critical steps in the digitization process, because their effectiveness directly affects the quality of subsequent recognition and semantic extraction. Currently, research on ancient text detection in China is limited; most existing methods rely on traditional image processing techniques that depend on manually designed features and exhibit low accuracy. Few deep learning-based detection algorithms take into account the characteristics of text in ancient images; existing ones generalize poorly and are prone to missed detections when facing complex backgrounds and densely arranged small text, which limits their practical applicability. In this paper, drawing on deep learning-based object detection algorithms and the data characteristics of ancient images, we carry out the following research on two datasets in pursuit of a highly accurate and generalizable ancient text detection model.

(1) In collaboration with relevant research institutes, we constructed AHBD, a handwritten ancient text dataset comprising 700 images in four scripts: ancient Yi, ancient Chinese, ancient Tibetan, and Western Xia. Together with the publicly available MTHv2 ancient text dataset, it forms the basis of our research. To reduce the false negative rate and improve detection accuracy, we proposed a character detection model based on feature enhancement, with the ATSS algorithm as the baseline. We introduced deformable convolution layers and a spatial attention mechanism to form a new Building Block, embedded at appropriate positions to strengthen feature extraction of text targets. Moreover, we incorporated CARAFE, a lightweight, learnable upsampling operator, into the feature fusion network to enlarge the feature maps while reducing the loss of information about ancient text targets during upsampling. Finally, we proposed an Enhanced Module based on GCNet and added it to the feature pyramid for more balanced information fusion. Experimental results demonstrated significant improvements in detection metrics on both ancient text datasets.

(2) We proposed a detection head based on feature decoupling and IoU-Aware prediction to address the conflict and misalignment between the classification and regression tasks, allowing each task to extract features according to its own characteristics and alleviating the loss of connection between them during optimization. First, we introduced a channel and spatial information decoupling module: an M-ECANet channel weighting module achieves channel-level feature decoupling, and a subsequent deformable convolution layer performs spatial feature decoupling, meeting the different feature requirements of the two tasks. Next, we added an IoU-Aware branch parallel to the regression prediction head, which associates classification scores with regression accuracy at minimal computational overhead, mitigating the misalignment between the two. Experiments on the AHBD and MTHv2 datasets validated the effectiveness of the proposed network structure and methods.

(3) We combined the two algorithmic components to form the final ancient text character detection model, and on that basis designed and implemented an ancient text image character detection system. The backend uses the Django framework, and the frontend uses HTML, CSS, and JavaScript. The system allows users to upload handwritten ancient text images; it automatically outputs images with text detection bounding boxes, saves documents containing coordinate information, and generates cropped single-character images based on the text boxes. The system helps improve the performance of ancient text OCR.
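
As an illustration of how an IoU-Aware branch can tie classification scores to regression accuracy, the following minimal sketch computes the IoU of two boxes and fuses a classification score with a predicted IoU. The fusion rule `cls_score ** (1 - alpha) * pred_iou ** alpha`, the parameter `alpha`, and the function names are assumptions borrowed from common IoU-aware detector practice; the abstract does not state the exact formula used in this work.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_aware_score(cls_score, pred_iou, alpha=0.5):
    """Fuse a classification score with a predicted IoU (assumed rule).

    alpha = 0 keeps the plain classification score; larger alpha gives
    more weight to the predicted localization quality.
    """
    return cls_score ** (1 - alpha) * pred_iou ** alpha


# Hypothetical detections: A is confidently classified but poorly localized,
# B is slightly less confident but tightly localized.
score_a = iou_aware_score(cls_score=0.9, pred_iou=0.4)
score_b = iou_aware_score(cls_score=0.8, pred_iou=0.9)
ranked = sorted([("A", score_a), ("B", score_b)], key=lambda t: t[1], reverse=True)
```

With the fused score, detection B ranks above A even though its raw classification score is lower, which is the kind of misalignment between classification confidence and localization quality that such a branch is meant to reduce.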
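
The system's post-processing step (saving coordinates and cropping single characters from detected boxes) can be sketched as follows. This is a minimal, dependency-free illustration, not the system's actual code: the image is modelled as a nested list of pixel rows, boxes are assumed to be `(x1, y1, x2, y2)` tuples in pixel coordinates, and the helper names and the comma-separated coordinate format are hypothetical.

```python
def clamp_box(box, width, height):
    """Clip a (x1, y1, x2, y2) box to the image bounds and cast to int."""
    x1, y1, x2, y2 = box
    x1 = max(0, min(int(x1), width))
    x2 = max(0, min(int(x2), width))
    y1 = max(0, min(int(y1), height))
    y2 = max(0, min(int(y2), height))
    return x1, y1, x2, y2


def crop_characters(image, boxes):
    """Return one cropped sub-image per non-empty box.

    `image` is a list of rows, each row a list of pixel values.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    crops = []
    for box in boxes:
        x1, y1, x2, y2 = clamp_box(box, width, height)
        if x2 > x1 and y2 > y1:  # skip boxes that fall outside the image
            crops.append([row[x1:x2] for row in image[y1:y2]])
    return crops


def boxes_to_lines(boxes):
    """Serialize boxes as 'x1,y1,x2,y2' text lines (assumed format)."""
    return [",".join(str(int(v)) for v in box) for box in boxes]


# Toy 4x6 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
boxes = [(1, 0, 3, 2), (4, 1, 9, 4)]  # second box extends past the right edge
crops = crop_characters(image, boxes)
lines = boxes_to_lines(boxes)
```

Clamping before slicing keeps boxes that extend past the image border from producing empty or misaligned crops; a real implementation would apply the same logic to an image array loaded by an imaging library.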