| Text detection is characterized by large variations in text size and aspect ratio, and text can appear in any orientation. Owing to the flexibility of pixel-level prediction, segmentation-based methods can adapt to text of various shapes and have become the mainstream approach to text detection. However, two problems remain. First, images of natural scenes usually have complex backgrounds, which interfere strongly with text detection and make small-scale text prone to being missed. Second, text in natural scenes is diverse: it may be horizontal, inclined, straight, or curved, and its scale varies widely. As a result, detection of such multi-scale text is often incomplete. To address the first problem, this paper proposes a text detection model based on hybrid attention and feature enhancement, the Hybrid Attention Fusion and Feature Enhancement Network (HAF-FEN). Based on an analysis of the advantages and disadvantages of existing methods, a hybrid attention fusion module and an adaptive feature enhancement module are proposed to reduce the interference of background noise and improve the detection of small-scale text. The hybrid attention fusion module combines local details with global text information, which reduces the interference of background noise and strengthens the focus on text. The adaptive feature enhancement module adaptively learns the importance of feature information at different spatial locations and dynamically aggregates the features, thereby improving the model's ability to detect small-scale text. In addition, a combination of multiple loss functions is used to address the imbalance between positive and negative samples during training. To address the second problem, this paper designs a multi-scale text detection network based on pyramid feature enhancement, the Pyramid Feature Enhancement for Multi-Scale Network (PFE-MSN). First, deformable convolution is used in the backbone network to enlarge the receptive field, and a feature pyramid structure with multiple combinations is proposed. This structure integrates the global and local semantic information of different features and improves the network's attention to multi-scale contextual information. In addition, a multi-scale channel feature fusion module is proposed, which fuses global and local information in the channel dimension, learns adaptively, and assigns different weights to text regions, further improving the robustness of the network to multi-scale text. Experiments on the above models are carried out on public datasets. The quantitative results and the visualized detection results show that the proposed HAF-FEN and PFE-MSN models achieve relatively good results on both datasets, which confirms the effectiveness of the two methods. |
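
The abstract does not say which loss functions are combined to counter the imbalance between positive and negative samples. As a minimal sketch only, the following assumes a pairing that is common in segmentation-based text detection: a Dice loss plus a binary cross-entropy that keeps every positive pixel but only the hardest negatives. The function names, the `neg_ratio` of 3, and the weights `alpha` and `beta` are illustrative assumptions, not values taken from the paper.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Dice loss on flat lists: 1 - 2 * overlap / (sum(probs) + sum(targets))."""
    overlap = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * overlap + eps) / (total + eps)


def balanced_bce_loss(probs, targets, neg_ratio=3, eps=1e-6):
    """Cross-entropy over all positives and only the hardest negatives.

    At most neg_ratio negatives are kept per positive (assumed ratio), so
    the many easy background pixels cannot dominate the average.
    """
    pos = [-math.log(max(p, eps)) for p, t in zip(probs, targets) if t == 1]
    neg = sorted(
        (-math.log(max(1.0 - p, eps)) for p, t in zip(probs, targets) if t == 0),
        reverse=True,  # hardest (largest-loss) negatives first
    )
    keep = min(len(neg), max(1, neg_ratio * len(pos)))
    kept = pos + neg[:keep]
    return sum(kept) / max(len(kept), 1)


def combined_loss(probs, targets, alpha=1.0, beta=1.0):
    """Weighted sum of the two terms; alpha and beta are hypothetical weights."""
    return alpha * balanced_bce_loss(probs, targets) + beta * dice_loss(probs, targets)
```

Here `probs` is a flattened text-probability map and `targets` the matching 0/1 ground-truth mask. The Dice term depends on region overlap rather than pixel counts, and the hard-negative selection bounds the background contribution, which is why this kind of combination is often chosen when text pixels are a small fraction of the image.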