In everyday life, text appears widely in natural scenes, for example on street signs, billboards, and product packaging. Because natural scene images have complex backgrounds and text varies in font texture, scale, and shape, detecting text in images remains challenging. Among the many lines of research, segmentation-based scene text detection methods predict text at the pixel level, adapt well to variations in text shape, and resist interference from complex backgrounds, which makes them widely used in practice. However, these methods have two weaknesses. First, they often struggle to separate adjacent text accurately, and their post-processing algorithms have high time and space complexity. Second, segmentation models typically adopt the Feature Pyramid Network (FPN) structure, in which the features of adjacent layers are misaligned; this misalignment directly degrades the model’s detection performance when the features of adjacent layers are fused. Therefore, this paper conducts an in-depth investigation of segmentation-based scene text detection methods and proposes improvements that address these issues. In addition, the proposed text detection algorithm is applied to a text detection project for truck doors. The main work and contributions of this paper are summarized as follows:

(1) To address the difficulty of accurately separating adjacent text and the high complexity of post-processing, a Text Kernel Reconstruction and Expansion (TKRE) algorithm is proposed for detecting arbitrarily shaped text in natural scenes. The algorithm first uses a fully convolutional neural network to predict an orientation field, represented as a two-dimensional vector image, in which the vectors point from the text boundaries toward the text centers, perpendicular to the boundaries. Then, the algorithm shrinks the text instances inward along the directions indicated by the orientation field, forming text kernels and separating adjacent
texts. Finally, a text kernel expansion algorithm based on a distance transform assigns each pixel within the text to its nearest text kernel, yielding separated text instances. Because it operates on only a small fraction of the pixels within the text instances, the algorithm has lower time and space complexity. Experimental results show that the proposed method achieves state-of-the-art or comparable detection performance on public datasets such as Total-Text, CTW1500, ICDAR2015, and MSRA-TD500.

(2) A scene text detection method based on the Deep Feature Alignment Network (DFANet) is proposed to address the misalignment of adjacent-layer features in the feature pyramid network. Specifically, DFANet performs multiple alignments on both high-level and low-level features. The aligned high-level and low-level features are then summed, and a 3D convolution fuses the summed features into a single output. To offset the additional memory consumption and computational cost introduced by DFANet, a Selective Feature Pyramid Network (S-FPN) is introduced to reduce the number of channels of the input features. This approach improves detection accuracy while maintaining real-time speed with minimal memory consumption. Experimental results show that the proposed method achieves detection performance comparable or superior to that of state-of-the-art methods on public datasets such as Total-Text, CTW1500, ICDAR2015, and MSRA-TD500.

(3) Text detection for trucks. Text near truck doors often contains information such as the vehicle’s payload parameters and the company to which it belongs. This information is of great importance to trucking companies and traffic management authorities for effective management. However, current public datasets do not include text from truck doors, which makes them unsuitable for truck-specific text detection. Therefore, this paper addresses the practical needs of the relevant departments by constructing a dedicated
dataset for text detection on trucks and applying the proposed text detection algorithm to this dataset. Experimental results show that the proposed method achieves an F-score of 95.43% on this dataset, detecting text on trucks accurately and meeting real-world requirements.
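
As an illustration of the text kernel expansion step in contribution (1), the following is a minimal sketch of how each text pixel could be assigned to its nearest text kernel with a distance transform. It is not the paper's implementation: the function name `expand_kernels`, the use of SciPy's Euclidean distance transform, and the assumption that the text mask and the kernel mask are already available as boolean arrays are all choices made here for illustration only.

```python
import numpy as np
from scipy import ndimage


def expand_kernels(text_mask, kernel_mask):
    """Assign every text pixel to its nearest text kernel (illustrative sketch).

    text_mask   : 2D array, True where a pixel is predicted as text.
    kernel_mask : 2D array, True inside the shrunken text kernels.
    Returns an int32 label map: 0 is background, k > 0 is the k-th instance.
    """
    text_mask = np.asarray(text_mask, dtype=bool)
    kernel_mask = np.asarray(kernel_mask, dtype=bool)

    # Without any kernel there is nothing to expand from.
    if not kernel_mask.any():
        return np.zeros(text_mask.shape, dtype=np.int32)

    # Each connected kernel region becomes one text instance label.
    kernel_labels, _ = ndimage.label(kernel_mask)

    # The Euclidean distance transform measures the distance from every
    # nonzero pixel to the nearest zero pixel, so the kernel mask is inverted
    # to make kernel pixels the zeros. return_indices=True also yields the
    # coordinates of that nearest kernel pixel.
    _, nearest = ndimage.distance_transform_edt(~kernel_mask, return_indices=True)

    # Every pixel inherits the label of its nearest kernel pixel.
    expanded = kernel_labels[nearest[0], nearest[1]]

    # Only pixels predicted as text keep a label.
    expanded[~text_mask] = 0
    return expanded
```

In this sketch, two words that touch in the text mask still end up as separate instances as long as their kernels are disconnected, because the split follows the nearest-kernel boundary rather than the connectivity of the text mask.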