Research On Scene Text Detection Based On Multiscale Fusion

Posted on: 2022-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
Full Text: PDF
GTID: 2518306557464854
Subject: Circuits and Systems
Abstract/Summary:
With the rise of artificial intelligence and the explosive growth of image data, document text detection alone no longer meets everyday needs, and research attention is shifting to scene text. Scene text detection is affected by many factors, such as illumination, background complexity, and text diversity, and it has become a research hotspot in recent years. Mainstream detection algorithms are now mainly based on deep learning. They are designed around the straight-edged borders of multi-oriented text and achieve good results on such text. However, these representations cannot precisely locate the edge contour of curved text, which reduces accuracy. To address this problem, this thesis starts from multi-scale fusion and border refinement and adopts a hybrid of image segmentation and object detection to improve scene text detection. The main research contents are as follows:

(1) Convolution and pooling operations change the size of the feature map. Repeated pooling makes the feature map too small, while reducing the number of pooling layers limits the receptive field; either way, important pixel-level information can be lost. To balance this trade-off, a two-branch fusion network is adopted. It combines the advantages of an atrous spatial pyramid pooling (ASPP) network and a feature pyramid network (FPN) to enlarge the receptive field, obtain hierarchical information at different scales, and improve the detection of small targets and long text. This method also effectively alleviates the impact of lost spatial information.

(2) A quadrilateral anchor box design cannot effectively represent curved text, so this thesis proposes a two-stage refinement method that acts on the detection layer. The method includes two modules: direct regression and shape representation. The first stage, direct regression, determines the rough position of the text. The second stage, shape representation, uses image segmentation to obtain the text region and the text center line, merges them to generate the corresponding connected region, and then combines multiple sampling points to reconstruct the text line and its outer border. Finally, the refined border is cut out to obtain a more accurate curved text detection result.

(3) To suppress the redundant boxes produced by the first stage and locate text more precisely, locality-aware non-maximum suppression (NMS) is used. The parameters are also tuned to address two problems in the model: the imbalance between positive and negative samples, and the difficulty of computing the regression loss for overlapping anchor boxes. In the loss function, a weighted Focal loss is applied to the shape representation module and GIoU is applied to the direct regression module, which makes the overall training process more stable.

The algorithm is tested on the ICDAR2015 and Total-Text datasets, with recall, precision, and F-measure as the evaluation metrics. The experimental results show that, compared with other mainstream algorithms, this algorithm improves precision and F-measure on curved text and also achieves better detection results on multi-oriented text, which further indicates that the method is feasible and robust.
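The two loss terms named in item (3) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes axis-aligned boxes given as (x1, y1, x2, y2), a single predicted probability per pixel, and the commonly used default weights alpha = 0.25 and gamma = 2.0. The function names `giou`, `giou_loss`, and `focal_loss` are hypothetical and chosen only for illustration.

```python
import math


def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2).

    Assumption: boxes are axis-aligned; the thesis may use rotated
    or quadrilateral boxes, which need a polygon-based variant.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Overlap area (zero when the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest box that encloses both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0 or enclose <= 0:
        return 0.0  # degenerate (zero-area) input

    iou = inter / union
    # The penalty term stays informative even when the boxes are disjoint.
    return iou - (enclose - union) / enclose


def giou_loss(box_pred, box_true):
    """Regression loss in [0, 2]; 0 means a perfect match."""
    return 1.0 - giou(box_pred, box_true)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Weighted focal loss for one prediction.

    p: predicted probability of the positive (text) class.
    y: ground-truth label, 1 for text and 0 for background.
    alpha, gamma: assumed default weights, not values from the thesis.
    """
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights easy, well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Unlike plain IoU, the GIoU loss still changes with distance when two boxes do not overlap, so it keeps providing a training signal in that case. The focal term reduces the contribution of the many easy background pixels, which is one common way to handle the positive/negative imbalance the abstract mentions.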
Keywords/Search Tags:scene text detection, curve text, feature fusion, imbalance of positive and negative samples