Research And Implementation Of Natural Scene Text Detection Method Based On Deep Learning

Posted on: 2024-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Lei
GTID: 2568306944462574
Subject: Computer Science and Technology
Abstract:
Scene text detection is an important branch of computer vision. Scene text varies widely in form and in how it is captured, which limits the effectiveness of traditional models that rely on manually designed strategies for learning text features, while deep learning provides a simple and effective solution. In this thesis, we use deep learning to optimize scene text detection models in three respects: improving detection performance, improving inference speed, and expanding application scenarios. We also build a visualization system to demonstrate the results intuitively.

Convolutional neural networks cannot adapt flexibly to scene text with large variations in shape and size. To address this problem, we treat feature extraction as modeling the global contextual information of sequences and fuse features of different scales through a multi-stage Transformer, which offers a new perspective on the feature extraction component of scene text detection methods. For our two-stage text detector, we also propose an adaptive text anchor matching module, which assigns positive and negative samples based on the probability distribution of text anchor scores. The aim is to improve the quality of positive text anchors and thereby the accuracy of the generated text coordinates. The effectiveness of these components and the resulting gains in detection performance are verified on 7 public datasets.

Deep learning-based text detection models have too many parameters and too high a computation cost. To address this, we propose a knowledge distillation-based compression method for scene text detectors, which reduces computation and storage costs and balances parameter count, speed, and accuracy. Specifically, we take the high-performance large-scale detector as the teacher network and design a lightweight detector as the student network. We then apply distillation to two modules, intermediate features and text proposal classification, to transfer the dark knowledge contained in the teacher to the student, so that the student is trained better under the teacher's guidance. On two benchmarks, the lightweight text detection model is competitive with the teacher, or even slightly better.

To expand the application scenarios of the scene text detection model, we construct a video scene text tracker based on the tracking-by-detection method. Specifically, for a given video sequence, we use the scene text detector to detect text in each frame. Multi-frame text tracking is then completed through data association between the detection results of the current frame and those of the previous frame. The text detection model can be swapped flexibly to match the requirements of different scenarios.

We also design and implement a visualization system, which deploys the above scene text detection and video scene text tracking models in a Web application built on Django and intuitively displays the detection and tracking results.
Keywords: Scene Text Detection, Video Scene Text Tracking, Transformer, Anchor Assignment, Knowledge Distillation
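
The tracking-by-detection step described in the abstract (detect text in every frame, then associate the current frame's detections with those of the previous frame) can be illustrated with a minimal sketch. The abstract does not say how the data association is performed, so everything below is an assumption made for illustration: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, similarity is measured with intersection over union, matching is greedy, and the names `iou`, `track`, and `iou_threshold` are hypothetical. The detector itself is not modeled; its per-frame output is passed in as plain lists.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track(frames, iou_threshold=0.5):
    """Assign track ids to per-frame detections.

    frames: list of frames, each a list of boxes from some text detector.
    Returns a list of frames, each a list of (track_id, box) pairs.
    Assumption: greedy IoU matching against the previous frame only.
    """
    next_id = count(1)
    previous = []  # (track_id, box) pairs from the previous frame
    results = []
    for boxes in frames:
        # Score every (previous box, current box) pair, best overlap first.
        pairs = sorted(
            (
                (iou(prev_box, box), prev_idx, box_idx)
                for prev_idx, (_, prev_box) in enumerate(previous)
                for box_idx, box in enumerate(boxes)
            ),
            reverse=True,
        )
        assigned = {}      # current box index -> inherited track id
        used_prev = set()  # previous-frame indices already matched
        for score, prev_idx, box_idx in pairs:
            if score < iou_threshold:
                break  # remaining pairs overlap even less
            if prev_idx in used_prev or box_idx in assigned:
                continue
            assigned[box_idx] = previous[prev_idx][0]
            used_prev.add(prev_idx)
        # Unmatched detections start new tracks.
        current = [
            (assigned[i] if i in assigned else next(next_id), box)
            for i, box in enumerate(boxes)
        ]
        results.append(current)
        previous = current
    return results
```

Because the tracker only consumes lists of boxes, any detector can supply them, which mirrors the abstract's point that the detection model can be swapped per scenario. A real system would likely need polygon overlap for rotated or curved text and some tolerance for missed detections, neither of which this sketch attempts.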