With the rapid popularization of mobile intelligent devices and the large-scale rise of social networks and video websites, video has become an important information carrier, and the demand for intelligent analysis and processing of video is increasingly urgent. As a high-level feature, text in video directly carries semantic information and often accurately expresses the key information about the video. Text is therefore an important cue for interpreting video content, and an important basis for video retrieval and video content understanding. As a result, video text detection not only has great research value for intelligent video analysis and processing, but also has broad application prospects in intelligent driving, geographical positioning, network security, and so on.

Traditional text detection algorithms target a single image. Because of uneven illumination, low resolution, complex backgrounds, multi-oriented text, and the large number of frames in a video, directly applying existing single-image text detection algorithms to video often yields poor accuracy and slow speed. The prominent characteristic of video is its temporal redundancy. In this paper, we mine this redundant temporal information to meet the above challenges and propose an effective video text detection scheme that improves both detection accuracy and speed. Our main work and contributions are:

1. A fully convolutional neural network for text detection in video frames is designed. The model extracts rich features through a multi-layer neural network and detects not only horizontal text but also multi-oriented text. Experiments demonstrate the validity and generality of the detection model.

2. Detecting a video frame by frame with the text detection model above is computationally inefficient because of the large number of frames. Based on the fact that the content of adjacent video frames changes little, we use optical flow information to accelerate video text detection: feature maps are extracted by the feature extraction network only on key frames, and optical flow is then used to propagate these feature maps to the adjacent frames. Detection speed is accelerated while detection accuracy is maintained, because most of the feature extraction time is saved.

3. Under complex backgrounds, uneven illumination, video blur, and so on, a single-frame text detection model inevitably suffers from missed and false detections. To solve this problem, we exploit the complementary information in adjacent frames to further mine temporal information and fuse the single-frame detection results, thereby correcting false and missed detections and improving detection accuracy.

The proposed algorithm is experimentally validated on two public video text datasets, Minetto and ICDAR 2015. The experiments show that the proposed detection scheme achieves good results in both detection speed and detection accuracy.
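The key-frame feature propagation in contribution 2 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a dense optical-flow field that maps each pixel of the current frame back to its location in the key frame, and warps the key frame's feature map by bilinear sampling (the function and variable names are hypothetical).

```python
import numpy as np

def warp_features(key_feat, flow):
    """Propagate a key-frame feature map to a nearby frame.

    key_feat: (C, H, W) feature map extracted on the key frame.
    flow:     (2, H, W) flow field; for the current frame's pixel
              (x, y), (x + flow[0, y, x], y + flow[1, y, x]) is its
              position in the key frame.  Bilinear sampling with
              border clamping.
    """
    C, H, W = key_feat.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    src_x = np.clip(xs + flow[0], 0, W - 1)
    src_y = np.clip(ys + flow[1], 0, H - 1)

    # Four neighbouring grid points and their bilinear weights.
    x0 = np.floor(src_x).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(src_y).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = src_x - x0, src_y - y0

    return (key_feat[:, y0, x0] * (1 - wx) * (1 - wy)
          + key_feat[:, y0, x1] * wx * (1 - wy)
          + key_feat[:, y1, x0] * (1 - wx) * wy
          + key_feat[:, y1, x1] * wx * wy)
```

In this scheme only the key frames pass through the (expensive) feature extraction network; non-key frames need only a flow computation plus this warp, which is where the speed-up comes from.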
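One simple way to realize the temporal fusion in contribution 3 is to match boxes across adjacent frames by IoU and keep only detections that are confirmed in enough frames of a small temporal window. The sketch below is an illustrative voting scheme under that assumption, not the exact fusion rule of the paper; thresholds and names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(frame_boxes, center, radius=1, iou_thr=0.5, min_votes=2):
    """Keep a box detected in frame `center` only if an overlapping box
    (IoU >= iou_thr) appears in at least `min_votes` frames of the
    window [center - radius, center + radius] (the center frame itself
    always contributes one vote).

    frame_boxes: list over frames; each entry is a list of boxes.
    """
    lo = max(0, center - radius)
    hi = min(len(frame_boxes) - 1, center + radius)
    kept = []
    for box in frame_boxes[center]:
        votes = sum(
            any(iou(box, other) >= iou_thr for other in frame_boxes[f])
            for f in range(lo, hi + 1)
        )
        if votes >= min_votes:  # confirmed by enough neighbouring frames
            kept.append(box)
    return kept
```

This direction suppresses false detections; missed detections can be handled symmetrically, by inserting a box into the current frame when its neighbours agree on a detection the current frame lacks, which is omitted here for brevity.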