Arabic And English Scene Text Detection Using Convolutional Neural Networks

Posted on:2021-04-27

Degree:Master

Type:Thesis

Institution:University

Candidate:Adil Nawaz

Full Text:PDF

GTID:2415330611966323

Subject:Information and Communication Engineering

Abstract/Summary:

With the success of artificial intelligence and deep learning in image classification, the research community has turned to harnessing deep learning for tasks that were previously considered challenging or even impossible. One area where deep learning has achieved considerable success is the detection of text in natural images. With more data and better computing resources available, much progress has been made in applying deep learning to scene text detection and recognition, with several state-of-the-art results that sometimes even surpass human ability. Driven by a wide range of commercial applications, work in this area is moving toward detecting text in increasingly challenging scenarios.

However, despite the great success in detecting challenging text in natural images, most methods and datasets address only one aspect of the properties that appear in scene images. Most of the datasets available for scene text detection contain text in a single language, mainly in Latin script. Limited progress has been made in creating and curating datasets that can be used for multilingual training of deep learning models. Because diverse-language datasets are scarce, research has focused largely on the challenges that appear in the images of the existing datasets.

This dissertation attempts to broaden text detection to images that contain text in different languages and to create a deep learning model for localizing these instances. First, a dataset of 1200 images is proposed that comprises scene text instances mainly in four languages: English, Arabic, Persian and Urdu. The dataset also contains a small subset of images with text instances in Hebrew and Pashtu. To incorporate diversity, imitate natural conditions and introduce an element of challenge, the images are collected from a wide area with large variations, including horizontal and vertical text, long and focused, short and obscure, and curved and irregular text instances. Second, an end-to-end convolutional neural network is designed that comprises three parts: features are extracted using ResNet-50; these features are then progressively merged to improve the network's ability to recall all text instances; and a prediction head then classifies the image into text/non-text areas, predicts the head and tail of each text instance, and regresses offsets for the bounding box from the head and tail parts. The experiments show the efficiency of a simple and fast model on multilingual text detection, with precision, recall and h-mean of 0.90, 0.65 and 0.76 respectively on the ICDAR MLT dataset.
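
For readers unfamiliar with the reported metrics, the h-mean is the harmonic mean of precision and recall. The following is a minimal sketch of how the three scores relate, not the evaluation code used in the thesis. The function names and the example counts are hypothetical, and the rule for deciding when a detection matches a ground-truth box (typically an overlap threshold) is assumed to be applied before the counts are passed in.

```python
def h_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (also known as the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(true_pos: int, num_detections: int, num_ground_truth: int):
    """Return (precision, recall, h-mean) from match counts.

    true_pos         -- detections matched to a ground-truth text instance
    num_detections   -- all boxes predicted by the detector
    num_ground_truth -- all annotated text instances
    The matching rule itself is an assumption left to the caller.
    """
    precision = true_pos / num_detections if num_detections else 0.0
    recall = true_pos / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, h_mean(precision, recall)


if __name__ == "__main__":
    # Rounded values quoted in the abstract.
    print(round(h_mean(0.90, 0.65), 3))
    # Hypothetical counts: 9 correct out of 10 detections, 15 annotated instances.
    print(detection_scores(9, 10, 15))
```

Plugging in the rounded precision and recall gives roughly 0.755; the reported 0.76 was presumably computed from unrounded values. The gap between precision and recall indicates that the model rarely produces false detections but misses a noticeable share of text instances.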

Keywords/Search Tags:

Convolutional Neural Networks (CNN), Deep Learning, Scene Text Detection, VGG, ResNet

Related items

1	Research On Scene Mongolian Character Detection And Recognition Based On Deep Learning
2	Extraction And Generation Of Sketches Of Painted Cultural Relics Based On Deep Learning
3	Research On Multi-shape Scene Tibetan Text Detection Technology
4	Research On Image Recommendation For Lyrics Based On Deep Learning
5	Chinese Painting Landscape Style Transfer Based On Deep Convolutional Neural Network
6	Chronological Classification And Damage Detection Of Dunhuang Murals Based On Deep Learning
7	Research And Implementation Of Cultural Relics’ Recognition System Based On Convolutional Neural Network
8	Research And Design Of Uyghur Text Detection And Recognition Based On Deep Learning
9	Musical Instrument Identification Based On Deep Learning And Timbre Analysis
10	Research On Content And Style Recognition Of Calligraphy Characters Based On Deep Learning