Keyword Retrieval of Uyghur Document Images Based on Hierarchical Matching

Posted on: 2020-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
Full Text: PDF
GTID: 2568305882498294
Subject: Computer technology
Abstract/Summary:
With the rapid and widespread development of computer vision and multimedia technology, the volume of digital image information grows daily. More and more paper documents are being converted into document images, a special form of data that carries text information. Compared with paper documents, document images are easier to store, manage, and transmit, and are harder to tamper with or forge, so they have become an indispensable means of information storage in daily life. How to manage and retrieve document images effectively and accurately has therefore become a research focus. For plain-text document images, this thesis proposes a keyword-based document image retrieval framework built on coarse-to-fine hierarchical matching and applies it to Uyghur document image retrieval. The main work is as follows:

(1) A Uyghur document image database with plain-text layout was established. It contains 2414 Uyghur document images, each 716 × 1011 pixels with 8-bit depth, stored in BMP format.

(2) Document images were preprocessed. For the collected original document images, the weighted average method was used for grayscale conversion, the maximum inter-class variance (Otsu) method for binarization, bilateral filtering for denoising, and the Hough transform for skew correction. For word images, the Zhang thinning algorithm was used to extract skeleton information.

(3) Document images were segmented into word images. A method combining morphological dilation with integral projection is proposed to segment Uyghur document images into words. Image units other than word images are filtered out according to a threshold on the segmentation units.

(4) The keyword-based document image retrieval method was improved. A coarse-to-fine hierarchical matching framework is proposed to retrieve Uyghur document images by keyword. In coarse matching, a template matching method based on distance features is adopted. In fine matching, the Histogram of Oriented Gradient (HOG) feature of the word image is fused with the HOG feature of the word skeleton (T-HOG), and a Support Vector Machine (SVM) classifier is trained on the fused features to achieve accurate keyword retrieval.

In the experiments, 10 commonly used keywords were retrieved from 108 randomly selected document images, giving an average accuracy of 91.14% and a recall of 79.31%. The results show that this method can effectively realize keyword-based Uyghur document image retrieval.
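
The binarization and word segmentation steps described in (2) and (3) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes dark text on a light background, a single text line stored as a list of rows of 0-255 gray values, and a purely horizontal dilation. The function names and the `radius` and `min_width` parameters are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def otsu_threshold(gray):
    """Return the gray level that maximizes the inter-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]                 # pixels with value <= t
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0               # pixels with value > t
        if w1 == 0:
            break
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, threshold):
    """Mark ink pixels (value <= threshold) as 1 and background as 0."""
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


def dilate_horizontal(binary, radius):
    """Horizontal dilation: a pixel becomes ink if any pixel within
    `radius` columns is ink, which bridges small gaps inside a word."""
    out = []
    for row in binary:
        n = len(row)
        out.append([1 if any(row[max(0, x - radius):min(n, x + radius + 1)]) else 0
                    for x in range(n)])
    return out


def segment_columns(binary, min_width):
    """Split a text line at empty columns of the vertical projection.
    Returns (start, end) column pairs, end exclusive; units narrower
    than `min_width` are dropped as non-word units."""
    width = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, count in enumerate(profile + [0]):   # trailing 0 closes the last run
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            if x - start >= min_width:
                segments.append((start, x))
            start = None
    return segments
```

A typical call chain would be `segment_columns(dilate_horizontal(binarize(gray, otsu_threshold(gray)), radius), min_width)`. Dilating before projecting is what keeps the separated parts of one word from being split at narrow intra-word gaps, while the width threshold discards small isolated units.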
Keywords/Search Tags:Uyghur, keywords, document image retrieval, template matching, SVM