Design And Implementation Of Tibetan Ancient Document Recognition System

Posted on: 2020-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Han
GTID: 2415330572986755
Subject: Software engineering
Abstract/Summary:
Tibetan historical documents are among the world's cultural treasures. They record the development of Tibetan religion, culture, politics, and economy. They are not only valuable sources for studying Tibetan history and culture but also historical witnesses to the prosperity and development of the whole family of Chinese nationalities. Through the erosion of time and improper preservation, these documents have suffered irreversible damage and degradation. Because images of the documents cannot be edited, are inefficient to search, and are difficult to analyze and mine, there is an urgent need for a recognition system that converts them into editable text. Such a system would speed up the digital preservation of Tibetan historical documents, make work easier for Tibetan researchers, and promote cultural exchange and integration among nationalities.

This thesis takes the Uchen-script Tibetan ancient document Ganzhur as its research object and studies the recognition of Tibetan ancient documents. Based on the characteristics of Tibetan ancient document images, it proposes the following algorithms:
(1) A binarization algorithm based on stain removal in the Lab color space, which eliminates the influence of stains and other unfavorable factors and works well on low-quality images.
(2) An edge removal algorithm based on judging the shape of connected domains, which overcomes the tendency to misjudge when only connected-domain area is used and identifies and removes the edge regions of an image more accurately.
(3) A character segmentation algorithm based on baseline segmentation, which solves the adhesion problem caused by vowels and further improves the accuracy of character segmentation.
(4) A character recognition algorithm based on a CNN, which uses deep learning to address multi-class character recognition and improves the recognition rate over the 7,240 character categories found in Tibetan ancient books.

In terms of implementation, the thesis designs and implements a Windows-based recognition system for Tibetan historical document images. Its basic functions are image binarization and proofreading, border removal and proofreading, row segmentation and proofreading, word segmentation and proofreading, recognition and proofreading, and sample marking and proofreading. Together these convert a Tibetan historical document image into an editable text document. To meet different needs, the system provides two main entrances, the "Simple Edition" and the "Professional Edition". The "Simple Edition" is designed for ordinary users and is divided into three functional modules:
(1) Image acquisition, which covers loading local files and scanning documents.
(2) Image processing, which offers step-by-step recognition, single-click recognition, and multiple-click recognition; the user chooses the method.
(3) Sample marking, which classifies and saves character images according to the recognition result.
The "Professional Edition" is designed for researchers of Tibetan ancient historical documents. On top of the "Simple Edition" it adds algorithm replacement and module addition to meet the needs of scientific research.

The main frame of the recognition system is programmed with MFC, and each function module is implemented by calling an EXE file. There is no direct connection between function modules, which makes module functions easy to modify and maintain. The system supports EXE files of four types: Python, MATLAB, C++, and MFC. As long as its interface functions match, an EXE file can be invoked to replace the corresponding functional algorithm. Testing and verification of the whole system and of the individual function modules show that the recognition system runs normally and stably.
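
The EXE-based module design described above can be illustrated with a minimal Python sketch. It is not the thesis's MFC implementation. It assumes a hypothetical interface in which every module is a separate executable that takes an input path and an output path as its last two arguments. The names `run_stage`, `run_pipeline`, `UPPER`, `REVERSE`, and `demo` are invented for illustration, and two tiny Python one-liners stand in for real modules such as binarization or border removal.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in "modules" (hypothetical): each reads argv[1] and writes argv[2].
UPPER = "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())"
REVERSE = "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read()[::-1])"


def run_stage(command, input_path, output_path):
    """Run one external module as its own process and return its output path."""
    result = subprocess.run(
        [*command, str(input_path), str(output_path)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"module failed: {result.stderr.strip()}")
    return Path(output_path)


def run_pipeline(stages, input_path, work_dir):
    """Chain stages so that each one reads the file the previous one wrote."""
    current = Path(input_path)
    for index, (name, command) in enumerate(stages):
        target = Path(work_dir) / f"{index}_{name}.out"
        current = run_stage(command, current, target)
    return current


def demo(text="abc"):
    """Run the two stand-in modules on a temporary file and return the result."""
    stages = [
        ("upper", [sys.executable, "-c", UPPER]),
        ("reverse", [sys.executable, "-c", REVERSE]),
    ]
    with tempfile.TemporaryDirectory() as work_dir:
        source = Path(work_dir) / "input.txt"
        source.write_text(text)
        return run_pipeline(stages, source, work_dir).read_text()
```

Because the stages share only file paths, replacing an algorithm amounts to changing one entry in the `stages` list, which mirrors the loose coupling between modules that the abstract describes.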
Keywords/Search Tags: Tibetan historical documents, Recognition system, Document image processing, Picture and text proofreading