The Study Of Segmentation Algorithm For The Image Of Palm Leaf Manuscripts

Posted on: 2018-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: G Peng
GTID: 2428330518458659
Subject: Signal and Information Processing
Abstract/Summary:
Palm leaf manuscripts, Buddhist classics inscribed on palm leaves, have been in common use for over one thousand years in the areas of China populated by the Dai ethnic minority. As these manuscripts are on the verge of disappearing, their preservation is urgent. Entering the text manually into digital storage is accurate but inefficient. With the development of artificial intelligence, delegating this cumbersome work to a computer offers a more efficient alternative to manual entry. Optical character recognition (OCR), which translates the characters in an image into encoded text, is a good way to solve the problem. Image preprocessing and text line segmentation are key steps in an OCR system. This paper proposes a character segmentation algorithm for palm leaf manuscripts in preparation for future work on character recognition for these manuscripts.

Palm leaf manuscript images are commonly of poor quality, with smudges, creases, stroke deformation, and touching characters, so a recognition system built on traditional OCR algorithms will not work. To tackle this problem, the proposed character segmentation algorithm is composed of two main steps: preprocessing of the palm leaf manuscript images and segmentation of the palm leaf manuscript images.

(1) Preprocessing, the start of the whole task, includes image denoising, image binarization, skew correction, and so on. Image binarization is a key step in preprocessing. Based on previous studies and the characteristics of palm leaf manuscripts, this paper proposes a binarization algorithm that combines a connected-domain algorithm with character edge maps, and it obtains better results than other methods.

(2) Text line segmentation is a key step in an OCR system. Several common approaches, such as projection-based methods and stochastic methods, have been put forward for this task. However, most existing methods cannot be applied directly to Dai palm leaf manuscripts, whose images have poor quality and include smudges, creases, stroke deformation, and touching characters. To solve this problem, an improved Viterbi algorithm based on the Hidden Markov Model (HMM) is first used to find all possible line segmentation paths. A path filtering method is then used to detect the optimal paths for the segmented text blocks. The method is compared with relevant methods, and the experimental results demonstrate its effectiveness.

(3) Based on the characteristics of the text lines of palm leaf manuscripts, an algorithm combining the Viterbi algorithm with stroke concave spot detection is proposed to solve the character segmentation problem within a text line.

The experimental results are good, but some problems encountered during the research remain to be worked on in the future. This paper lays a foundation for character recognition for palm leaf manuscripts. The research carried out covers not only character segmentation but also character recognition; because the results of the character recognition algorithm are not yet good, it is not discussed in detail in this paper.
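
To make the idea behind step (2) more concrete, the following is a minimal sketch of a Viterbi-style search for a single separating path between two text lines in a binarized image. It is not the improved Viterbi algorithm of the thesis, whose details are not given in this abstract. The function name `viterbi_seam`, the cost values `ink_cost` and `move_cost`, and the toy image are all assumptions chosen for illustration. Rows act as states, each column is one time step, and the path may stay in its row or move one row up or down per column.

```python
from math import inf


def viterbi_seam(binary, start_row, ink_cost=10.0, move_cost=1.0):
    """Find a low-cost left-to-right path through a binary image.

    binary: list of rows, 1 = ink, 0 = background (hypothetical input format).
    Returns one row index per column. Costs are illustrative assumptions.
    """
    h, w = len(binary), len(binary[0])
    cost = [[inf] * w for _ in range(h)]   # best accumulated cost per cell
    back = [[0] * w for _ in range(h)]     # row of the best predecessor
    cost[start_row][0] = binary[start_row][0] * ink_cost

    for x in range(1, w):
        for y in range(h):
            best, best_prev = inf, y
            for dy in (0, -1, 1):          # stay, or come from one row away
                py = y + dy
                if 0 <= py < h:
                    c = cost[py][x - 1] + (move_cost if dy else 0.0)
                    if c < best:
                        best, best_prev = c, py
            # Crossing an ink pixel is penalised, so the path avoids strokes.
            cost[y][x] = best + binary[y][x] * ink_cost
            back[y][x] = best_prev

    # Backtrack from the cheapest cell in the last column.
    y = min(range(h), key=lambda r: cost[r][w - 1])
    path = [y]
    for x in range(w - 1, 0, -1):
        y = back[y][x]
        path.append(y)
    path.reverse()
    return path


# Toy image: two text lines (rows 0-1 and 5-6) with a descender
# reaching into the gap at column 3.
page = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
seam = viterbi_seam(page, start_row=3)
```

In this toy case the path starts in the middle of the gap and bends below the descender instead of cutting through it, which a straight horizontal cut from a projection profile could not do. A full system in the spirit of the abstract would generate many such candidate paths and then filter them.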
Keywords/Search Tags:Palm leaf manuscripts, Image binarization, Character image segmentation, Local projection segmentation algorithm, Viterbi algorithm, Concave spot detection