
Research And Application Of Document Image Paragraph Segmentation

Posted on: 2011-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhao
GTID: 2178360308965083
Subject: Management Science and Engineering
Abstract/Summary:
With the development of computer technology, the storage capacity of computers has grown many times over. Through various digital input devices, more and more documents are entered into computers and saved as bitmap images. There has been growing interest in technology for converting these document images into a retrievable and editable form, and document image analysis came into being to address these tasks.

Document segmentation is a fundamental technology in document processing and computer vision, with great academic and practical significance. The quality of the segmentation strongly influences the subsequent recognition and interpretation. Many document segmentation methods have therefore been developed and have succeeded on machine-printed documents, but the processing of handwritten documents remains an open research field. So far, no universal method able to process all kinds of images has been proposed. Most current document segmentation methods assume that the document images are reasonably straight. Some segmentation approaches depend on a specific language. Most proposed methods are sensitive to the topological changes of handwritten documents; geometric methods based on the active contour model are not.

The active contour model is a top-down process that uses prior knowledge and provides a theoretically unified framework for a series of problems, such as contour extraction, stereo matching and object tracking. The method has therefore been applied successfully to image segmentation, medical image processing, human-computer interaction and many other research and practical fields. Level set methods based on the Mumford-Shah model are an important class of deformable-model methods. Because they rely on global information about homogeneous regions in the image, they segment images more quickly and precisely.

This thesis introduces the background of the active contour model and then explains the foundations of the level set method and of image segmentation based on the Mumford-Shah model. Based on the characteristics of documents, the author proposes that the piecewise constant approximation of the Mumford-Shah model is well suited to paragraph segmentation and text line segmentation. Traditional level set methods must repeatedly re-initialize the level set function, at considerable cost, so that it stays close to a signed distance function and the image can be segmented effectively. Keeping the function close to a signed distance function, however, requires a small time step, which slows the evolution. The thesis therefore introduces Chunming Li's method of level set evolution without re-initialization into the segmentation. Experiments indicate that with the proposed method the typical edges of the sample images are picked up in no more than 10 iterations. Segmentation tests on various kinds of handwritten documents show that the proposed method is fast and general.
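
The combination summarized in the abstract, a piecewise constant Mumford-Shah data term together with a penalty term that keeps the level set function near a signed distance function without re-initialization, can be sketched roughly as follows. This is a minimal NumPy illustration and not the thesis's implementation: the function names, the rectangular initial contour, and every parameter value (dt, mu, nu, lam, eps, c0, the iteration count) are assumptions chosen for a small synthetic page.

```python
import numpy as np


def laplacian(f):
    """5-point Laplacian with replicated (Neumann-like) borders."""
    p = np.pad(f, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * f


def curvature(phi, tiny=1e-8):
    """Divergence of the normalized gradient of phi."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + tiny
    return np.gradient(gx / norm, axis=1) + np.gradient(gy / norm, axis=0)


def segment_two_phase(image, iterations=30, dt=1.0, mu=0.04, nu=0.1,
                      lam=20.0, eps=1.0, c0=1.0):
    """Two-phase piecewise constant segmentation (hypothetical helper).

    All default values are illustrative assumptions, not values from the thesis.
    Returns a boolean mask of the pixels where the level set function is positive.
    """
    img = image.astype(float)
    span = img.max() - img.min()
    if span > 0:
        img = (img - img.min()) / span  # scale intensities to [0, 1]

    # Binary step initialization: +c0 inside a central box, -c0 outside.
    # No signed distance function is computed at any point.
    h, w = img.shape
    phi = -c0 * np.ones_like(img)
    phi[h // 4:3 * h // 4, w // 4:3 * w // 4] = c0

    for _ in range(iterations):
        # Smoothed Heaviside and Dirac functions of phi.
        heav = 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))
        delta = (eps / np.pi) / (eps ** 2 + phi ** 2)

        # Region means: the two constants of the piecewise constant model.
        c1 = (img * heav).sum() / (heav.sum() + 1e-8)
        c2 = (img * (1.0 - heav)).sum() / ((1.0 - heav).sum() + 1e-8)

        kappa = curvature(phi)
        data = -(img - c1) ** 2 + (img - c2) ** 2
        # Penalty term that pulls |grad phi| toward 1, replacing re-initialization.
        penalty = laplacian(phi) - kappa

        phi = phi + dt * (mu * penalty + delta * (nu * kappa + lam * data))

    return phi > 0


if __name__ == "__main__":
    # Synthetic "page": white background with one dark text block.
    page = np.ones((40, 40))
    page[5:15, 5:35] = 0.0
    mask = segment_two_phase(page)
    print(mask.sum(), "pixels on the positive side of the contour")
```

Which side of the contour ends up positive depends on the initialization, so the mask may need to be inverted before it is used to label text regions.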
Keywords/Search Tags: document image segmentation, paragraph segmentation, text line segmentation, Mumford-Shah model, level set