A Study On Chinese Document Layout Analysis And Reconstruction

Posted on: 2004-09-29; Degree: Master; Type: Thesis
Country: China; Candidate: Y Wang; Full Text: PDF
GTID: 2168360122961208; Subject: Computer application technology
Abstract/Summary:
Converting existing printed documents into digital form is worthwhile: the contents become easy to retrieve, and the space needed to store them is greatly reduced. Layout analysis, layout understanding and layout reconstruction are important problems in converting paper documents to digital versions. This thesis studies all three, based on an automatic document processing system that we developed. We propose a layout analysis method that selects its strategy according to the complexity of the layout, so that documents of all kinds can be processed. For simple layouts, the method uses an efficient top-down algorithm based on projection profiles. For complex layouts, it uses a bottom-up algorithm, based on fuzzy nearest-neighbor connection strength and line confidence, that adapts well to varied layouts. Layout understanding is implemented with a rule-based method. Layout reconstruction is carried out by generating output in RTF and HTML formats. These algorithms were combined with a Chinese character and form recognition engine to build a complete system that processes documents automatically. Experimental results and a system in practical operation show that the algorithms are efficient and practical.
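The abstract names a top-down algorithm based on projection profiles but gives no details. As a rough illustration of that general family only, the following is a minimal sketch of a recursive X-Y cut on a binary page image (1 = ink, 0 = background). It is not the thesis's algorithm: the function names, the `min_gap` parameter and the box format `(top, left, bottom, right)` with exclusive ends are all assumptions made for this example.

```python
def projection_profile(image, axis):
    """Count ink pixels per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def split_runs(profile, min_gap=1):
    """Return (start, end) pairs (end exclusive) of non-empty runs in a profile.

    Two runs are kept separate only if at least `min_gap` consecutive
    empty entries lie between them; shorter gaps are absorbed into the run.
    """
    runs = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


def xy_cut(image, min_gap=1, top=0, left=0, horizontal=True, stuck=False):
    """Recursively cut a binary image along empty rows and columns.

    Alternates between horizontal and vertical cuts and stops when a
    region cannot be split in either direction. Returns a list of
    (top, left, bottom, right) boxes in page coordinates.
    """
    if not image or not image[0]:
        return []
    runs = split_runs(projection_profile(image, 0 if horizontal else 1), min_gap)
    boxes = []
    for start, end in runs:
        if horizontal:
            sub = image[start:end]
            sub_top, sub_left = top + start, left
        else:
            sub = [row[start:end] for row in image]
            sub_top, sub_left = top, left + start
        if len(runs) == 1 and stuck:
            # No cut found in this direction or the previous one: leaf block.
            boxes.append((sub_top, sub_left,
                          sub_top + len(sub), sub_left + len(sub[0])))
        else:
            boxes.extend(xy_cut(sub, min_gap, sub_top, sub_left,
                                not horizontal, stuck=(len(runs) == 1)))
    return boxes


# Toy page: two side-by-side blocks above one full-width block.
page = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
blocks = xy_cut(page)
```

A sketch like this only suits layouts whose blocks are separated by straight, empty horizontal or vertical gaps, which is consistent with the abstract reserving the top-down strategy for simple layouts and using a bottom-up strategy for complex ones.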
Keywords/Search Tags: Character recognition, Layout analysis, Layout understanding, Layout reconstruction, Skew correction