Research Into Chinese Text Automatic Proofreading Algorithm Based On Shape-similar And Its Prototype System

Posted on:2008-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Yu

Full Text:PDF

GTID:2178360215479690

Subject:Computer software and theory

Abstract/Summary:

Automatic proofreading of Chinese text is a foundational research topic in natural language processing; its aim is to design an interactive, computer-aided proofreading system. With the rapid development of the electronic publishing industry and the digitalization of office work and everyday word processing since the 1990s, people have had to handle large numbers of electronic documents in a short time. Errors are inevitable in editing, and the proofreading workload for electronic documents has grown so heavy that automatic proofreading of Chinese text has become an urgent task.

In recent years, researchers have done considerable work on Chinese word segmentation, dependency parsing, semantic analysis, and the construction of prototype systems for automatic proofreading of Chinese text. However, because of the limits of current theoretical study of the Chinese language and the characteristics of the language itself, the performance of these systems still lags well behind what users actually need.

Different text input methods lead to different kinds of errors. Pinyin input or speech recognition tends to produce pronunciation-similar errors, while Wubi input tends to produce shape-similar errors.

OCR technology has gradually matured. Today, the theoretical recognition accuracy of an OCR system on printed text is as high as 97%. As accuracy increases, more and more people will use OCR systems to input text. However, accuracy drops when an OCR system has to recognize badly distorted characters or character images with heavy noise. Since the main errors made by an OCR system are shape-similar errors, a way is needed to proofread these errors in the recognized documents.

In this thesis, the research background of automatic proofreading is investigated first.
Secondly, after a thorough analysis of how people recognize Chinese characters, and in line with the characteristics of the Chinese language, an algorithm based on shape similarity is proposed for proofreading shape-similar errors in Chinese documents. The algorithm is implemented as a simulation of the process by which people recognize shape-similar errors. When, in response to a user operation, the proofreading system detects that the current word contains a shape-similar error, it proceeds as follows: it searches the knowledge database for shape-similar characters, examines the resulting shape-similar words, and proposes candidate correction suggestions.

MS Word is one of the most widely used applications for editing and processing documents. In this thesis, an automatic proofreading prototype system was built with Visual Studio 2005 Tools for the 2007 Office System, and related experiments were carried out. The system runs in Word 2003 as an add-in.

Experiments show that the algorithm provides users with effective candidate correction suggestions. The well-designed interface of the prototype system makes it convenient for users to proofread text and improves their efficiency.
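
The three-step sequence described in the abstract (look up shape-similar characters, form shape-similar words, propose suggestions) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the names SHAPE_SIMILAR, LEXICON, and suggest_corrections are hypothetical, the tiny character table and word list are made-up stand-ins for the knowledge database, and the sketch assumes a candidate counts as a suggestion only if it is a known word.

```python
from itertools import product

# Hypothetical stand-in for the shape-similar knowledge database:
# each character maps to characters that look alike.
SHAPE_SIMILAR = {
    "已": {"己", "巳"},
    "己": {"已", "巳"},
    "末": {"未"},
    "未": {"末"},
    "人": {"入"},
    "入": {"人"},
}

# Hypothetical lexicon of valid words (a real system would use a full dictionary).
LEXICON = {"自己", "已经", "未来", "进入", "人口"}


def suggest_corrections(word, similar=SHAPE_SIMILAR, lexicon=LEXICON):
    """Return lexicon words obtained by swapping characters of `word`
    for shape-similar ones (assumed filtering rule, not from the thesis)."""
    # Step 1: for each position, keep the character and add its look-alikes.
    options = [[ch] + sorted(similar.get(ch, ())) for ch in word]
    # Step 2: combine the per-position options into candidate words.
    candidates = ("".join(chars) for chars in product(*options))
    # Step 3: propose candidates that are known words and differ from the input.
    return sorted(c for c in candidates if c in lexicon and c != word)


if __name__ == "__main__":
    for suspect in ["自已", "末来", "进人"]:
        print(suspect, "->", suggest_corrections(suspect))
```

The number of candidates grows with the product of the per-character option counts, so this brute-force enumeration only suits short words; a production system would also need to rank suggestions, which the sketch does not attempt.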

Keywords/Search Tags:

Chinese text automatic proofreading, Office Development, shape-similar, correcting candidate suggestion, VSTO Development

Related items

1	Research On Automatic Generation Technology Of Chinese Text Proofreading Corpora
2	Research On Chinese Text Proofreading Algorithm Based On The Combination Of Statistical Features And Rules
3	The Research Of Chinese Automatic Question Answering And Proofreading Based On Deep Learning
4	Research And Implementation Of Chinese Text Automatic Proofreading Based On Deep Learning
5	Research In Chinese Text Proofreading Based On OCR
6	Study On The Method Of Automatic Proofreading Of Word-level Chinese Text
7	Research On Automatic Proofreading Method Of OCR Recognition Results
8	Natural Language Processing Of Chinese Text Automatic Proofreading
9	Research And Application Of Key Techniques In Chinese Text Proofreading
10	Design And Implementation Of Chinese Text Automatic Proofreading System