| Automatic proofreading of Chinese text is a foundational research topic in natural language processing; its aim is to design an interactive, computer-aided proofreading system. With the rapid development of the electronic publishing industry and the digitalization of office and everyday word processing since the 1990s, people have had to deal with large numbers of electronic documents in a short time. Errors are inevitable in editing, and the workload of proofreading electronic documents has grown so heavy that automatic proofreading of Chinese text has become an urgent task. In recent years, researchers have done much work on Chinese word segmentation, dependency parsing, semantic analysis, and the construction of prototype systems for automatic Chinese text proofreading. However, owing to the limits of theoretical study of the Chinese language and to features of the language itself, the performance of these systems still lags well behind what users actually need. Different text input methods lead to different kinds of errors. Pinyin input or speech recognition may produce pronunciation-similar errors, while Wubi input may produce shape-similar errors. OCR technology has gradually matured. Nowadays, the theoretical rate of correct recognition is up to 97% when an OCR system is used on printed text. As this rate increases, more and more people will use OCR systems to input text. However, recognition accuracy drops when an OCR system has to recognize grossly distorted characters or character images with heavy noise. Since the main errors made by an OCR system are shape-similar errors, a way is needed to proofread these errors in recognized documents. In this thesis, firstly, the research background of automatic proofreading is investigated. 
Secondly, after a full analysis of how human beings recognize Chinese characters, and in accordance with the actual characteristics of Chinese, an algorithm based on shape similarity is proposed for proofreading shape-similar errors in Chinese documents. The algorithm is implemented as a simulation of the process by which people recognize shape-similar errors. If, based on the user's operation, the proofreading system detects that the current words contain shape-similar errors, it proceeds as follows: it searches the knowledge database for shape-similar characters, examines the resulting shape-similar words, and proposes correction candidates. MS Word is one of the most widely used applications for editing and processing documents. In this thesis, a prototype automatic proofreading system has been implemented with Visual Studio 2005 Tools for the 2007 Office System, and related experiments have been carried out. The system can run in Word 2003 as an add-in. Experiments show that the algorithm can provide users with effective correction candidates. The well-designed interface of the experimental prototype makes proofreading convenient for users and improves their efficiency. |
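
The three-step sequence described in the abstract (look up shape-similar characters, form shape-similar words, propose correction candidates) can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which is a Word add-in. The dictionary `SHAPE_SIMILAR`, the lexicon `LEXICON` with its frequency counts, and the function `suggest_corrections` are all hypothetical names and toy data introduced here. The sketch also assumes that a word is suspicious when it is missing from the lexicon, and that candidates are ranked by word frequency; the abstract specifies neither.

```python
from typing import Dict, List, Set

# Hypothetical toy knowledge base: each character maps to the characters
# that look similar to it. A real system would load a much larger table.
SHAPE_SIMILAR: Dict[str, Set[str]] = {
    "未": {"末"},
    "末": {"未"},
    "己": {"已", "巳"},
    "已": {"己", "巳"},
    "人": {"入"},
    "入": {"人"},
}

# Hypothetical toy lexicon; the frequency counts are made up for illustration.
LEXICON: Dict[str, int] = {
    "未来": 900,
    "周末": 500,
    "自己": 1200,
    "已经": 1500,
    "进入": 800,
}


def suggest_corrections(
    word: str,
    similar: Dict[str, Set[str]] = SHAPE_SIMILAR,
    lexicon: Dict[str, int] = LEXICON,
) -> List[str]:
    """Return correction candidates for a word suspected of a shape-similar error."""
    # Assumption: a word already in the lexicon is treated as correct.
    if word in lexicon:
        return []

    candidates = set()
    for i, ch in enumerate(word):
        # Step 1: search the knowledge base for shape-similar characters.
        for alt in similar.get(ch, ()):
            # Step 2: build the shape-similar word by substituting one character.
            candidate = word[:i] + alt + word[i + 1:]
            if candidate in lexicon:
                candidates.add(candidate)

    # Step 3: propose candidates, most frequent first (ties broken by text).
    return sorted(candidates, key=lambda w: (-lexicon[w], w))


if __name__ == "__main__":
    for suspicious in ["末来", "自已", "进人"]:
        print(suspicious, "->", suggest_corrections(suspicious))
```

Substituting a single character at a time keeps the candidate set small. A fuller system would also need context, since a shape-similar error can produce another valid word that a lexicon lookup alone cannot flag.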