Research On The Traditional Chinese Spelling Error Detection

Posted on:2017-02-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2308330488997096

Subject:Computer application technology

Abstract/Summary:

Traditional Chinese spelling error detection is an important research topic in Chinese language processing and an important component of many natural language processing systems, such as search engines and word processors. Compared with Western languages such as English, Chinese spelling error detection is more complex, because Chinese text has no word delimiters and Chinese word collocations and grammar are complicated.

Simplified Chinese spelling error detection was studied earlier than Traditional Chinese spelling error detection. Years of research have produced three main families of techniques for Simplified Chinese: rule-based methods, statistics-based methods, and feature- and learning-based methods. However, these methods are built on Simplified Chinese corpora and are not suitable for many types of spelling errors, so they can serve only as a reference for research on Traditional Chinese spelling error detection. In recent years, with the development of Traditional Chinese spelling check evaluations, Traditional Chinese spelling error detection has become a hotspot in Chinese language processing.

Building on earlier research, this dissertation proposed three effective methods for Traditional Chinese spelling error detection. First, we presented an approach that uses statistical dictionaries based on n-gram segmentation. The approach relied on n-gram information collected from statistical dictionaries built from a corpus, and we also proposed a statistical rule-based algorithm to detect spelling errors. Second, we introduced a method based on a graph model and a POS bi-gram model. During error detection, the results of Chinese word segmentation and of candidate word replacement are represented as a graph model.
The POS bi-gram model was then used to help determine the final erroneous word. Finally, we proposed an automatic approach that uses a POS-based language model to detect misuse of the particles "De, Di, De" (“的”, “地”, “得”). The approach builds a statistical model of the context and uses it to detect "De, Di, De" spelling errors in text. Experimental results show that all three proposed methods perform well and that our work contributes to research on Traditional Chinese spelling error detection.
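The statistical-dictionary idea can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the character-bigram dictionary, the toy corpus, and the rule that flags a character when neither of its adjacent bigrams reaches a hypothetical `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter


def build_bigram_dict(corpus):
    """Count character bigrams over a list of sentences (a toy statistical dictionary)."""
    counts = Counter()
    for sentence in corpus:
        for i in range(len(sentence) - 1):
            counts[sentence[i:i + 2]] += 1
    return counts


def detect_suspicious(sentence, bigrams, min_count=1):
    """Return indices of characters that no adjacent bigram supports.

    Hypothetical rule: a character is suspicious when the bigram it forms with
    its left neighbor and the bigram it forms with its right neighbor both
    occur fewer than `min_count` times in the dictionary.
    """
    suspicious = []
    for i in range(len(sentence)):
        neighbors = []
        if i > 0:
            neighbors.append(sentence[i - 1:i + 1])
        if i < len(sentence) - 1:
            neighbors.append(sentence[i:i + 2])
        # Counter returns 0 for unseen bigrams, so unknown pairs count as unsupported.
        if not any(bigrams[bg] >= min_count for bg in neighbors):
            suspicious.append(i)
    return suspicious


if __name__ == "__main__":
    corpus = ["我們今天去學校", "學校裡有很多學生", "他今天很高興"]
    bigrams = build_bigram_dict(corpus)
    print(detect_suspicious("我們今天去學交", bigrams))  # [6]
```

With this toy corpus, 交 at index 6 is flagged because the bigram 學交 never occurs and 交 has no right neighbor. A real system would use far larger dictionaries and longer n-grams.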
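A graph model combined with a POS bi-gram model can be sketched as a lattice search. In this illustration, which rests on assumptions rather than on the dissertation's implementation, graph edges are lexicon words matched either on the original characters or after one character substitution from a confusion set, and a Viterbi-style search scores each path with word probabilities plus POS bi-gram transition probabilities. `LEXICON`, `CONFUSION`, `POS_BIGRAM`, `FLOOR`, and the tag names are hypothetical toy values.

```python
import math

# Hypothetical toy lexicon: word -> (POS tag, unigram probability)
LEXICON = {
    "我們": ("PRON", 0.05), "今天": ("NOUN", 0.04), "去": ("VERB", 0.05),
    "學校": ("NOUN", 0.03), "學": ("VERB", 0.01), "交": ("VERB", 0.005),
}
# Hypothetical confusion set: character -> similar-sounding or similar-looking characters
CONFUSION = {"交": ["校"], "校": ["交"]}
# Hypothetical POS bi-gram probabilities P(tag | previous tag); "<s>" marks the sentence start
POS_BIGRAM = {
    ("<s>", "PRON"): 0.4, ("PRON", "NOUN"): 0.3, ("NOUN", "VERB"): 0.4,
    ("VERB", "NOUN"): 0.5, ("VERB", "VERB"): 0.05,
}
FLOOR = 1e-6  # fallback probability for unseen POS transitions


def variants(text):
    """Yield the text itself and every single-character confusion-set replacement."""
    yield text
    for i, ch in enumerate(text):
        for alt in CONFUSION.get(ch, []):
            yield text[:i] + alt + text[i + 1:]


def build_graph(sentence, max_len=4):
    """Return edges (start, end, word) for lexicon words matching a span or one of its variants."""
    edges = []
    for s in range(len(sentence)):
        for e in range(s + 1, min(len(sentence), s + max_len) + 1):
            for word in set(variants(sentence[s:e])):
                if word in LEXICON:
                    edges.append((s, e, word))
    return edges


def best_path(sentence):
    """Viterbi-style search over (position, previous POS tag) states.

    Returns the highest-scoring list of edges covering the sentence, or an empty
    list when no path exists (e.g. the sentence has out-of-lexicon characters).
    """
    edges = build_graph(sentence)
    best = {(0, "<s>"): (0.0, [])}
    for pos in range(len(sentence)):
        # States ending at `pos` are final here, since every edge moves strictly forward.
        for (p, prev_tag), (score, path) in list(best.items()):
            if p != pos:
                continue
            for s, e, word in edges:
                if s != pos:
                    continue
                tag, prob = LEXICON[word]
                trans = POS_BIGRAM.get((prev_tag, tag), FLOOR)
                new_score = score + math.log(prob) + math.log(trans)
                key = (e, tag)
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, path + [(s, e, word)])
    finals = [v for (p, _), v in best.items() if p == len(sentence)]
    return max(finals, key=lambda v: v[0])[1] if finals else []


def detect_errors(sentence):
    """Report (start, end, original, suggested) for path words that differ from the input."""
    return [(s, e, sentence[s:e], w) for s, e, w in best_path(sentence) if sentence[s:e] != w]


if __name__ == "__main__":
    print(detect_errors("我們今天去學交"))  # [(5, 7, '學交', '學校')]
```

Here the path through 學校 outscores 學 followed by 交, because the toy POS bi-gram table makes a verb followed by a verb unlikely, so 學交 is reported with 學校 as the suggested correction.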
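Choosing among 的, 地 and 得 depends largely on the parts of speech around the particle, so the POS-based idea can be sketched as a context model over neighboring POS tags. This is a simplified stand-in for a POS-based language model, assuming the input is already segmented and POS-tagged; the tag names, the add-alpha smoothing, and the toy training data are hypothetical.

```python
from collections import Counter, defaultdict

PARTICLES = ("的", "地", "得")


def context_of(sent, i):
    """POS tags of the words before and after position i, with boundary markers."""
    prev_tag = sent[i - 1][1] if i > 0 else "<s>"
    next_tag = sent[i + 1][1] if i + 1 < len(sent) else "</s>"
    return prev_tag, next_tag


def train(tagged_corpus):
    """Count each particle under its (previous POS, next POS) context."""
    counts = defaultdict(Counter)
    for sent in tagged_corpus:
        for i, (word, _) in enumerate(sent):
            if word in PARTICLES:
                counts[context_of(sent, i)][word] += 1
    return counts


def probability(counts, context, particle, alpha=1.0):
    """Add-alpha smoothed P(particle | context)."""
    ctx = counts.get(context, Counter())
    total = sum(ctx[p] for p in PARTICLES)
    return (ctx[particle] + alpha) / (total + alpha * len(PARTICLES))


def check(sent, counts):
    """Return (index, used, suggested) where another particle is more probable in context."""
    errors = []
    for i, (word, _) in enumerate(sent):
        if word not in PARTICLES:
            continue
        context = context_of(sent, i)
        best = max(PARTICLES, key=lambda p: probability(counts, context, p))
        if probability(counts, context, best) > probability(counts, context, word):
            errors.append((i, word, best))
    return errors


if __name__ == "__main__":
    corpus = [
        [("美麗", "ADJ"), ("的", "DE"), ("花", "NOUN")],
        [("慢慢", "ADV"), ("地", "DE"), ("走", "VERB")],
        [("跑", "VERB"), ("得", "DE"), ("很", "ADV"), ("快", "ADJ")],
        [("我", "PRON"), ("的", "DE"), ("書", "NOUN")],
    ]
    counts = train(corpus)
    sentence = [("他", "PRON"), ("跑", "VERB"), ("的", "DE"), ("很", "ADV"), ("快", "ADJ")]
    print(check(sentence, counts))  # [(2, '的', '得')]
```

In the example, 的 between a verb and an adverb is flagged and 得 is suggested, since only 得 was seen in that context during training. Contexts never seen in training give equal smoothed probabilities, so no particle is flagged there.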

Keywords/Search Tags:

Chinese language processing, spelling error detection, Chinese word segmentation, n-gram model, part-of-speech tagging

Related items

1	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
2	Word Segmentation And Pos Tagging In Chinese
3	Research On The Methods Of Automatic Correction Of Chinese Word Segmentation And Part-of-Speech Tagging
4	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
5	Chinese Word Found Its Part Of Speech Tagging
6	BiLSTM And CNN Based Joint Model For Chinese Word Segmentation And Part-of-speech Tagging
7	Optimization And Implementation Of Chinese Spelling Error Detection And Correction Algorithm
8	Study On Disambiguation Algorithm For Chinese Word Segmentation
9	N-gram Technology Application Study In Computer Processing Of Chinese Language
10	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach