Research On Correction Model For Spelling And Grammar Errors In College English Essays

Posted on: 2012-11-05; Degree: Master; Type: Thesis
Country: China; Candidate: Y Z Huang; Full Text: PDF
GTID: 2235330338493130; Subject: Signal and Information Processing
Abstract/Summary:
Natural language processing (NLP) comprises the theories and methods for studying and implementing effective natural-language communication between humans and computers. In recent years, with the development of computer science and the effective application of statistical learning methods, NLP has become an important research direction in artificial intelligence and semantic search. When English essays are processed, errors in the semantic elements of an essay (its words and sentences) inevitably harm subsequent text analysis and semantic comprehension, and ultimately degrade the overall performance of the implemented system. Detecting and correcting spelling and grammar errors intelligently is therefore one of the priorities, and one of the difficulties, of NLP.

This dissertation takes college English essays as its research object and analyzes the errors that appear in misspelled words and ungrammatical sentences. By examining in depth the statistical models and technical solutions involved in intelligent correction, and by weighing the advantages and drawbacks of the different technical approaches against the difficulties of actual implementation, a system capable of automatically correcting spelling and grammar errors in English essays has been implemented.

The achievements of this research fall into the following two aspects:

1. With respect to the detection and correction of misspelled words, four types of letter-level errors are studied: insertion, deletion, substitution and transposition. Particular attention is paid to non-word errors that result from confusion over word pronunciation and to the correction problems that arise from the diversity of word forms (e.g. abbreviations, hyphenated compound words, proper nouns). For real-word errors, a machine learning method is adopted to extract contextual semantic features from a corpus, and the real-word correction model is obtained by training. Taking advantage of the candidate recommendations produced during non-word error checking, a prediction algorithm of optimal combination based on the recommended candidate list is proposed. The experimental results show that the accuracy reaches 83.78% when this strategy is applied to correcting real-word errors that involve contextual misspelling.

2. With respect to the detection and correction of ungrammatical sentences, this dissertation combines the advantages of grammar rules and statistical models, on the basis of contextual information in the text training set, to analyze preposition errors, incomplete sentence elements, singular/plural noun inconsistency, part-of-speech confusion, subject-verb disagreement, misuse of modal (auxiliary) verbs, etc. This involves diverse NLP techniques, including sentence boundary disambiguation, part-of-speech tagging, named entity recognition and contextual information extraction. Experimental results on test English essays with a difficulty level similar to CET4~6 writing show that the approach presented in this dissertation is effective for detecting and correcting sentence grammar errors.
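The spelling-correction idea in item 1 can be illustrated with a minimal Python sketch. It is not the dissertation's model: the abstract does not specify the features, training data or combination algorithm, so the sketch assumes the simplest possible setup. Candidates are generated with the four edit types (insertion, deletion, substitution, transposition) at edit distance one, and the candidate list is then ranked by an add-one-smoothed bigram score over the neighbouring words. The function names, the toy corpus and the smoothing choice are all hypothetical.

```python
import string
from collections import Counter

def edits1(word):
    """All strings one insertion, deletion, substitution or transposition away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    substitutes = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + substitutes + inserts)

def train_bigrams(sentences):
    """Count unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev) (an assumed smoothing choice)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))

def correct(tokens, i, unigrams, bigrams):
    """Pick the best candidate for tokens[i] given its left and right neighbours."""
    vocab = set(unigrams) - {"<s>", "</s>"}
    word = tokens[i]
    # The word itself stays a candidate when it is a real word, so that
    # real-word errors and non-word errors share one candidate list.
    cands = (edits1(word) & vocab) | ({word} if word in vocab else set())
    if not cands:
        return word  # nothing known within one edit: leave unchanged
    prev = tokens[i - 1] if i > 0 else "<s>"
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return max(
        sorted(cands),  # sorted for a deterministic tie-break
        key=lambda c: bigram_prob(prev, c, unigrams, bigrams)
        * bigram_prob(c, nxt, unigrams, bigrams),
    )

# Hypothetical toy corpus, for illustration only.
CORPUS = [
    "a letter from my friend",
    "fill in the form",
    "i come from china",
    "the form is long",
]
UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)

print(correct("i come form china".split(), 2, UNIGRAMS, BIGRAMS))  # real-word error
print(correct("my frend".split(), 1, UNIGRAMS, BIGRAMS))           # non-word error
```

On this toy corpus the sketch replaces the real word "form" with "from" because the surrounding bigrams favour it, and repairs the non-word "frend" to "friend". A realistic system would need a large training corpus, richer contextual features and handling of the special word forms mentioned above.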
Keywords/Search Tags: English essay checking, word spell checking, sentence grammar checking, N-gram disambiguation model, contextual semantic analysis