Research And Application Of Automatic Detection Of Chinglish

Posted on: 2019-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: C Yu
GTID: 2405330596462901
Subject: Computer technology
Abstract/Summary:
Chinglish is common in the writing of Chinese students and harms both the accuracy and the idiomaticity of their written English. Clearly identifying the type of Chinglish and giving Chinese learners of English effective corrective feedback can raise their sensitivity to Chinglish and reduce errors. This thesis aims to detect Chinglish in students' English writing through a rule-based approach and to give the students feedback. The ultimate goal is to improve the accuracy and idiomaticity of Chinese learners' English writing, as well as the effectiveness of English teaching. Many studies of Chinglish focus on causes, case analyses and teaching suggestions, but few approach it from the perspective of automatic discovery and recognition using natural language processing techniques.

The author explores the recognition and detection of Chinglish in three stages. The first stage classifies, summarizes and analyzes Chinglish cases from books and papers from the perspective of automatic discovery and recognition. The cases are encoded as LanguageTool rules written in XML. The author then tests, evaluates and improves the existing rule base with systematic testing methods, which improves the accuracy of the rule base. The second stage builds on the results of previous research on error-prone collocations, using the inverted index of the Lucene toolkit and the Stanford CoreNLP natural language processing tool. The author analyzes the deficiencies and some mistakes of the previous studies, corrects them, and applies the results to the detection of collocation-type Chinglish in writing. In the third stage, to obtain more Chinglish rules and patterns, the author uses two large annotated learner corpora, the CLEC corpus and the NUCLE corpus, to propose a method for acquiring Chinglish recognition rules: the rules are generated automatically from the annotated error spans and syntactic analysis.

Finally, the author develops a Chinglish detection system based on the third stage of research. The system combines semi-automatic generation of Chinglish rules with automatic detection of Chinglish in students' English writing. It achieves high recognition accuracy and a low false alarm rate, and it greatly reduces the labor and time needed to write rules by hand. The tool generates rules automatically and allows them to be modified manually. Users can verify the validity of rules against the BNC corpus and the CLEC corpus, build a continually growing rule base, and use the rules confirmed as valid to detect Chinglish in English writing.

The achievement of this study is a set of Chinglish rules with high recognition accuracy and a low false alarm rate, obtained in a systematic way. The author makes full use of the existing tables of error-prone Chinglish collocation rules and acquires a large number of regular patterns of collocational Chinglish from the annotated learner corpora. A systematic process is designed and proposed, and the sub-functions are integrated and implemented on the Eclipse platform. The research can improve Chinese learners' English writing skills and reduce English teachers' workload of correcting repetitive Chinglish errors in students' writing.
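The rule verification step described above, checking a candidate rule against a learner corpus and a native corpus, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the thesis encodes rules as LanguageTool XML, whereas this sketch uses a plain regular expression; the `Rule` class, the function names and the acceptance threshold are hypothetical; and the two short sentence lists merely stand in for the CLEC and BNC corpora.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical Chinglish rule: a regex pattern plus feedback text."""
    name: str
    pattern: str
    feedback: str

    def matches(self, sentence: str) -> bool:
        # Case-insensitive search anywhere in the sentence.
        return re.search(self.pattern, sentence, flags=re.IGNORECASE) is not None


def evaluate_rule(rule, learner_sentences, native_sentences):
    """Return (learner_hit_rate, false_alarm_rate) for one rule.

    learner_hit_rate: share of learner sentences the rule flags.
    false_alarm_rate: share of native sentences the rule flags.
    """
    learner_hits = sum(rule.matches(s) for s in learner_sentences)
    native_hits = sum(rule.matches(s) for s in native_sentences)
    hit_rate = learner_hits / len(learner_sentences) if learner_sentences else 0.0
    false_alarm = native_hits / len(native_sentences) if native_sentences else 0.0
    return hit_rate, false_alarm


def is_valid(rule, learner_sentences, native_sentences, max_false_alarm=0.0):
    """Accept a rule that fires on learner text and rarely on native text.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    hit_rate, false_alarm = evaluate_rule(rule, learner_sentences, native_sentences)
    return hit_rate > 0 and false_alarm <= max_false_alarm


# Toy stand-ins for a learner corpus (e.g. CLEC) and a native corpus (e.g. BNC).
LEARNER = [
    "We should learn more knowledge at school.",
    "I learned knowledge from books.",
    "She likes reading.",
]
NATIVE = [
    "Students acquire knowledge through practice.",
    "He learned a great deal from his teacher.",
]

# Candidate rule for the collocation "learn knowledge".
learn_knowledge = Rule(
    name="LEARN_KNOWLEDGE",
    pattern=r"\blearn(s|ed|ing)?\s+(\w+\s+)?knowledge\b",
    feedback="Consider 'acquire knowledge' or 'gain knowledge'.",
)

hit_rate, false_alarm = evaluate_rule(learn_knowledge, LEARNER, NATIVE)
print(learn_knowledge.name, hit_rate, false_alarm,
      is_valid(learn_knowledge, LEARNER, NATIVE))
```

A rule that fires often on learner text but never on native text is a good candidate for the confirmed rule base; a rule that also fires on native text would be sent back for manual modification.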
Keywords/Search Tags: Chinglish, error detection, collocation error, feedback