Research And Design Of The English Essay Similarity Detection System For Chinese College Students

Posted on: 2018-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Wang
Full Text: PDF
GTID: 2335330515996089
Subject: Software engineering
Abstract/Summary:
With the development of natural language processing technology, more and more colleges are seeking technical tools to improve teaching efficiency, and automatic grading of English compositions has emerged in response. China already has a number of automatic grading systems, but the similarity detection algorithms in these systems are superficial. Research abroad on similarity detection focuses mainly on long texts such as papers and source code. The goal of this thesis is therefore to improve the similarity detection algorithm and develop a similarity detection system better suited to colleges. To achieve this goal, the thesis first studies the characteristics of English compositions. Second, based on these characteristics, it divides English compositions into types. For long compositions of 60 words or more, it designs a similarity detection algorithm based on WordNet semantic clustering, which improves on the TCUSS clustering algorithm. For short compositions of fewer than 60 words, it designs a similarity detection algorithm based on stop words, after verifying the stability of stop words in English. Third, the thesis collects a number of corpus samples and uses them to verify the results of the two algorithms and of the overall similarity detection algorithm. Comparing these results with those of the K-means algorithm leads to the conclusion that the new algorithm is superior to K-means. Finally, based on the new algorithm, the thesis designs the similarity detection subsystem of the computer-aided review system. The thesis thus presents a similarity detection algorithm for English compositions. In verification, the accuracy, recall, and F1 score of the overall algorithm are better than those of the commonly used similarity detection algorithm. Furthermore, the subsystem is designed to be asynchronous, so that the computer-aided review system can meet the demands of large-scale use.
Keywords/Search Tags: Composition Scoring, Similarity Detection, Stop Words, Text Clustering, Semantic Information
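
The abstract's split between long and short compositions, and the idea of comparing short compositions through their stop words, can be illustrated with a minimal Python sketch. Only the 60-word threshold comes from the abstract. The abbreviated `STOP_WORDS` list, the function names, and the use of cosine similarity over stop-word counts are assumptions made for illustration; the abstract does not specify how the thesis measures stop-word similarity, and the WordNet clustering branch for long compositions is not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; the abstract does not give the real one.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
    "for", "on", "with", "as", "was", "i", "my", "we", "you", "they",
    "this", "but", "not", "are", "be", "have", "at", "by",
}

# Threshold stated in the abstract: 60 words or more counts as a long composition.
LENGTH_THRESHOLD = 60


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_long(text):
    """Return True if the composition should take the long-text branch."""
    return len(tokenize(text)) >= LENGTH_THRESHOLD


def stop_word_profile(text):
    """Count how often each stop word occurs in the text."""
    return Counter(t for t in tokenize(text) if t in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def short_similarity(text_a, text_b):
    """Assumed measure: cosine similarity of the two stop-word profiles."""
    return cosine(stop_word_profile(text_a), stop_word_profile(text_b))
```

A caller would check `is_long` first and apply `short_similarity` only to compositions under the threshold. Stop-word counts ignore content words entirely, which is why such a measure is plausible only if stop-word usage is stable, the property the abstract says the thesis verifies first.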