
A Validity Study On Automatic Scoring Of English Listening And Speaking Test In College Entrance Examination Of Guangdong Province

Posted on: 2020-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Guan
Full Text: PDF
GTID: 2415330590480423
Subject: Management Science and Engineering
Abstract/Summary:
Since 2011, the Computer-based English Listening and Speaking Test (CELST) of Guangdong Province, a new form of oral test, has attracted growing attention. Traditional human scoring is a form of subjective evaluation: the scoring process depends largely on the grader's subjective impressions, and its reliability is easily affected by many factors, so ensuring scoring quality becomes the primary problem. With the extensive application of computer technology to the scoring of large-scale examinations, computer-automated scoring (CAS) is now used in place of human scoring as an effective way to improve scoring efficiency and reduce scoring cost. It is therefore important to study the reliability of automatic scoring and how automatic scores differ from human scores. In the 2013 Guangdong college entrance examination English listening and speaking test, machine scores were used as the reference for reported scores for the first time. Within the theoretical framework of test validity research, this paper applies correlation analysis, consistency analysis, and regression analysis to compare human scores with computer-generated scores, using data from the Guangdong college entrance examination English listening and speaking test in recent years. It explores the validity of computer scoring by answering three questions: (1) In the English listening and speaking test of the Guangdong college entrance examination, how strongly do automatic scores correlate with human scores overall, and how consistent are they? (2) How do automatic and human ratings differ across task types and score bands? (3) How does the selection of scoring features affect the automatic score?
The main research methods are as follows: (1) After ranking all candidates' scores in descending order, stratified random sampling was used to draw a sample of 6,000 candidates across the score bands, with both human and computer scores for each question type. (2) SPSS 22.0 was used to compute correlation and consistency indicators: the Pearson correlation coefficient r and the exact-plus-adjacent agreement rate. (3) A multiple regression model was built for the Part A reading-aloud task, with the scoring features as independent variables and the computer score as the dependent variable. The main findings are as follows: (1) The correlations between automatic and human scores ranged from 0.868 to 0.963, and the agreement rate reached 96.65%; both match or exceed the levels reported for most international English speaking automatic scoring systems. (2) The computer is more sensitive than human raters and reflects differences in candidates' speaking ability more directly. The correlation between human and computer scores is higher in the low score band than in the middle and high bands, though this result may be affected by sample size. (3) The difference between computer and human scores is smallest on the role-playing task and largest on the retelling task, plausibly owing to the difficulty of the task itself, the construct being measured, and the scoring standard. (4) For the reading-aloud task, pitch and intonation features have the greatest influence on the score, followed by stress and rhythm, while speech rate has the least influence on single-sentence reading scores.
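For illustration only, the following is a minimal Python sketch of the three computations described above: the Pearson correlation, an exact-plus-adjacent agreement rate (here assumed to mean scores differing by at most one point), and a multiple regression of the computer score on the pronunciation features. The column names (human_score, computer_score, pitch, stress, rhythm, speed) and the sample values are hypothetical; the thesis itself performed these analyses in SPSS 22.0, not Python.

```python
# Sketch under assumed column names and data; the thesis used SPSS 22.0.
import pandas as pd
from scipy import stats
import statsmodels.api as sm

# Hypothetical sample: paired human and computer scores plus
# reading-aloud scoring features for a handful of candidates.
df = pd.DataFrame({
    "human_score":    [10, 12, 14, 15, 17, 18, 19, 20],
    "computer_score": [11, 12, 13, 16, 16, 18, 20, 20],
    "pitch":          [2.1, 2.4, 2.8, 3.0, 3.3, 3.5, 3.8, 4.0],
    "stress":         [1.9, 2.2, 2.5, 2.9, 3.1, 3.4, 3.6, 3.9],
    "rhythm":         [2.0, 2.3, 2.6, 2.8, 3.2, 3.3, 3.7, 3.8],
    "speed":          [3.0, 2.9, 3.1, 3.2, 3.0, 3.1, 3.3, 3.2],
})

# (1) Pearson correlation coefficient r between human and computer scores.
r, p = stats.pearsonr(df["human_score"], df["computer_score"])
print(f"Pearson r = {r:.3f} (p = {p:.4f})")

# (2) Exact-plus-adjacent agreement: the proportion of candidates whose
# human and computer scores differ by at most one point.
agreement = (df["human_score"] - df["computer_score"]).abs().le(1).mean()
print(f"Exact-plus-adjacent agreement = {agreement:.2%}")

# (3) Multiple regression: computer score regressed on the four features;
# the coefficient magnitudes indicate each feature's influence on the score.
X = sm.add_constant(df[["pitch", "stress", "rhythm", "speed"]])
model = sm.OLS(df["computer_score"], X).fit()
print(model.params)
```

The agreement threshold of one point is an assumption for the sketch; the thesis does not state the adjacency window used in its 96.65% figure.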
Keywords/Search Tags: oral English test, human scoring, automatic scoring, validity