| Essays are widely regarded as the most powerful tool for assessing language learning outcomes for both second language and English as a Foreign Language (EFL) learners. With the rapid development of computational linguistics, natural language processing (NLP), and deep learning, automated essay scoring (AES) models are evolving quickly, with growing attention to interpretability and generalizability in model construction. However, existing studies still leave clear room for improvement. (1) Traditional feature-based approaches: The vast majority of research has focused on linguistic predictors of the overall writing quality of argumentation, as it is the most common genre in large-scale writing assessment. Relatively little research has examined linguistic predictors for different genres at the holistic and analytic rating traits. Besides, scholars have questioned whether existing AES systems designed for native English speakers are well suited to Chinese EFL learners, as the linguistic features that predict writing quality across genres and rating traits tend to differ from those for native English speakers. Furthermore, scholars have relied on linear regression in model construction, which allows only a limited number of handcrafted linguistic features to enter the model, resulting in low generalizability and performance. (2) Neural network approaches: AES models based on supervised learning usually require large quantities of annotated data, which are not always accessible because few publicly available corpora are annotated with scores for different rating traits. With advances in pre-trained language models and multi-task transfer learning, generalizability and performance can be greatly improved through general language representations learned from large corpora and the underlying shared representation of similar tasks. However, relatively little research has exploited pre-trained language models for multi-dimensional
essay scoring model construction. (3) Hybrid approaches: Although neural network approaches have significantly improved model performance and generalizability, the inner workings of linguistic features in the deep learning black box remain unknown. Scholars have tried to combine the strengths of traditional feature-based approaches and neural network approaches to construct AES models. However, only a small range of surface linguistic features has been included, and linguistic features concerning different constructs of writing quality have been ignored. To address these problems, the present study proposes a hybrid approach that incorporates the pre-trained language model Bidirectional Encoder Representations from Transformers (BERT) and fine-grained linguistic features into multi-dimensional essay scoring model construction for Chinese EFL learners. By integrating the advantages of traditional feature-based approaches and neural network approaches, the interpretability and generalizability of the multi-dimensional essay scoring model are greatly improved. Firstly, with cutting-edge computational linguistic tools, a large number of fine-grained linguistic features are extracted for different constructs of Chinese EFL learners’ writing quality. The linguistic features of each rating trait are optimized through standardization, normality testing, multicollinearity tests, and principal component analysis. Fine-grained linguistic features for the Task Achievement/Response Trait, the Coherence and Cohesion Trait, the Lexical Resource Trait, the Grammatical Range and Accuracy Trait, and the Holistic Trait are included to explore their correlation with Chinese EFL learners’ writing quality across genres, including argumentation, exposition, and narration. With the retained linguistic features, both linear and non-linear algorithms are used to construct AES models under the traditional feature-based approach. The non-linear AES model built with Random
Forest Regression outperforms the others and shows its strength across genres and rating traits, with the highest exact agreements. Secondly, with the pre-trained language model BERT and the multi-task transfer learning approach, multi-trait and multi-genre essay scoring tasks are explored. Compared with the traditional feature-based approaches, the neural network approaches enhance generalizability in the target domain through multi-task transfer learning and avoid the information loss caused by manual feature extraction. Results indicate that BERT-MTL-finetune outperforms the other baselines, with the highest exact agreements ranging from 82.7% to 91.2% across rating traits. Overall, compared with single-trait scoring, BERT-MTL-finetune benefits multi-trait scoring tasks through the underlying shared representation of each rating trait; compared with single-genre scoring, it enhances generalizability and performance in multi-genre scoring tasks through the increased training sample size and the underlying shared representation of each genre. Finally, to combine high interpretability with high generalizability, a hybrid approach incorporating the BERT-based transfer learning approach and fine-grained linguistic features is proposed to construct a multi-dimensional essay scoring model for Chinese EFL learners. For multi-dimensional essay scoring tasks, results indicate that BERT-MTL-finetune+Features outperforms BERT-MTL-finetune, with the highest exact agreements ranging from 91.2% to 96.9%. The exact-plus-adjacent agreements of BERT-MTL-finetune+Features reach 100% for all rating traits. The results show that BERT-MTL-finetune+Features benefits multi-dimensional essay scoring through the increased training sample size and the underlying shared representation of different genres and rating traits, thereby enhancing the generalizability and performance of the proposed AES model. More importantly, results indicate that
fine-grained linguistic features are valid for predicting Chinese EFL learners’ writing quality across genres. As for the contribution of each trait’s linguistic features to the holistic trait, results indicate that the absence of the linguistic features of the Task Achievement/Response Trait has the greatest impact on the performance of the proposed model, whereas the absence of the linguistic features of the Lexical Resource Trait has the smallest impact. In addition, the proposed models are compared with AES systems designed for Chinese EFL learners to give a fuller picture of their performance. Results indicate that the proposed AES models are valid for evaluating Chinese EFL learners’ writing quality across genres at the holistic and analytic rating traits. The present study incorporates the strengths of traditional feature-based approaches and neural network approaches to construct multi-dimensional essay scoring models for Chinese EFL learners. The findings offer new perspectives for studies that seek to unveil the inner workings of linguistic features in the deep learning black box and to exploit pre-trained language models for multi-dimensional essay scoring tasks. The findings are also of theoretical significance for research on linguistic predictors of Chinese EFL learners’ writing quality across genres at the holistic and analytic rating traits. Furthermore, EFL teachers can carry out diversified and individualized teaching based on feedback about the linguistic features that predict writing quality for different genres and rating traits. Lastly, the findings are valuable for corpus linguistics, as they offer suggestions for constructing a Chinese EFL learners’ writing corpus annotated with multi-dimensional rating traits. |
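
The exact and exact-plus-adjacent agreement figures reported above can be illustrated with a minimal sketch. The abstract does not define these metrics, so the sketch assumes integer score bands, treats "exact" as identical human and model scores, and treats "adjacent" as scores that differ by at most one band. The function name `agreement_rates` and the sample scores are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def agreement_rates(human: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (exact agreement, exact-plus-adjacent agreement) as proportions.

    Assumption: scores are integer bands, and "adjacent" means the two
    scores differ by at most one band.
    """
    if len(human) != len(predicted) or len(human) == 0:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(human)
    # Count essays where the model score equals the human score.
    exact = sum(1 for h, p in zip(human, predicted) if h == p)
    # Count essays where the model score is within one band of the human score
    # (this includes the exact matches).
    exact_plus_adjacent = sum(1 for h, p in zip(human, predicted) if abs(h - p) <= 1)
    return exact / n, exact_plus_adjacent / n


if __name__ == "__main__":
    # Hypothetical scores for four essays on one rating trait.
    human_scores = [3, 4, 5, 2]
    model_scores = [3, 4, 4, 4]
    exact, adjacent = agreement_rates(human_scores, model_scores)
    print(f"exact agreement: {exact:.2%}")
    print(f"exact-plus-adjacent agreement: {adjacent:.2%}")
```

Under this definition, an exact-plus-adjacent agreement of 100% means that no model score was more than one band away from the human score, even where the exact agreement is below 100%.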