| Automatic Readability Assessment (ARA) has attracted significant attention as an emerging field of research. Its core objective is to automatically evaluate the readability of a text from a diverse set of text features. Whereas traditional ARA methods often consider only a single feature or a partial set of features, this thesis proposes an approach based on multi-feature extraction. It also constructs a corpus collection comprising multiple corpora and analyzes their sources and statistical characteristics. Difficulty characteristics of the corpora are extracted along three dimensions (structural features, word frequency features, and deep features), revealing how language structural feature attributes are distributed across different corpora. To address the limitations of manual feature extraction in traditional ARA methods, this thesis introduces a hierarchical network that incorporates pre-trained language models. By combining features extracted from pre-trained language models with the hierarchical network architecture, this approach removes the reliance on manual feature engineering and improves both prediction accuracy and practicality. Experimental results show that the deep features outperform the other feature groups in representing text difficulty. In addition, this study compares the representations from different layers of the pre-trained model and selects the last layer as the optimal difficulty representation for text. The proposed model achieves accuracies of 89.76%, 85.32%, and 51.56% on three publicly available corpora, surpassing baseline models such as convolutional neural networks and long short-term memory networks. Finally, this thesis reframes the ARA task as a difficulty ranking problem solved with pairwise ranking methods. The experiments show that difficulty rankings are strongly consistent across corpora, indicating that pairwise ranking methods capture a notion of difficulty that holds across different corpora. This finding further underscores the ability of the proposed ranking model to transfer across diverse corpora. |