| As China's cultural soft power grows, more and more people are learning Chinese as a second language, and the volume of text that must be processed in teaching Chinese as a foreign language grows with it. Automatic text understanding can reduce this workload. Of the four language skills (listening, speaking, reading, and writing), reading and writing are the most difficult for learners of Chinese as a second language, because they require a deeper and more standard-conforming understanding of text. Targeting these two teaching difficulties, writing grammar and text reading, and taking automatic grammar checking and the retrieval of reading comprehension answers with explanations as application goals, this thesis studies Chinese Grammatical Error Diagnosis and Extractive Machine Reading Comprehension. The main contributions are as follows: (1) Chinese Grammatical Error Diagnosis based on the RoBERTa model. Identifying error types is the hard part of Chinese Grammatical Error Diagnosis, so this thesis proposes a RoBERTa-BiLSTM-CRF model. RoBERTa converts sentences into word vectors, BiLSTM further learns the semantic features of the erroneous sentence, and the CRF layer predicts the optimal label sequence from the features of the whole sequence. On the Detection, Identification, and Position subtasks, the F1 score of the RoBERTa-BiLSTM-CRF model is 1.88%, 4.19%, and 4.65% higher, respectively, than that of the baseline BERT-CRF model. The comparative experiments show that the RoBERTa-BiLSTM-CRF model is effective at automatically identifying grammatical errors. (2) Extractive reading comprehension based on adversarial training. Because extractive reading comprehension datasets for teaching Chinese as a foreign language are lacking, this thesis starts from the test texts of the Hanyu Shuiping Kaoshi (HSK). Based on the characteristics of HSK reading comprehension texts, a fine-grained annotation scheme is established. Through manual annotation, an extractive reading comprehension dataset, HSKReader, is constructed, and the labeling consistency reaches
87.52%. HSKReader contains 1379 (passage, question, answer) triples and provides a data foundation for research on extractive reading comprehension in teaching Chinese as a foreign language. To address the problem of distractor answers within a passage, adversarial training based on the RoBERTa model is proposed. The fast gradient method and projected gradient descent adversarial training methods generate perturbations that are added to the word vectors; the parameters of the RoBERTa model are then adjusted during training to predict the start and end positions of the answer in the passage. After adding adversarial training, the EM and F1 scores of the RoBERTa model improve by 4.78% and 4.51% on HSKReader, 0.54% and 0.76% on CMRC2018, and 1.49% and 0.42% on DuReader-checklist, respectively. The comparative experiments show that adversarial training is effective for the extractive reading comprehension task. (3) Construction of an assisted learning platform for Chinese as a foreign language. Based on the RoBERTa-BiLSTM-CRF model and the RoBERTa-based adversarial training method, this thesis develops an assisted learning platform for Chinese as a foreign language named Han WR. Han WR includes modules for grammar checking, reading comprehension exercises, error-prone word learning, and extended reading. It helps learners of Chinese as a second language write and read, and improves their comprehension of Chinese text. |
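
The abstract names the fast gradient method (FGM) and projected gradient descent (PGD) but gives no formulas. The sketch below illustrates how such perturbations are commonly computed for a single embedding vector, assuming the usual L2-normalized form r = ε · g / ‖g‖₂ for FGM and an iterated step-and-project loop for PGD. The function names, the values of `epsilon`, `alpha`, and `steps`, and the toy gradient function are all hypothetical and not taken from the thesis; in actual training the gradient would come from backpropagating the loss to the RoBERTa embedding layer.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def fgm_perturbation(grad, epsilon=1.0):
    """Single-step FGM perturbation: r = epsilon * g / ||g||_2.

    `grad` stands in for the gradient of the loss with respect to an
    embedding vector (an assumption of this sketch).
    """
    norm = l2_norm(grad)
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def project_to_ball(delta, epsilon):
    """Scale `delta` back onto the L2 ball of radius `epsilon` if it lies outside."""
    norm = l2_norm(delta)
    if norm <= epsilon:
        return list(delta)
    return [epsilon * d / norm for d in delta]


def pgd_perturbation(grad_fn, embedding, epsilon=1.0, alpha=0.3, steps=3):
    """Multi-step PGD: take small normalized gradient steps, projecting after each.

    `grad_fn` is a hypothetical callable that returns the loss gradient
    at a given (perturbed) embedding.
    """
    delta = [0.0] * len(embedding)
    for _ in range(steps):
        perturbed = [e + d for e, d in zip(embedding, delta)]
        step = fgm_perturbation(grad_fn(perturbed), alpha)
        delta = project_to_ball([d + s for d, s in zip(delta, step)], epsilon)
    return delta


if __name__ == "__main__":
    # Toy gradient, for illustration only.
    print(fgm_perturbation([3.0, 4.0], epsilon=1.0))
    print(pgd_perturbation(lambda x: [1.0, 0.0], [0.0, 0.0], epsilon=0.5))
```

In a training loop, the perturbation would be added to the embeddings, the loss recomputed on the perturbed input, and the resulting gradients accumulated with the clean gradients before the optimizer step. FGM does this once per batch, while PGD repeats the inner step several times at a higher computational cost.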