| Grammatical error correction is one of the most challenging tasks in NLP. It aims to automatically detect and correct all errors in a sentence, and it is valuable in both theory and practice. In Chinese grammatical error correction, there is still much room for improvement because of limitations in both data and algorithms. Sentences usually contain multiple types of grammatical errors, and the semantic dependencies among these errors make it impossible for a single model to correct all of them in a single inference pass. In this thesis, we adopt data augmentation and model ensemble strategies to construct a Chinese grammatical error correction model, addressing the poor robustness and weak generalization ability of data-driven grammatical error correction models. The main research content is as follows: (1) We construct a Chinese grammatical error correction model based on knowledge and fluency enhancement. To address the Transformer model's limited generalization ability and its incomplete correction in a single inference pass, we introduce a pre-trained language model into the Transformer model and make maximal use of pre-trained knowledge to improve correction quality. On this basis, we adopt a fluency promotion mechanism that has the model correct an erroneous sentence multiple times, which yields better results. The experimental results show that performance improves by 2.14%, surpassing the best-performing system in the NLPCC 2018 grammatical error correction shared task. (2) To address the shortage of training corpora, we use two data augmentation methods to synthesize pseudo data and expand the error correction dataset. We design and implement a rule-based method and a back-translation method. The rule-based method inserts, deletes, and replaces specific words and reorders the words of a correct sentence to construct erroneous sentences, and the 
back-translation method reverses the direction of the training corpus (correct sentences as input, erroneous sentences as output) and trains an error generation model to construct erroneous sentences. The experimental results show that the expanded dataset effectively supplements the available data, and the F0.5 score improves by 2.88%. (3) We adopt a recurrent generation method to build a grammatical error correction model that integrates the strengths of heterogeneous models to correct the various grammatical errors in a sentence. Because a single error correction model cannot completely correct sentences containing multiple errors, we use the recurrent generation method to integrate heterogeneous models, and we apply a protection mechanism within it that effectively avoids introducing unnecessary errors during correction. The results show that the proposed method can effectively correct various errors in sentences, and the performance of the model improves by a further 2.5%. |
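
The rule-based augmentation described in (2) can be illustrated with a minimal Python sketch. The abstract only states that the method inserts, deletes, replaces, and reorders words in correct sentences. The function names `corrupt` and `make_pairs`, the per-token noise probability `p`, the random choice of replacement words from `vocab`, and the single adjacent swap are all assumptions made for illustration, not the thesis's actual rules.

```python
import random


def corrupt(tokens, vocab, rng, p=0.1):
    """Return a noisy copy of `tokens` (hypothetical rules, for illustration).

    Each token is deleted, replaced, or followed by an inserted token,
    each with probability `p`; otherwise it is kept unchanged.
    """
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p:                      # delete: drop the token
            continue
        elif r < 2 * p:                # replace: substitute a random vocab word
            out.append(rng.choice(vocab))
        elif r < 3 * p:                # insert: keep the token, add a random word
            out.append(tok)
            out.append(rng.choice(vocab))
        else:                          # keep the token as-is
            out.append(tok)
    # reorder: swap one adjacent pair with probability p
    if len(out) > 1 and rng.random() < p:
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


def make_pairs(sentences, vocab, seed=0, p=0.1):
    """Build (erroneous, correct) pseudo training pairs from correct sentences."""
    rng = random.Random(seed)
    return [(corrupt(s, vocab, rng, p), list(s)) for s in sentences]
```

Calling `make_pairs([list("我今天去学校")], vocab=list("的了在是"))` yields one pseudo pair whose target side is the untouched correct sentence. A real system would likely restrict substitutions to confusable words (for example, similar-sounding or similar-looking characters) rather than sampling uniformly from `vocab`.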