Research On Chinese Grammatical Error Correction Based On Sequence Generation Models

Posted on: 2024-08-18

Degree: Master

Type: Thesis

Country: China

Candidate: Y L Liang

Full Text: PDF

GTID: 2558307067968329

Subject: Software engineering

Abstract/Summary:

Grammatical error correction is one of the most challenging tasks in NLP. It aims to automatically detect and correct all errors in a sentence, and it is valuable in both theory and practice. In Chinese grammatical error correction, there is still much room for improvement because of limitations in both data and algorithms. Sentences usually contain multiple types of grammatical errors, and the semantic dependencies among these errors prevent a single model from fully correcting a sentence in a single inference pass. In this thesis, we adopt data augmentation and model ensemble strategies to build a Chinese grammatical error correction model, addressing the poor robustness and weak generalization ability of data-driven grammatical error correction models. The main research content is as follows:

(1) Construct a Chinese grammatical error correction model based on knowledge and fluency enhancement. To address the Transformer model's limited generalization ability and its incomplete correction in a single inference pass, the model incorporates a pre-trained language model into the Transformer and makes full use of pre-trained knowledge to improve correction quality. On this basis, we adopt a fluency promotion mechanism that has the model correct an erroneous sentence multiple times to achieve better results. Experimental results show that performance improves by 2.14% and exceeds that of the best-performing system in the NLPCC 2018 grammatical error correction shared task.

(2) To address the shortage of training corpora, we use two data augmentation methods to synthesize pseudo data and expand the error correction dataset: a rule-based method and a back-translation method. The rule-based method inserts, deletes, and replaces specific words and reorders words in a correct sentence to construct erroneous sentences. The back-translation method reverses the source and target sides of the training corpus and trains an error generation model to construct erroneous sentences. Experimental results show that the expanded dataset effectively supplements the training data, and the F_0.5 score improves by 2.88%.

(3) Adopt a recurrent generation method to build a grammatical error correction model that integrates the strengths of heterogeneous models to correct the various grammatical errors in a sentence. Because a single error correction model cannot fully correct sentences containing multiple errors, we use the recurrent generation method to integrate heterogeneous models. We also apply a protection mechanism within the recurrent generation method, which effectively avoids introducing unnecessary errors during correction. The results show that the proposed method can effectively correct various errors in sentences, and model performance improves by a further 2.5%.
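As an illustration of the rule-based augmentation described in (2), the following is a minimal Python sketch of how correct sentences might be corrupted by deleting, replacing, inserting, and reordering words to produce (erroneous, correct) training pairs. It is not the thesis's actual rule set: the function names `add_noise` and `make_pairs`, the toy vocabulary, the uniform noise probability `p`, and the adjacent-swap reordering are all assumptions made for the example.

```python
import random


def add_noise(tokens, rng, vocab, p=0.1):
    """Return a corrupted copy of `tokens` (hypothetical helper).

    Each token is deleted, replaced, or followed by an inserted word,
    each with probability `p`; afterwards adjacent tokens are swapped
    with probability `p`. The input list is not modified.
    """
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p:                      # delete the token
            continue
        elif r < 2 * p:                # replace it with a random word
            noisy.append(rng.choice(vocab))
        elif r < 3 * p:                # keep it and insert a random word
            noisy.append(tok)
            noisy.append(rng.choice(vocab))
        else:                          # keep it unchanged
            noisy.append(tok)

    # Reorder: swap some adjacent pairs, skipping past each swapped pair.
    i = 0
    while i < len(noisy) - 1:
        if rng.random() < p:
            noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
            i += 2
        else:
            i += 1
    return noisy


def make_pairs(sentences, vocab, seed=0, p=0.1):
    """Build (erroneous, correct) pairs from tokenized correct sentences."""
    rng = random.Random(seed)
    return [(add_noise(s, rng, vocab, p), s) for s in sentences]


if __name__ == "__main__":
    # Toy, pre-tokenized input; a real pipeline would use a word segmenter.
    correct = [["我", "昨天", "去", "了", "学校"]]
    toy_vocab = ["的", "了", "在", "是"]
    for wrong, right in make_pairs(correct, toy_vocab, seed=42, p=0.2):
        print("".join(wrong), "->", "".join(right))
```

Seeding the generator keeps the synthetic data reproducible. In practice, the replacement and insertion candidates would more plausibly come from confusion sets or frequent function words than from a uniform toy vocabulary.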

Keywords/Search Tags:

Chinese grammatical error correction, sequence generation model, pre-trained language model, data augmentation, model ensemble

Related items

1	Chinese Grammatical Error Correction Based On Knowledge Graph
2	Research On Chinese Grammatical Error Correction Based On Sequence-to-Sequence Model
3	Research And Implementation Of Grammar Error Correction Model Based On Deep Learning
4	Research On Error Correction Method Of Chinese Short Text Based On BERT
5	Research On Grammatical Error Correction Based On Deep Learning
6	OCR Error Post-correction Based On Chinese Character-level Features And Language Model
7	Research And Implementation Of Grammatical Error Correction Based On Recurrent Neural Network
8	Research On Chinese Text Summary Generation Based On Pre-trained Language Model
9	Research And Implementation Of Automatic Correction Model For Grammatical Errors In Chinese Long Text
10	Research On Deep Learning Error Correction Method Of Chinese Text