Research On Chinese Grammatical Error Diagnosis Based On Transformer Model

Posted on: 2022-12-07 | Degree: Master | Type: Thesis
Country: China | Candidate: J H Zhang
GTID: 2505306755996009 | Subject: Computer technology
Abstract/Summary:
With the diversification of international communication, more and more foreign learners are studying Chinese as a second language, which has created a large demand for the correction of Chinese grammatical errors. This thesis studies Chinese grammatical error diagnosis. Its goal is to develop a system for the Chinese grammatical error diagnosis task that handles four types of grammatical errors: word selection errors, word order errors, redundant words and missing words. Two separate subsystems are designed, one for grammatical error detection and one for grammatical error correction, as follows.

1. Chinese grammatical error detection subsystem

This subsystem proposes three models that detect grammatical errors by integrating syntactic, contextual and lexical information. The first model extracts high-level semantic information from the text with BERT (Bidirectional Encoder Representations from Transformers) and fuses it with syntactic information extracted by a GCN (Graph Convolutional Network). In addition, it adds an LSTM (Long Short-Term Memory) layer to capture long-term dependencies in the text, and it uses a CRF (Conditional Random Field) to reduce errors when labeling sequences. The second model has a structure similar to the first, except that it uses contextual information of the text instead of syntactic information. The third model uses a lexicon-based GCN to resolve word ambiguity. To further improve performance, we use a three-step voting method to combine the predictions of the three models. Compared with all other teams participating in the Chinese Grammatical Error Diagnosis 2020 (CGED 2020) shared task, our system achieves the highest F1 scores at the detection and identification levels.

2. Chinese grammatical error correction subsystem

This subsystem corrects grammatical errors in sentences through both sequence generation and sequence tagging, drawing on a wide range of semantic information about the text. It contains three models in total. The first and second are sequence-to-sequence models: the former focuses on contextual information across contiguous characters, while the latter focuses on important local information in the sentence. Given an incorrect sentence, these models directly output the corrected sentence. The third is a sequence tagging model, which assigns an edit tag to each character of the input sentence and applies the tags to produce the corrected sentence. Each edit tag indicates the transformation to perform on the current character and carries one of two kinds of transformation information: basic transformations and g-transformations. Finally, we use a simple voting method to combine the predictions of all models, and the candidate answer with the most votes is taken as the system's final result. Compared with all other teams participating in the CGED 2020 shared task, our system achieves the highest F1 scores at both the TOP1 and TOP3 levels, which measure correction ability.
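The abstract says only that a CRF is used in the detection models to reduce sequence-labeling errors. Assuming a standard linear-chain CRF over per-character labels, decoding selects the label sequence with the highest total of emission and transition scores, instead of the best label at each position independently. The sketch below is plain Python; `viterbi_decode`, the score matrices and the two-label scheme in the example (0 = correct, 1 = erroneous) are hypothetical and not taken from the thesis.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]: score of label j at position t (hypothetically from a BiLSTM).
    transitions[i][j]: score of moving from label i to label j.
    """
    if not emissions:
        return []
    num_labels = len(transitions)
    # best[j]: best score of any label path that ends in label j at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_labels):
            # Previous label i that maximises (path score so far + transition i -> j)
            i_best = max(range(num_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final label and follow the back-pointers to the start
    path = [max(range(num_labels), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Illustrative scores only: the middle character weakly looks erroneous
    emissions = [[2.0, 0.0], [0.4, 0.6], [2.0, 0.0]]
    # Transitions that reward staying in the same label smooth out the isolated "1"
    print(viterbi_decode(emissions, [[0.5, -1.0], [-1.0, 0.5]]))  # [0, 0, 0]
```

With all-zero transitions the same emissions decode to `[0, 1, 0]`, the per-position argmax; the transition scores are what let the CRF overrule a weak local decision.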
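The sequence tagging model can be illustrated by how per-character edit tags are turned into a corrected sentence. This is a minimal sketch assuming a hypothetical tag inventory that covers only basic transformations (`KEEP`, `DELETE`, `REPLACE_x`, `APPEND_x`, loosely modeled on GECToR-style tags). The abstract does not define the thesis's actual tags, so g-transformations are left out here.

```python
from typing import List


def apply_edit_tags(chars: List[str], tags: List[str]) -> str:
    """Apply one edit tag per input character and return the corrected sentence.

    Hypothetical tag set (basic transformations only, not the thesis's own):
      KEEP        keep the character
      DELETE      drop the character (e.g. a redundant word)
      REPLACE_x   replace the character with x (e.g. a word selection error)
      APPEND_x    keep the character and insert x after it (e.g. a missing word)
    """
    if len(chars) != len(tags):
        raise ValueError("expected exactly one tag per character")
    out: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "KEEP":
            out.append(ch)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE_"):
            out.append(tag[len("REPLACE_"):])
        elif tag.startswith("APPEND_"):
            out.append(ch)
            out.append(tag[len("APPEND_"):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return "".join(out)


if __name__ == "__main__":
    # Word selection error: 意思 -> 兴趣
    print(apply_edit_tags(list("我对中文很有意思"), ["KEEP"] * 6 + ["REPLACE_兴", "REPLACE_趣"]))
    # Missing word (语) and redundant word (second 了)
    print(apply_edit_tags(list("我学习汉了了"), ["KEEP", "KEEP", "KEEP", "APPEND_语", "KEEP", "DELETE"]))
```

Because there is exactly one tag per input character, the tagger's output stays aligned with the input, so all tags can be predicted in one pass rather than generating the corrected sentence token by token as the sequence-to-sequence models do.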
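The final step of the correction subsystem, where the candidate answer with the most votes becomes the result, is a majority vote over the models' output sentences. A minimal sketch, assuming each model contributes one candidate string; the function name `vote` and the tie-breaking rule (the earliest candidate wins) are assumptions, since the abstract does not say how ties are resolved.

```python
from collections import Counter
from typing import List


def vote(candidates: List[str]) -> str:
    """Return the candidate correction proposed by the most models.

    Assumption: ties are broken by first appearance in `candidates`.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    # Counter keeps keys in first-insertion order, and max() returns the
    # first maximal key it meets, so earlier candidates win ties
    counts = Counter(candidates)
    return max(counts, key=counts.__getitem__)


if __name__ == "__main__":
    outputs = ["我对中文很有兴趣", "我对中文很有意思", "我对中文很有兴趣"]
    print(vote(outputs))  # 我对中文很有兴趣
```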
Keywords/Search Tags: Chinese Grammatical Error Diagnosis, Transformer, GCN