Application Research Of Bi-LSTM-CRF Model In Chinese Grammar Error Diagnosis

Posted on:2020-10-25

Degree:Master

Type:Thesis

Country:China

Candidate:S Liu

GTID:2415330578452713

Subject:Computer technology

Abstract/Summary:

As China's international standing rises, learning Chinese has become increasingly important to learners around the world. The goal of the Chinese Grammatical Error Diagnosis (CGED) task discussed in this paper is to develop a computer-assisted tool that both helps foreign learners of Chinese as a second language learn more effectively and relieves pressure on teachers of Chinese. CGED research aims to build a model that automatically detects the errors learners make in written Chinese, together with their locations. In this study, errors are divided into four categories: redundant words, missing words, bad word selection, and disordered words. The task is difficult because it draws on several levels of natural language processing, including lexical and syntactic analysis of Chinese, so all of these must be considered together when making a judgment. In addition, Chinese carries rich linguistic knowledge and its grammatical forms are diverse, so deciding whether a sentence contains errors, and of what type, often requires external knowledge.

In view of this, this paper proposes using pyltp for data preprocessing. The personalized word segmentation feature of pyltp suits this task well, because CGED datasets mostly come from essays written by different foreign students on many different topics, and personalized word segmentation can alleviate topic dependence to some extent. When facing a new topic, the user needs to label only a small amount of data, and the personalized segmenter is then trained incrementally on the basis of the original data. This makes use of the information in the original topic data while also taking into account the particularities of the target topic.

In addition, this paper proposes modeling with a Bidirectional Long Short-Term Memory network (Bi-LSTM), which can make better use of context in both directions to determine whether a sentence contains an error. On this basis, we treat CGED as a special sequence labeling task. For sequence labeling, the Conditional Random Field (CRF) model performs better than the traditional Hidden Markov Model (HMM) and Maximum Entropy Markov Model (MEMM), while the Bi-LSTM can alleviate two shortcomings of the CRF model: its reliance on manual feature selection and its difficulty in capturing long-distance contextual dependencies. Therefore, this paper further proposes combining the Bi-LSTM with the CRF model: the Bi-LSTM captures long-distance information in both directions and passes it to the CRF model, which performs the sequence labeling. Experimental results on the task's public standard evaluation dataset show that the proposed Bi-LSTM-CRF model is more effective for CGED than either the Bi-LSTM model or the CRF model alone.
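To make the sequence labeling framing concrete, the following is a minimal Python sketch under stated assumptions, not the thesis's implementation. The abstract does not specify a tagging scheme, so the BIO encoding and the error codes `R`, `M`, `S`, `W` for the four error categories are assumptions. The `viterbi` function illustrates how per-token scores (which a Bi-LSTM would supply in a Bi-LSTM-CRF) can be combined with CRF-style tag-to-tag transition scores. The function names and all scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed codes for the four error categories named in the abstract.
ERROR_TYPES = {
    "R": "redundant word",
    "M": "missing word",
    "S": "bad word selection",
    "W": "disordered words",
}


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, type) error spans (end exclusive) to BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, err in spans:
        if err not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {err}")
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end)}")
        tags[start] = f"B-{err}"
        for i in range(start + 1, end):
            tags[i] = f"I-{err}"
    return tags


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is a per-token score (illustrative numbers here; in a
    Bi-LSTM-CRF they would come from the Bi-LSTM). transitions[(prev, cur)]
    is a tag-to-tag score; pairs that are not listed score 0.0.
    """
    if not emissions:
        return []
    tags = list(emissions[0])
    score = dict(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score = {}
        pointers = {}
        for cur in tags:
            # Best previous tag for reaching `cur` at this position.
            best_prev = max(
                tags, key=lambda p: score[p] + transitions.get((p, cur), 0.0)
            )
            new_score[cur] = (
                score[best_prev]
                + transitions.get((best_prev, cur), 0.0)
                + emit[cur]
            )
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

For example, `spans_to_bio(4, [(1, 3, "R")])` marks tokens 1 and 2 as a redundant-word span. In `viterbi`, a strongly negative score for the transition `("O", "I-R")` steers decoding away from a tag sequence that starts an error span with an inside tag, which is the kind of label-consistency constraint a per-token classifier alone cannot enforce.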

Keywords/Search Tags:

Chinese Grammatical Error Diagnosis, Bidirectional Long Short-Term Memory Network, Conditional Random Field, Sequence Labeling

Related items

1	Research On Key Problems Of Russian-Chinese Military Speech-to-speech Translation Based On Sequence-to-Sequence
2	The Effect Of Visual Long-term Memory On Visual Short-term Memory
3	Research And Implementation Of Music Emotion Recognition Based On Multimodal Features Fusion
4	Music Short-term Memory Survey And Comparison Research
5	The Role Of Long-Term Memory In Interpreting
6	Short-Term Memory Of Young Patients With Major Depression And Their Self-Rating Of Short-Term Memory
7	Non-stationary Long Memory Signal Modeling And Forecasting
8	Memory In Interpreting:Working Mechanism And Improving Strategies
9	A Report On The Interpretation International Cooperation Projects
10	Long-term memory supports the retention, preservation, and prioritization of short-term memory