Research On Text Error Correction Algorithm Based On Deep Learning

Posted on:2024-09-11

Degree:Master

Type:Thesis

Country:China

Candidate:F F Gu

GTID:2568307112958359

Subject:Computer Science and Technology

Abstract/Summary:

In recent years, with the spread of networked information and the growth of self-media, miswritten text on the Internet has surged. Some analyses show that the text error rate in the headlines and body text of Chinese online news has exceeded 1%. Such errors may mislead readers into making wrong decisions and can have a harmful social impact. The traditional solution is to find and correct errors manually, but manual proofreading is very expensive: a typical 5,000-word article may take 1.5 to 2 hours to check and correct, and the process is time-consuming and tedious for proofreaders. As a result, many text-related tasks urgently need automatic text error correction to improve the user experience. For example, in search, the engine must first detect and correct errors in the user's query before retrieving results; in speech interaction, the system must convert the user's speech into correct text before carrying out intent recognition and the subsequent interaction. Text error correction is also used in deep question answering, dialogue systems, information extraction, and other fields, which makes it a crucial technology.

Against this background, this thesis, "Research on Text Error Correction Algorithm Based on Deep Learning", applies deep learning techniques to detect and correct various kinds of erroneous text. For Chinese text error correction, it proposes a method based on a BERT-BiLSTM-CRF model. First, a pre-trained BERT model generates deep bidirectional language representation vectors that incorporate context information. Then, a BiLSTM model learns the dependencies within the observed text. Finally, to account for the substitution, insertion, deletion, and reordering errors that arise in text error correction, a conditional random field (CRF) layer models the dependencies between adjacent positions in the text. In the CRF layer, the Viterbi algorithm, a dynamic programming method, decodes the sequence predicted by the model by finding the optimal path, which yields the globally optimal text sequence. In addition, because most characters in the observed text need no modification and are easy to predict, training should concentrate on the erroneous positions, so Focal Loss is introduced for this purpose.

Experiments on the SIGHAN15 and Hybird Set datasets show that the proposed method achieves superior results on both error detection and error correction in terms of sentence-level accuracy, precision, recall, and F1 score.
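
The two algorithmic pieces named in the abstract, Viterbi decoding in the CRF layer and a loss that down-weights easy characters, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes a linear-chain CRF over per-character tags, assumes the loss is the standard focal loss form -(1 - p)^gamma * log(p), and uses hypothetical names (`viterbi_decode`, `focal_loss`), a hypothetical two-tag scheme (0 = keep, 1 = error), and an assumed default of gamma = 2.0. In the full model, the emission scores would come from the BERT and BiLSTM layers; here they are hand-written numbers.

```python
import math


def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for a linear-chain CRF.

    emissions[t][k]   : score of tag k at position t (hypothetical values here).
    transitions[j][k] : score of moving from tag j to tag k.
    """
    num_tags = len(transitions)
    # scores[k] = best score of any partial path that ends in tag k.
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for k in range(num_tags):
            # Dynamic programming step: pick the best previous tag for tag k.
            best_prev = max(
                range(num_tags), key=lambda j: scores[j] + transitions[j][k]
            )
            step_scores.append(scores[best_prev] + transitions[best_prev][k] + emit[k])
            step_back.append(best_prev)
        scores = step_scores
        backpointers.append(step_back)
    # Trace the optimal path backwards from the best final tag.
    last = max(range(num_tags), key=lambda k: scores[k])
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return scores[last], path


def focal_loss(prob_true, gamma=2.0):
    """Focal loss for one token, given the probability of its true label.

    gamma = 0 reduces to ordinary cross-entropy; larger gamma shrinks the
    loss of confidently correct (easy) tokens. gamma = 2.0 is an assumption.
    """
    return -((1.0 - prob_true) ** gamma) * math.log(prob_true)


# Hypothetical scores for a 3-character sentence and 2 tags.
emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
transitions = [[0.5, -1.0], [-1.0, 0.5]]  # staying in a tag is rewarded
best_score, best_path = viterbi_decode(emissions, transitions)
print(best_score, best_path)  # 3.0 [0, 0, 0]
```

With all transition scores set to zero, the same emissions decode to [0, 1, 0], which is simply the per-position argmax; the transition scores are what let the CRF layer prefer a globally consistent sequence. For the loss, a token predicted with probability 0.9 contributes 100 times less than it would under plain cross-entropy when gamma is 2, so training signal concentrates on the hard, typically erroneous, positions.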

Keywords/Search Tags:

Chinese text error correction, Deep learning, BERT, BiLSTM, CRF

Related items

1	Research On Chinese Text Error Correction Method Based On Deep Learning
2	Chinese Picture Text Extraction And Error Correction Based On Deep Learning
3	Research On Error Correction Method Of Chinese Short Text Based On BERT
4	Research On Chinese News Classification Algorithm Based On Deep Learning
5	Research On Chinese Spelling Error Correction Model Based On Deep Learning
6	Research On Optimization Of Chinese Text Error Correction Algorithm
7	Research On Chinese Grammar Error Correction Based On Deep Learning
8	Research And Application Of Chinese Text Error Correction Methods For Various Error Types
9	BERT-based Text Error Correction Model For Normative Documents
10	Research On Bad Microblog Text Classification Based On Deep Learning