| In recent years, with the development of information networks and the growth of the self-media population, the amount of erroneously written text on the Internet has surged. Some analyses show that the text error rate in the headlines and body text of Chinese online news has exceeded 1%. Such errors may mislead people into making wrong decisions and cause adverse social impact. The traditional solution is to find and correct errors in articles manually, but manual proofreading is very expensive: a typical 5,000-word article may take 1.5 to 2 hours to check and correct, and the process is time-consuming and tedious for proofreaders. Consequently, many tasks involving text content urgently need automatic text error correction to help improve the user experience. For example, in search, the search engine must first detect and correct errors in the user’s query and then retrieve results for the corrected query; in speech interaction, the speech system must convert the user’s speech into correct text before carrying out intent recognition and interaction. Text error correction can also be applied in research on deep question answering, dialogue systems, information extraction, and other fields, which makes it a crucial technology. Against this background, this thesis takes up the topic "Research on Text Error Correction Algorithm Based on Deep Learning", which applies deep learning techniques to recognize and correct various kinds of erroneous text. For the problem of Chinese text error correction, this thesis proposes a method based on the BERT-BiLSTM-CRF model. First, the BERT pre-trained model is used to generate deep bidirectional language representation vectors that incorporate context information. Then, a BiLSTM model is used to learn the dependencies in the observed
text. Finally, considering the error types that arise in text error correction (substitution, insertion, deletion, and reordering), a conditional random field (CRF) layer is added to model the dependencies between adjacent characters. In the CRF layer, the Viterbi algorithm, a dynamic programming method, decodes the sequence predicted by the model by finding the optimal path, which yields the globally optimal text sequence. In addition, because most characters in the observed text need no modification and are easy to predict, training should concentrate on the erroneous positions, so Focal Loss is introduced to address this imbalance. Experiments on the SIGHAN15 and Hybrid Set datasets show that our method achieves superior results on error detection and correction in terms of sentence-level accuracy, precision, recall, and F1 score. |
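
The decoding and loss-reweighting steps described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the function names `viterbi_decode` and `focal_loss`, the two-tag scheme in the usage note (0 for a correct character, 1 for an erroneous one), and the default `gamma=2.0` are all assumptions made for illustration. The loss is written in the standard focal loss form, -(1 - p)^gamma * log(p), which is assumed here to match the loss the thesis refers to.

```python
import math


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag path and its score.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM layer).
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters).
    Both are plain nested lists; the values are hypothetical.
    """
    num_tags = len(transitions)
    # best[j] is the best score of any path that ends in tag j so far.
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Dynamic programming step: pick the best previous tag for tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace the optimal path backwards from the best final tag.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


def focal_loss(prob_true, gamma=2.0):
    """Focal loss for one position, given the predicted probability of the
    correct label. Easy positions (prob_true near 1) are down-weighted by
    the factor (1 - prob_true) ** gamma; gamma = 0 gives cross-entropy."""
    return -((1.0 - prob_true) ** gamma) * math.log(prob_true)
```

With emission scores `[[2.0, 0.0], [0.5, 1.0], [2.0, 0.0]]` and transition scores `[[0.5, -1.0], [-1.0, 0.5]]`, for instance, choosing the best tag at each position independently gives 0, 1, 0, whereas `viterbi_decode` returns 0, 0, 0 because the transition scores penalize switching tags. Likewise, `focal_loss(0.9)` is far smaller than `focal_loss(0.1)`, so confidently predicted characters contribute little to training.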