| Automatic text proofreading is a fundamental task in natural language processing: it aims to detect and correct errors in text by computer, so that erroneous text is automatically restored to its correct form. Spell checking is usually the first technique applied in text proofreading, since it can quickly detect errors and improve proofreading efficiency. Spell checking for English and Chinese text has achieved fruitful results and is widely used in word processing software. By comparison, spell checking for Tibetan text is still in its infancy, although it has a wide range of applications in Tibetan corpus construction, speech recognition, text recognition, and many other areas. Drawing on spell-checking techniques for English and Chinese text, this paper analyzes the types of errors found in Tibetan text and the current state of research on Tibetan spell checking, and proposes a Tibetan word spell checking method based on the TC_LSTM (Tibetan Characters LSTM) language model. The main contents are as follows. (1) Construction of an experimental corpus. Tibetan currently has no unified corpus for training and testing language models. We therefore use a web crawler to collect 186 MB of Tibetan text, comprising 15,147,315 syllables, from Tibetan websites, and preprocess it to obtain a high-quality experimental corpus. (2) Construction of the TC_LSTM language model. Tibetan text is a sequence of syllables: there are explicit delimiters between syllables but none between words, and word segmentation of Tibetan text still faces many problems. We therefore propose the TC_LSTM language model, which takes the syllable (Tibetan character) rather than the segmented word as its input unit, and verify its effectiveness experimentally. The experiments show that the perplexity of the TC_LSTM language model on the test set is 74 and 18 lower than that of the traditional Bigram and Trigram language models, respectively, a clear improvement over both. (3) Design of the Tibetan spell checking algorithm. We propose a Tibetan word spelling checking method based on the TC_LSTM language model, design the corresponding algorithm, and verify its effectiveness experimentally. The results show that the highest accuracy, recall and F value achieved with TC_LSTM are 97.20%, 85.89% and 79.09%, respectively, which are 11.87%, 3.46% and 1.85% higher than the corresponding highest values for the Bigram-based algorithm. The experiments thus show that the TC_LSTM-based Tibetan word spelling checking algorithm outperforms the Bigram-based one. |
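The abstract compares models by perplexity and by accuracy, recall and F value without defining them. As a minimal sketch, the code below shows the standard textbook definitions: perplexity as the exponential of the average negative log-probability per token, and the F value as the harmonic mean of precision and recall. That the paper uses exactly these definitions is an assumption, as is reading its "accuracy" as precision. The function names `perplexity` and `precision_recall_f` and the input values are hypothetical and are not taken from the paper's experiments.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability a language model
    assigned to each token (hypothetical inputs, not the paper's data)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token, then exponentiate.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def precision_recall_f(true_positives, false_positives, false_negatives):
    """Precision, recall and F value (harmonic mean) for error detection.

    Assumption: a "positive" is a unit flagged as misspelled; the paper's
    exact counting scheme is not stated in the abstract.
    """
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


if __name__ == "__main__":
    # Illustrative values only.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    print(precision_recall_f(8, 2, 2))
```

A lower perplexity means the model assigns higher probability to the test text, which is the sense in which the abstract reports TC_LSTM improving on the Bigram and Trigram models.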