
Research On Tibetan Automatic Proofreading Technology Based On Mutual Information

Posted on: 2021-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: R B M Ci | Full Text: PDF
GTID: 2415330611459660 | Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, large volumes of electronic text have appeared, and these texts often contain errors, ranging from basic spelling mistakes to grammatical and semantic errors. Such errors seriously affect publications in the publishing industry, announcements issued by the government and news media, and the textual data used by researchers. Traditional manual proofreading cannot keep pace with the rapidly growing amount of electronic text: it is tedious, monotonous, labor-intensive, inefficient, and costly, and it may itself introduce secondary errors. Research on automatic text proofreading therefore has considerable practical significance.

This thesis studies Tibetan linguistics, covering Tibetan spelling rules, case grammar, abbreviations, and semantics. With reference to English and Chinese proofreading, it proposes a proofreading method suited to Tibetan text, based on the mutual information of characters and of words, and builds a Tibetan character dictionary and word dictionary that basically cover common words. On the basis of this theoretical work and data, a Tibetan automatic text proofreading system was implemented. Automatic text proofreading spans many levels, including basic spell checking, characters and words, paragraphs, and semantics; this thesis focuses on proofreading at the character and word levels. The main work of this thesis is as follows:

1. An analysis of current research at home and abroad shows that automatic proofreading of Tibetan text lags behind. Drawing on automatic proofreading methods for English, Chinese, and other minority languages, and taking the characteristics of Tibetan text into account, the thesis studies an automatic proofreading method for Tibetan text.
2. The basic concepts and applications of mutual information are studied, and the calculation of mutual information is applied to the proofreading of Tibetan characters and words according to the types of errors that occur in them.
3. A method for automatic Tibetan proofreading based on the mutual information of characters is proposed. The Tibetan text is proofread using mutual information, and to improve the proofreading results a Good-Turing estimation method for Tibetan proofreading is proposed to smooth the data. The resulting character-level system reached an overall average accuracy, recall, and F value of 81%, 78%, and 80%, respectively.
4. A method for automatic Tibetan proofreading based on the mutual information of words is proposed. The algorithm is broadly the same as the character-level one, except that word-level proofreading requires word segmentation first. Segmentation is done by dictionary matching, and the mutual information table is then computed from the segmented words, subject to a set limit. The resulting word-level system reached an overall average accuracy, recall, and F value of 69.5%, 65%, and 67%, respectively.
5. A Tibetan proofreading system based on mutual information was implemented. The system provides proofreading at two levels, character and word, and the results show that character-level mutual information proofreads better than word-level mutual information.
6. To make the proofreading system more widely applicable, proofreading approaches based on language models and semantic analysis are proposed.
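The abstract does not give the thesis's formulas, so the following is only a minimal sketch of how mutual information between adjacent units can flag suspicious text. It assumes pointwise mutual information, log2(p(x,y) / (p(x)p(y))), estimated from raw counts. The function names, the threshold value, and the choice to treat unseen pairs as negative infinity are illustrative assumptions; the Good-Turing smoothing and dictionary-based segmentation described above are not reproduced here.

```python
import math
from collections import Counter


def train_counts(sentences):
    """Count single units and adjacent pairs in pre-tokenized sentences.

    `sentences` is a list of lists of units (hypothetical input format).
    """
    unigrams, bigrams = Counter(), Counter()
    for units in sentences:
        unigrams.update(units)
        bigrams.update(zip(units, units[1:]))
    return unigrams, bigrams


def pmi(x, y, unigrams, bigrams):
    """Pointwise mutual information of the adjacent pair (x, y).

    Returns -inf for a pair never seen in training (an assumption of this
    sketch; a smoothed estimate would assign it a small probability instead).
    """
    if bigrams[(x, y)] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[(x, y)] / n_bi
    p_x = unigrams[x] / n_uni
    p_y = unigrams[y] / n_uni
    return math.log2(p_xy / (p_x * p_y))


def flag_suspects(units, unigrams, bigrams, threshold=0.0):
    """Return adjacent pairs whose PMI is below the (assumed) threshold."""
    return [
        (x, y)
        for x, y in zip(units, units[1:])
        if pmi(x, y, unigrams, bigrams) < threshold
    ]


# Placeholder tokens stand in for Tibetan characters or segmented words.
corpus = [["a", "b", "c"], ["a", "b", "d"]]
uni, bi = train_counts(corpus)
suspects = flag_suspects(["a", "c", "b"], uni, bi)
```

In this toy run the pairs that never occur in the training corpus are the ones flagged. A real system would need a large corpus, a tuned threshold, and smoothing so that rare but valid pairs are not all reported as errors.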
Keywords/Search Tags: Mutual Information, Tibetan, Automatic Proofreading, Text