Research On Information Processing Oriented Tibetan Homograph Pronunciation Recognition Technology

Posted on: 2019-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: B D Z La
Full Text: PDF
GTID: 2335330566966313
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
In terms of pronunciation, modern Tibetan words fall into two types: words that are spelled differently but pronounced the same, and words that are spelled the same but pronounced differently. In terms of meaning, they can be divided into homographs and heteromorphic synonyms. In Tibetan linguistics, words with the same shape but different pronunciations differ in certain respects from polyphones, just as homographs differ from polysemous words, yet these classes of vocabulary resemble one another in morphology. In the course of the language's development, many Tibetan words have changed greatly in meaning and pronunciation, which has made development and research in Tibetan computational linguistics considerably more difficult. As Tibetan information processing has developed and gradually matured, research on Tibetan speech synthesis has entered a phase of intensive, in-depth study, and the problem of words with the same shape but different pronunciations has become more prominent; nevertheless, experts and scholars in the field have done very little research on it. Moreover, such words in Tibetan differ from polyphones in Chinese, and dictionaries alone cannot resolve the ambiguity of Tibetan words.

This thesis therefore starts from the distinctive language rules and phonetic characteristics of Tibetan. First, a total of 465 keywords were collected and organized from the commonly used words with the same shape but different pronunciations in the Tibetan-Chinese Dictionary. Second, based on how often these keywords actually occur in more than 280,000 Tibetan texts and how often each of their pronunciations is used, 180 high-frequency keywords were selected as the main object of analysis. Many Tibetan words have two different pronunciations; for some, the two readings differ only slightly in meaning, and for others they differ completely. In either case, the words are easy to misread or misunderstand.

Building on an analysis of these differing pronunciations, the thesis studies, from the standpoint of pronunciation, what causes words to have the same shape but different pronunciations. It analyzes in depth the word structure and classification of such words and the forms in which they appear in text, systematically studies the fundamental source of such words in Tibetan TTS systems, and discusses the semantics and usage of the different pronunciations. Based on the results of this analysis, the thesis adopts a method that combines rules and statistics, and experiments on the 180 selected words were carried out under relatively complete experimental conditions. The experimental results show that combining rules with statistical methods lets each approach offset the other's weaknesses. The method can thus efficiently resolve the difficulty of recognizing the pronunciation of such words in current Tibetan speech synthesis, and it achieved a good recognition effect. Following these initial experiments, the remaining 285 words with the same shape but different pronunciations were also analyzed and tested. The results show that the method is generally applicable to pronunciation recognition for such words in Tibetan. It provides a strong basis for the front-end text analysis module of a speech synthesis system and is also of some reference value for other areas of Tibetan linguistic theory.

This thesis mainly addresses outstanding problems in Tibetan speech synthesis. It also analyzes and studies words with the same shape but different pronunciations in the front-end text analysis of a Tibetan TTS system, using Mecab, a Japanese text analyzer based on the CRF model, for Tibetan text analysis. Because the module design inside Mecab is clear and simple, and because Tibetan and Japanese have similar segmentation and annotation systems, Mecab is a good reference for Tibetan text processing.
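
The abstract does not spell out how rules and statistics are combined. One common arrangement, shown here only as a minimal sketch and not as the thesis's actual method, is to try hand-written context rules first and fall back to pronunciation counts learned from labeled text. Everything in the example is an assumption made for illustration: the class name HomographDisambiguator, the one-token context window, and the placeholder tokens ("W", "p1", "p2") that stand in for real Tibetan words and their readings.

```python
from collections import Counter, defaultdict


class HomographDisambiguator:
    """Hypothetical rule-first, statistics-fallback pronunciation chooser."""

    def __init__(self, rules=None):
        # Hand-written rules (assumed format):
        # {(homograph, (side, neighbor_token)): pronunciation}
        self.rules = dict(rules or {})
        # Learned counts: (homograph, (side, neighbor)) -> Counter of pronunciations
        self.context_counts = defaultdict(Counter)
        # Learned counts ignoring context: homograph -> Counter of pronunciations
        self.prior_counts = defaultdict(Counter)

    @staticmethod
    def _neighbors(tokens, index):
        # Context features: the token immediately before and after the homograph.
        feats = []
        if index > 0:
            feats.append(("prev", tokens[index - 1]))
        if index + 1 < len(tokens):
            feats.append(("next", tokens[index + 1]))
        return feats

    def train(self, examples):
        # Each example is (tokens, index_of_homograph, correct_pronunciation).
        for tokens, index, pron in examples:
            word = tokens[index]
            self.prior_counts[word][pron] += 1
            for feat in self._neighbors(tokens, index):
                self.context_counts[(word, feat)][pron] += 1

    def predict(self, tokens, index):
        word = tokens[index]
        feats = self._neighbors(tokens, index)
        # 1. Rules take priority when one matches the context.
        for feat in feats:
            if (word, feat) in self.rules:
                return self.rules[(word, feat)]
        # 2. Otherwise, sum the pronunciation counts seen with these neighbors.
        votes = Counter()
        for feat in feats:
            votes.update(self.context_counts.get((word, feat), {}))
        if votes:
            return votes.most_common(1)[0][0]
        # 3. Otherwise, use the most frequent pronunciation overall, if any.
        prior = self.prior_counts.get(word)
        if prior:
            return prior.most_common(1)[0][0]
        return None  # Unknown word: no rule and no training data.


# Placeholder data: "W" is a homograph with readings "p1" and "p2".
model = HomographDisambiguator(rules={("W", ("next", "x")): "p1"})
model.train([
    (["a", "W", "b"], 1, "p2"),
    (["a", "W", "c"], 1, "p2"),
    (["d", "W", "b"], 1, "p1"),
])
print(model.predict(["q", "W", "x"], 1))  # decided by the rule
print(model.predict(["a", "W", "z"], 1))  # decided by context counts
```

In this arrangement the rules handle contexts a linguist can state with confidence, while the counts cover the contexts no rule anticipates, which is one way the two approaches can compensate for each other's weaknesses.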
Keywords/Search Tags: homograph, pronunciation disambiguation, Tibetan speech synthesis, text analysis, Mecab