Research On Information Processing Oriented Tibetan Homograph Pronunciation Recognition Technology

Posted on:2019-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:B D Z La

Full Text:PDF

GTID:2335330566966313

Subject:Chinese Ethnic Language and Literature

Abstract/Summary:

By pronunciation, modern Tibetan words fall into two types: words that are spelled differently but pronounced the same, and words that are spelled the same but pronounced differently. By meaning, they can be divided into homographs and heteromorphic synonyms. In Tibetan linguistics, words with the same form but different pronunciations are not identical to polyphones, and homographs are not identical to polysemous words, yet all of these word classes resemble one another morphologically. As the Tibetan language developed, many words changed considerably in both meaning and pronunciation, which has created great difficulties for research in Tibetan computational linguistics.

As Tibetan information processing has developed and matured, research on Tibetan speech synthesis has entered a period of intensive study, and the problem of Tibetan words with the same form but different pronunciations has become more prominent. Experts and scholars in the field, however, have done very little research on it. Moreover, these words differ from polyphonic characters in Chinese, and dictionaries alone cannot resolve the ambiguity of Tibetan words.

This thesis therefore starts from the distinctive rules and phonetic characteristics of the Tibetan language. First, a total of 465 keywords were collected and organized from the commonly used same-form, different-pronunciation words in the Tibetan-Chinese Dictionary. Second, based on how often these keywords actually occur in more than 280,000 Tibetan texts and how often each pronunciation is used, 180 high-frequency keywords were selected as the main object of analysis.

Many Tibetan words have two different pronunciations. For some of them the two readings differ only slightly in meaning; for others the meanings are completely different. In either case the words are easy to misread or misunderstand. Building on an analysis of these different pronunciations, the thesis studies the causes of same-form, different-pronunciation words from the perspective of pronunciation. It first analyzes in depth the structure and classification of these words and the forms in which they appear in text, systematically studies where such words originate in a Tibetan TTS system, and discusses the semantics and usage of the different pronunciations.

Based on the results of this analysis, the thesis adopts a method that combines rules and statistics, and carries out experiments on the 180 selected words under relatively complete experimental conditions. The experimental results show that combining rules with statistical methods lets each approach compensate for the other's weaknesses. The method can thus efficiently resolve the difficulty of recognizing the pronunciation of same-form, different-pronunciation words in current Tibetan speech synthesis, and it achieves good recognition results. Following these initial experiments, the remaining 285 words were also analyzed and tested. The results show that the method generalizes to pronunciation recognition for Tibetan same-form, different-pronunciation words as a whole. It provides a solid basis for the front-end text analysis module of a speech synthesis system and also has reference value for other aspects of Tibetan linguistic theory.

This thesis mainly addresses prominent problems in Tibetan speech synthesis. In analyzing same-form, different-pronunciation words in the front-end text analysis of a Tibetan TTS system, it applies MeCab, a Japanese text analyzer based on the CRF model, to Tibetan text analysis. Because MeCab's internal module design is clear and simple, and because Tibetan and Japanese have similar segmentation and annotation systems, MeCab is a useful reference for Tibetan text processing.
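
As a rough illustration of what combining rules and statistics can look like for pronunciation disambiguation, the following Python sketch applies hand-written context rules first and falls back to counts learned from labeled text. It is only a sketch under stated assumptions: the abstract does not give the thesis's actual rules, features, or model, so the class name `HomographDisambiguator`, the neighbor-word features, and the placeholder tokens and pronunciation labels are all hypothetical.

```python
from collections import Counter, defaultdict


class HomographDisambiguator:
    """Hypothetical rules-then-statistics chooser for a homograph's reading."""

    def __init__(self, rules=None):
        # rules: {homograph: [(context_word, pronunciation), ...]}
        # Assumed format; a rule fires when context_word is an adjacent token.
        self.rules = rules or {}
        # (homograph, neighbor) -> Counter of pronunciations seen in training
        self.context_counts = defaultdict(Counter)
        # homograph -> Counter of pronunciations regardless of context
        self.prior_counts = defaultdict(Counter)

    def train(self, tagged_sentences):
        """tagged_sentences: lists of (token, pronunciation or None) pairs."""
        for sentence in tagged_sentences:
            tokens = [token for token, _ in sentence]
            for i, (token, pron) in enumerate(sentence):
                if pron is None:  # not a labeled homograph
                    continue
                self.prior_counts[token][pron] += 1
                for j in (i - 1, i + 1):
                    if 0 <= j < len(tokens):
                        self.context_counts[(token, tokens[j])][pron] += 1

    def predict(self, tokens, i):
        """Return the chosen pronunciation for tokens[i], or None if unknown."""
        word = tokens[i]
        neighbors = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]

        # 1. Rules: an explicit context rule overrides the statistics.
        for context_word, pron in self.rules.get(word, []):
            if context_word in neighbors:
                return pron

        # 2. Statistics: sum the pronunciation counts seen with each neighbor.
        votes = Counter()
        for neighbor in neighbors:
            votes.update(self.context_counts.get((word, neighbor), {}))
        if votes:
            return votes.most_common(1)[0][0]

        # 3. Back off to the most frequent pronunciation overall.
        prior = self.prior_counts.get(word)
        if prior:
            return prior.most_common(1)[0][0]
        return None


# Placeholder data: "W" stands in for a homograph with readings "p1" and "p2".
training = [
    [("a", None), ("W", "p1"), ("b", None)],
    [("a", None), ("W", "p1"), ("c", None)],
    [("d", None), ("W", "p2"), ("e", None)],
]
model = HomographDisambiguator(rules={"W": [("z", "p2")]})
model.train(training)
print(model.predict(["a", "W", "x"], 1))
```

The ordering reflects one possible design choice: rules capture cases a linguist can state with certainty, and the counts cover contexts the rules miss. A real system would use richer context than adjacent tokens.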

Keywords/Search Tags:

homograph, pronunciation disambiguation, Tibetan speech synthesis, text analysis, MeCab

Related items

1	Text Analysis Of Speech Synthesis Based On Statistical Parameters Of Tibetan Language In Specific Fields
2	Research On Mandarin-Xingtai Dialect Cross-lingual Speech Synthesis
3	Research On Neural Network Based Tibetan Speech Synthesis Technique
4	Research On Speech Synthesis Technology For Tibetan Lhasa Based On Fully End-to-End Method
5	End-To-End Tibetan Speech Synthesis Technology Based On Deep Learning
6	Burmese Text Analysis And Implementation For Speech Synthesis
7	End-to-End Speech Synthesis Method For Tibetan Amdo Dialect
8	Research And Implementation Of Sequence To Sequence Tibetan Lhasa Dialect Speech Synthesis
9	Research On The Speech Synthesis Technology Of Tibetan Dialect
10	Research On Mandarin And Uyghur Speech Synthesis In Xinjiang Rural Information Pushing System