Burmese Text Analysis And Implementation For Speech Synthesis

Posted on:2020-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:C E Ma

Full Text:PDF

GTID:2415330575485934

Subject:Communication and Information System

Abstract/Summary:

In recent years, speech synthesis technology has been applied more and more widely. Burmese, the official language of Myanmar, belongs to the Tibeto-Burman branch of the Sino-Tibetan language family. Compared with the related work on Chinese and Tibetan speech synthesis, languages of the same family, Burmese speech synthesis has not received enough attention. This thesis aims to develop a Burmese speech synthesis system, and studies and implements corpus construction, text normalization, word segmentation, and phonetic transcription of text. The main work of the thesis is as follows:

(1) Construction of Burmese corpora. Approximately 600M of text was acquired from Burmese websites; illegal characters and repeated sentences were removed, and the character encoding of the text corpus was unified. High-frequency words, sentence length, sentence type, and the distribution of initials and finals in the text corpus are used as the basis for selecting the pronunciation corpus. Similarity between sentences is a further selection criterion, so that as many speech and linguistic phenomena as possible are covered. The resulting corpus contains 5,000 sentences.

(2) Text normalization. The normalization of numbers, abbreviations, and special characters is studied, and a specific normalization scheme is proposed for each type of character.

(3) Implementation of three word segmentation methods. Segmentation based on forward maximum matching (FMM), on a conditional random field (CRF) model, and on a bidirectional long short-term memory network with a CRF layer (BiLSTM+CRF) is designed and implemented. The experimental results show that segmentation speed ranks FMM > CRF > BiLSTM+CRF, while segmentation accuracy ranks BiLSTM+CRF > CRF > FMM. Compared with BiLSTM+CRF, the CRF method is slightly less accurate, but its segmentation speed is 62 times that of BiLSTM+CRF. Weighing both factors, the CRF-based word segmentation method is used in the development of the Burmese speech synthesis system.

(4) Automatic phonetic transcription. Based on the MLC (Myanmar Language Commission) transcription system and the IPA (International Phonetic Alphabet), an automatic phonetic transcription method based on initials and finals is proposed. On its own, this method cannot handle phonetic changes in Burmese text. The phonetic changes are therefore categorized, and four patterns that cover most of them are selected. A rule-based method and a CRF-based method for automatic phonetic transcription are then proposed. The experimental results show that the CRF-based method performs better than the rule-based one: its mean labeling accuracy is 71.6% for words and 90.6% for syllables, and its mean phonetic accuracy is 63.6% for words and 86.8% for syllables.
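
The forward maximum matching (FMM) method named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `fmm_segment`, the `max_units` parameter, and the toy lexicon are assumptions made for illustration, and the input is assumed to be already split into matching units (for Burmese these would typically be syllables; Latin letters stand in for them here).

```python
def fmm_segment(units, lexicon, max_units=None):
    """Greedy forward maximum matching over a sequence of units.

    units     -- list of strings (hypothetical pre-split syllables or characters)
    lexicon   -- set of known words, each the concatenation of one or more units
    max_units -- longest candidate to try, counted in units (assumed parameter)
    """
    if max_units is None:
        # Character length of the longest word is an upper bound on its unit count.
        max_units = max(1, max((len(w) for w in lexicon), default=1))

    tokens = []
    i = 0
    while i < len(units):
        # Try the longest candidate first, then shrink one unit at a time.
        for size in range(min(max_units, len(units) - i), 0, -1):
            candidate = "".join(units[i:i + size])
            # A single unit is always accepted so that unknown material
            # still produces output and the scan always advances.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    toy_lexicon = {"ab", "abc", "de"}  # placeholder entries, not real Burmese words
    print(fmm_segment(list("abcde"), toy_lexicon))
```

Because the scan only does dictionary lookups, it is fast but cannot resolve ambiguity or recognize out-of-lexicon words, which is consistent with the speed and accuracy ordering reported in the abstract.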

Keywords/Search Tags:

Burmese Speech synthesis, Corpus construction, Normalization, Word segmentation, Phonetic transcription

Related items

1	Computer Aided Language Survey And Analysis System
2	Text Analysis Of Speech Synthesis Based On Statistical Parameters Of Tibetan Language In Specific Fields
3	Research On Mandarin And Uyghur Speech Synthesis In Xinjiang Rural Information Pushing System
4	A Study Of Phonetic Transcription Classification On The Annotation Of Classics
5	Design And Implementation Of Indonesian Speech Synthesis System Based On HMM
6	Research And Implementation Of Automatic Labeling System For Quasi Writtern Language Korean Speech Corpus
7	Tibetan Segmentation And POS Tagging Study
8	Research On Mandarin-Xingtai Dialect Cross-lingual Speech Synthesis
9	Psc Word, The Word Computer, Automatic Contrast System Design Studies
10	The Study Of Automatic Chinese Phoneticize Label Based On Automatic Word Segmentation