
Study On The Construction Of Mongolian Pronunciation Dictionary And Its Application In Speech Recognition

Posted on: 2022-01-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R G W Sa
GTID: 1485306509458334
Subject: Chinese Language and Literature
Abstract/Summary:
The Mongolian script is alphabetic. The traditional Mongolian writing system has not changed substantially since its creation, but spoken Mongolian has changed greatly, so written and spoken Mongolian now differ. The differences appear mainly in the following respects: the correspondence between the written character combinations for long vowels and compound vowels and their pronunciation; the loss of most word-final short vowels in speech; the combination of consonants into compound consonants in speech, which changes the number of syllables; the vowel harmony rules of spoken Mongolian; and the pronunciation changes between an additional component and the final syllable of the word that precedes it. Handling these differences is a difficult problem in Mongolian speech recognition, and developing a pronunciation dictionary for speech recognition is an effective way to address them.

First, a Mongolian pronunciation dictionary contains parallel pairs of written Mongolian words and their pronunciation annotations, which are used for speech recognition and synthesis. Relying on linguists to construct such a dictionary manually is time-consuming and laborious. The main part of this study is therefore how to generate pronunciation annotations for Mongolian words automatically and so build a pronunciation dictionary. This requires solving the grapheme-to-phoneme (G2P) conversion problem, that is, converting the written character sequence of a word into its spoken phoneme sequence. For the low-resource setting, this study proposes building the pronunciation dictionary with a combination of rules and decision trees.

Second, to establish the rules for building the pronunciation dictionary, the study further examines the correspondence between Mongolian characters and spoken phonemes, the syllable correspondence between written and spoken words, vowel harmony in spoken Mongolian, and the vocabulary and lexical factors that affect dictionary development. Drawing on the vowel harmony rules of spoken Mongolian and focusing on pairs of adjacent written syllables, it studies how the vowel and consonant pronunciation of the preceding syllable affects the pronunciation of the vowel characters in the following syllable, and states the resulting rules. When a consonant cluster forms, the vowel of the syllable may move forward, be dropped, or do neither. To cover these cases, the study extends the binary pre-consonant and post-consonant analysis of traditional grammar into a multivariate analysis that also considers the first-syllable vowel, the first syllable, the current syllable, and other factors.

Third, to apply the decision tree algorithm, the study proposes feature classification labels for Mongolian long and short vowels and three feature vectors: one for the extended long vowel structure, one for a single vowel character outside the first syllable of the word, and one for the first-syllable vowel. The feature vector for the extended long vowel structure handles the long vowel and compound vowel phenomena described in traditional grammar. The feature vector for the first-syllable vowel handles the pronunciation changes of the vowel character in the first syllable. The feature vector for a single vowel character outside the first syllable handles the pronunciation changes of vowel characters in non-first syllables. Conversion then proceeds in two stages. The decision tree algorithm first handles, locally, the pronunciation changes of vowel characters in the first syllable, of the characters corresponding to long vowels and compound vowels, and of single vowel characters outside the first syllable. The new rules based on multivariate data then handle changes in the number of syllables, the combination of consonants, and vowel harmony.

Fourth, for words with additional components, rules are likewise summarized from traditional grammatical knowledge of how the pronunciation of additional components changes. Such words are converted in two steps. The word without its additional components and the additional components themselves are converted first. The two pronunciations are then joined according to the pronunciation change rules for additional components.

Fifth, the grapheme-to-phoneme conversion method based on rules and decision trees was tested on 26348 words from the Mongolian-Chinese Dictionary. Comparing the output of the conversion program with the dictionary shows that 21121 words were converted correctly, a conversion accuracy of 80.16%.

Sixth, following the pronunciation dictionary requirements of the open-source Kaldi speech recognition toolkit, the text of 5600 sentences was segmented into words, and the G2P conversion program from the previous step was used to produce pairs of written forms and pronunciation annotations for 10415 words. A speech recognition system was then built with Kaldi. The language model was built with the SRILM toolkit, and a subspace Gaussian mixture model (SGMM-HMM) acoustic model and a DNN-HMM acoustic model were compared. The experimental results show that the SGMM-HMM acoustic model outperforms the DNN-HMM acoustic model under low-resource conditions.
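
The two-step conversion of words with additional components, the word-plus-phonemes entries a Kaldi pronunciation dictionary expects, and the reported accuracy figure can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the helper functions, the single junction rule, and the placeholder labels (`p1`, `a1`, and so on) are hypothetical and are not the dissertation's rules, features, or Mongolian phoneme set. Only the word counts 21121 and 26348 come from the abstract, and the entry layout follows the usual Kaldi `lexicon.txt` convention of one word followed by its space-separated phones.

```python
# Toy sketch only: labels are placeholders, not real Mongolian spellings or
# phonemes, and none of these helpers come from the dissertation itself.

def join_pronunciations(stem_phones, component_phones, junction_rules):
    """Join a stem with an additional component (both assumed non-empty).

    If a rule exists for the pair (last stem phone, first component phone),
    that pair is rewritten; otherwise the sequences are concatenated as-is.
    """
    key = (stem_phones[-1], component_phones[0])
    if key in junction_rules:
        left, right = junction_rules[key]
        return stem_phones[:-1] + [left, right] + component_phones[1:]
    return stem_phones + component_phones


def lexicon_line(word, phones):
    """Format one dictionary entry as 'word phone1 phone2 ...'."""
    return word + " " + " ".join(phones)


def accuracy_percent(correct, total):
    """Share of correctly converted words, as a percentage with two decimals."""
    return round(100.0 * correct / total, 2)


# Step 1 (assumed already done): stem and additional component converted separately.
stem_phones = ["p1", "a1", "t1"]
component_phones = ["a2", "n2"]

# Step 2: a single hypothetical pronunciation change rule at the junction.
junction_rules = {("t1", "a2"): ("d1", "a1")}

joined = join_pronunciations(stem_phones, component_phones, junction_rules)
entry = lexicon_line("WORD", joined)
reported = accuracy_percent(21121, 26348)

print(entry)     # WORD p1 a1 d1 a1 n2
print(reported)  # 80.16
```

A dictionary lookup keyed on the boundary pair is the simplest possible form of a junction rule. The rules described in the abstract depend on more context, such as vowel harmony and syllable structure, so a fuller version would need a richer key.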
Keywords/Search Tags: Mongolian pronunciation dictionary, grapheme-to-phoneme conversion, decision tree, speech recognition