Research On Speech Recognition Of Tibetan Amdo Dialect Based On Deep Learning

Posted on: 2021-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: J W Sun
Full Text: PDF
GTID: 2415330629488955
Subject: Engineering
Abstract/Summary:
Speech is an essential part of human communication and has long been a key topic for researchers worldwide. Enabling computers and humans to communicate through language remains an active research goal. With the emergence of speech recognition software such as Siri and the rise of the smart home, intelligent speech processing has gradually entered people's lives and continues to play an important role. In the era of big data, deep learning algorithms are highly effective at data modeling and have been widely used in pattern recognition fields such as speech recognition and image processing. At present, the accuracy of speech recognition technology for major languages such as English, Japanese, German, and Chinese is more than 99%. However, research on the dialects of these languages is still at an early stage. This thesis therefore focuses on improving the performance of deep learning in continuous speech recognition of the Tibetan Amdo dialect. The main work of the thesis is as follows.

First, the thesis completes the data preparation for the Tibetan Amdo dialect. We selected 10,000 common Tibetan sentences to construct the Tibetan Amdo dialect corpus. We selected five male and five female speakers whose native language is the Tibetan Amdo dialect and recorded 1,000 sentences from each, for a total corpus duration of 15.6 hours. The text corpus was then labeled according to the pronunciation dictionary, and the collected corpus was divided into a training set and a test set at a ratio of 3:1.

Second, speech recognition of the Tibetan Amdo dialect is implemented with a deep neural network and hidden Markov model (DNN-HMM). We first preprocess the original speech and extract features. A large amount of training material is then used to train the acoustic model, while the corresponding text is used to train the language model. Finally, the test corpus is input into the model and the recognized word sequence is decoded, giving a word error rate of 28.3%.

Third, speech recognition of the Tibetan Amdo dialect is implemented with a hybrid end-to-end architecture. End-to-end Tibetan Amdo dialect speech recognition models based on connectionist temporal classification (CTC) and on the attention architecture are each established separately. A hybrid CTC/attention method is then proposed to optimize speech recognition of the Amdo dialect. The accuracy of the system is improved by adjusting the CTC weight parameter to optimize the model. When the parameter is 0.2, the hybrid end-to-end model has its lowest error rate, 31.5%.
Keywords/Search Tags:Deep Learning, Speech Recognition, Feature Extraction, DNN-HMM, CTC, Attention
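
The two quantities the abstract relies on, the word error rate and the CTC weight of the hybrid model, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names `word_error_rate` and `hybrid_loss` are hypothetical, and the interpolation form `w * L_ctc + (1 - w) * L_att` is the commonly used hybrid CTC/attention objective, assumed here because the abstract does not state its exact formula.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def hybrid_loss(ctc_loss: float, attention_loss: float,
                ctc_weight: float = 0.2) -> float:
    """Assumed hybrid objective: w * L_ctc + (1 - w) * L_att.

    The default of 0.2 mirrors the best-performing weight reported
    in the abstract; the formula itself is an assumption.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
    print(hybrid_loss(ctc_loss=10.0, attention_loss=5.0, ctc_weight=0.2))
```

Under this assumed form, a weight of 0 reduces to the pure attention model and a weight of 1 to the pure CTC model, so tuning the weight amounts to searching between the two separately trained baselines.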