
Research On Tibetan Multi-task Learning Acoustic Model Based On DNN-HMM

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: C W Gong
Full Text: PDF
GTID: 2415330623973164
Subject: Computer software and theory
Abstract/Summary:
Automatic speech recognition (ASR) is one of the key technologies for making communication between people and machines smoother. In recent years, the continuous development of deep learning has greatly improved recognition accuracy, and research has shown that sufficient, effective training data contributes substantially to recognition performance. For languages with abundant data resources, such as English and Mandarin Chinese, recognition accuracy has reached human level. For many other languages, data resources are relatively limited, which has held back progress in speech recognition research for them.

This thesis studies the use of deep neural networks for acoustic modeling in Tibetan automatic speech recognition when training data is limited. In building the acoustic model, we apply multi-task learning: multiple tasks are trained jointly to improve recognition accuracy and to alleviate, to some extent, the shortage of training data. Lhasa Tibetan is the research object, and the time-delay neural network (TDNN) is the network architecture studied. To explore the effect of deep neural networks on Lhasa Tibetan speech recognition, we first built a baseline system based on TDNN-HMM. We then analyzed the problems of modeling capacity, training speed, and limited training data, and modeled the Lhasa Tibetan acoustic model with a semi-orthogonal factorized TDNN structure. Experimental results show that the semi-orthogonal factorized TDNN-HMM acoustic model achieves a relative word error rate reduction of more than 1% compared with the baseline system.

Building on these experiments, we study multi-task learning for the Tibetan acoustic model. Evaluating the multi-task results requires a stronger single-task Lhasa Tibetan baseline. Without adding new data, we apply data augmentation to the training data and use the augmented data to train a Lhasa Tibetan baseline system based on the semi-orthogonal factorized TDNN-HMM. Because Tibetan and Chinese share many similarities in pronunciation, we adopt multilingual speech recognition, a special form of multi-task learning, to study multi-task training of the Tibetan acoustic model. By adjusting the model structure and parameters, we obtained the best-performing multi-task acoustic model for Lhasa Tibetan and compared it with the single-task baseline. Experimental results show that, compared with the single-task acoustic model, the multi-task acoustic model achieves a relative word error rate reduction of 1% to 2%.
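
The reductions reported in the abstract are relative: they are ratios against the baseline system's word error rate, not absolute percentage-point differences. The minimal sketch below shows that arithmetic. The function name `relative_wer_reduction` and the WER values are hypothetical and chosen only for illustration; the abstract does not state the underlying absolute error rates.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer against baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical WER values (in percent); these are NOT results from the thesis.
baseline = 40.0
multitask = 39.4

reduction = relative_wer_reduction(baseline, multitask)
print(f"{reduction:.2f}% relative reduction")
```

With these illustrative numbers, an absolute drop of 0.6 percentage points corresponds to a relative reduction inside the 1% to 2% range quoted above.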
Keywords/Search Tags:Speech Recognition, Acoustic model, Semi-orthogonal Factorization TDNN-HMM, Multi-task Learning, Tibetan Lhasa