Research On Tibetan Multi-task Learning Acoustic Model Based On DNN-HMM

Posted on:2021-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:C W Gong

Full Text:PDF

GTID:2415330623973164

Subject:Computer software and theory

Abstract/Summary:

Automatic speech recognition is one of the key technologies for making communication between people and machines smoother. In recent years, with the continuous development of deep learning, the accuracy of speech recognition has improved greatly. Research has found that sufficient and effective training data can greatly improve recognition performance. For languages with abundant data resources, such as English and Mandarin Chinese, recognition accuracy has reached human level, but the data resources of many other languages are relatively limited, which has held back progress in speech recognition research for those languages.

In this thesis, given the limited training data available for Tibetan speech recognition, we study the use of deep neural networks for acoustic modeling in Tibetan automatic speech recognition. In building the acoustic model, we apply the idea of multi-task learning: by jointly training multiple tasks, we aim to improve recognition accuracy and to alleviate the shortage of training data to some extent. We chose the Tibetan Lhasa dialect as the research object, and for the deep neural network of the acoustic model we studied and tried the time-delay neural network (TDNN). To explore the effect of deep neural networks on Tibetan Lhasa speech recognition, we first established a baseline system based on TDNN-HMM. Then, after analyzing the problems of modeling capacity, training speed, and limited training data, we used the semi-orthogonal factorization TDNN structure to build the Tibetan Lhasa acoustic model. The experimental results show that the semi-orthogonal factorization TDNN-HMM acoustic model achieves a relative word error rate reduction of more than 1% compared with the baseline system.

On the basis of these experiments, we study multi-task learning for the Tibetan acoustic model. To evaluate the multi-task results, we first needed a stronger single-task baseline for Tibetan Lhasa speech recognition. Without adding new data, we applied data augmentation to the training data and used the augmented data to train a Tibetan Lhasa baseline system based on semi-orthogonal factorization TDNN-HMM. From our study of the Tibetan language, we found many similarities between the pronunciation of Tibetan and Chinese, so we chose multilingual speech recognition, a special form of multi-task learning, to study multi-task learning for the Tibetan acoustic model. By adjusting the model structure and parameters, we obtained the best multi-task acoustic model for Tibetan Lhasa, and then compared the single-task baseline system with the multi-task system. The experimental results show that, compared with the single-task acoustic model, the multi-task acoustic model achieves a relative word error rate reduction of 1% to 2%.
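The results in the abstract are stated as relative word error rate reductions. As a minimal sketch of how such a figure is usually computed, the Python below derives the word error rate from a word-level edit distance and then the relative reduction between two systems. The function names and the sample values in the usage note are illustrative assumptions, not taken from the thesis.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

For example, with hypothetical values, a baseline WER of 0.400 and a new WER of 0.392 give a relative reduction of about 0.02, that is, 2%. A relative reduction of this size corresponds to a much smaller absolute change in WER.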

Keywords/Search Tags:

Speech Recognition, Acoustic model, Semi-orthogonal Factorization TDNN-HMM, Multi-task Learning, Tibetan Lhasa

Related items

1	Tibetan Lhasa Acoustic Model Based On LSTM-CTC Speech Recognition System
2	Tibetan Multi-task And Multi-dialect Speech Recognition
3	Research On Tibetan Speech Recognition Technology
4	Low-resource Tibetan Multi-dialect Speech Recognition
5	Research On Uyghur Speech Recognition Based On Deep Learning And Data Augmentation
6	Research On Speech Synthesis Technology For Tibetan Lhasa Based On Fully End-to-End Method
7	Research On Tibetan Speech Emotion Recognition Method Based On Multi-feature Fusion
8	An Empirical Study Of Phonetic Transfer In English Monophthong Learning By Tibetan (Lhasa) Speakers
9	The Research On Tibetan Speech Recognition Technology
10	Research And Implementation Of Sequence To Sequence Tibetan Lhasa Dialect Speech Synthesis