
Tibetan Multi-task And Multi-dialect Speech Recognition

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J J Le
Full Text: PDF
GTID: 2435330602998434
Subject: Computer Science and Technology
Abstract/Summary:
GMM-HMM and DNN-HMM have achieved strong results in speech recognition, but their model training and system construction are complicated. Because linguistic studies of low-resource languages such as the Tibetan dialects are insufficient, the application of these techniques is limited. With the development of neural network technology in recent years, transfer learning and multi-task learning have been widely used in various fields of pattern recognition, and end-to-end speech recognition has made important progress in mainstream languages including Chinese and English. Tibetan multi-dialect multi-task recognition, however, has not been studied in depth. This thesis therefore discusses the application of multi-task learning and transfer learning, based on end-to-end technology, to Tibetan multi-dialect multi-task speech recognition.

1. Multi-task recognition of Tibetan multi-dialect speech based on the WaveNet-CTC model
Connectionist temporal classification (CTC) discards the complex pre- and post-processing operations of GMM-HMM. It models speech recognition directly as a sequence probability maximization problem and reduces the computational complexity of the model through the forward-backward algorithm. The WaveNet model can effectively increase the receptive field without losing information, so it can make full use of background information. This thesis compares the performance of single-task and multi-task (two-task and three-task) models based on WaveNet-CTC. Experimental results show that the two-task model significantly improves speech recognition and dialect or speaker recognition compared with the single-task model. The three-task model, by contrast, shows a relative degradation in speech recognition.

2. Application of the WaveNet-CTC model integrating an attention mechanism to Tibetan multi-dialect multi-task recognition
The attention mechanism makes full use of context-related information by giving more weight to vectors that are more relevant to the input, and it has become an important topic in the field of speech recognition. This thesis introduces an attention mechanism into the WaveNet-CTC model and adds an attention window mechanism to reduce the model's computation. It also compares the performance obtained when the attention mechanism is placed at different positions in the model. Experimental results show that the model based on a high-level attention mechanism further improves performance on the speech recognition, dialect recognition, and speaker recognition tasks.

3. Tibetan speech recognition based on transfer learning
Considering the features and commonalities of the U-Tsang and Amdo dialects, this thesis studies the transfer of a Lhasa speech recognition model to speech recognition of the Amdo pastoral dialect. The experimental results show that transfer learning can effectively exploit the potential similarity between tasks to improve recognition performance on the target task.

4. Tibetan multi-dialect multi-task recognition system
A real-time Tibetan multi-dialect multi-task recognition system is built on the TensorFlow framework. The system accepts Tibetan speech through the microphone and automatically calls the trained WaveNet-CTC model to show the recognition results: the speech content and the dialect identification.
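
Two ideas from part 1 of the abstract can be made concrete with a small sketch: how a stack of dilated convolutions, as used in WaveNet, enlarges the receptive field, and how a per-frame CTC best-path output is collapsed into a label sequence. This is an illustration only and is not taken from the thesis; the function names, the kernel size of 2, the doubling dilation schedule, and the use of label 0 as the CTC blank are all assumptions chosen for the example.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of a stack of stride-1 dilated 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse a per-frame best-path label sequence the CTC way:
    merge consecutive repeats, then drop blanks.

    `blank=0` is an assumed convention, not one stated in the abstract.
    """
    collapsed = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


# Assumed WaveNet-style schedule: dilation doubles at every layer.
dilations = [2 ** i for i in range(5)]  # 1, 2, 4, 8, 16
print(receptive_field(2, dilations))

# A blank between two identical labels keeps them as separate symbols.
print(ctc_greedy_collapse([0, 3, 3, 0, 3, 5, 5, 0]))
```

With a doubling schedule the receptive field grows exponentially with depth while the parameter count grows only linearly, which is one common reading of the claim that WaveNet enlarges the receptive field without losing information. Greedy collapse is the simplest CTC decoding strategy; the abstract does not say which decoder the thesis uses.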
Keywords/Search Tags:Tibetan multi-dialect speech recognition, multi-task recognition, WaveNet-CTC model, attention mechanism, transfer learning