Tibetan Multi-task And Multi-dialect Speech Recognition

Posted on:2021-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:J J Le

Full Text:PDF

GTID:2435330602998434

Subject:Computer Science and Technology

Abstract/Summary:

GMM-HMM and DNN-HMM have achieved great results in speech recognition, but their model training and system construction are complicated. Because linguistic studies of low-resource languages such as the Tibetan dialects are insufficient, the application of these techniques is limited. With the development of neural network technology in recent years, transfer learning and multi-task learning have been widely used in various fields of pattern recognition. End-to-end speech recognition has also made important progress in mainstream languages including Chinese and English, but multi-dialect, multi-task recognition for Tibetan has not been studied in depth. Therefore, this thesis discusses the application of multi-task learning and transfer learning, based on end-to-end technology, to Tibetan multi-dialect multi-task speech recognition.

1. Multi-task recognition of Tibetan multi-dialect speech based on the WaveNet-CTC model

Connectionist temporal classification (CTC) discards the complex pre- and post-processing operations of GMM-HMM and directly models speech recognition as a sequence probability maximization problem, reducing computational complexity through the forward-backward algorithm. The WaveNet model can effectively enlarge the receptive field without losing information, so it can make full use of context. This thesis compares the performance of single-task and multi-task (two-task and three-task) models based on WaveNet-CTC. Experimental results show that the two-task model significantly improves speech recognition and dialect or speaker recognition compared with the single-task model, whereas the three-task model shows a relative degradation in speech recognition.

2. Application of the WaveNet-CTC model with an attention mechanism to Tibetan multi-dialect multi-task recognition

The attention mechanism makes full use of context by giving more weight to the vectors that are more relevant to the input, and it has become an important topic in the field of speech recognition. This thesis introduces an attention mechanism into the WaveNet-CTC model and adds an attention window to reduce the model's computation. It also compares the performance obtained when the attention mechanism is placed at different positions in the model. Experimental results show that the model with a high-level attention mechanism further improves performance on the speech recognition, dialect recognition, and speaker recognition tasks.

3. Tibetan speech recognition based on transfer learning

Considering the features and commonalities of the U-Tsang and Amdo dialects, this thesis studies the transfer of a Lhasa speech recognition model to speech recognition for the Amdo pastoral dialect. The experimental results show that transfer learning can effectively exploit the latent similarity between tasks to improve recognition performance on the target task.

4. Tibetan multi-dialect multi-task recognition system

A real-time Tibetan multi-dialect multi-task recognition system is built on the TensorFlow framework. The system accepts Tibetan speech through a microphone and automatically calls the trained WaveNet-CTC model to display the recognition results: the speech content and the identified dialect.
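As an illustration of how CTC keeps the sequence probability tractable, the forward half of the forward-backward algorithm can be sketched in plain Python. This is a minimal sketch and not code from the thesis: the blank index `BLANK = 0` and the function names `collapse`, `ctc_forward`, and `ctc_brute_force` are assumptions made here for illustration, and the per-frame probabilities would in practice come from a network such as WaveNet. The brute-force function enumerates every frame-level path and is included only to check the recursion on tiny inputs.

```python
from itertools import product

BLANK = 0  # index of the CTC blank symbol (an assumption of this sketch)


def collapse(path):
    """Apply the CTC mapping: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return out


def ctc_forward(probs, label):
    """Return p(label | probs) using the CTC forward recursion.

    probs[t][k] is the probability of symbol k at frame t.
    label is a list of non-blank symbol indices.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for s in label:
        ext += [s, BLANK]
    T, S = len(probs), len(ext)

    # A path may start with the leading blank or the first label
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_brute_force(probs, label):
    """Sum the probability of every frame-level path that collapses to label."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(K), repeat=T):
        if collapse(path) == list(label):
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            total += p
    return total
```

The recursion visits each of the T frames and each of the 2L+1 positions in the blank-extended label once, whereas the brute-force sum grows exponentially with the number of frames. That difference is the reduction in computation that the abstract attributes to the forward-backward algorithm.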

Keywords/Search Tags:

Tibetan multi-dialect speech recognition, multi-task recognition, WaveNet-CTC model, attention mechanism, transfer learning

Related items

1	Research On Tibetan Multi-task Learning Acoustic Model Based On DNN-HMM
2	Research On Language Recognition Based On Multi-task Neural Network
3	Research On Speech Recognition Of Tibetan Amdo Dialect Based On Deep Learning
4	Research On Uyghur Speech Recognition Based On End-to-End Modeling
5	The Research On Tibetan Speech Recognition Technology
6	Design And Optimization Of Chinese Speech Recognition System In Complex Environment
7	Research On Dance Action Recognition Based On Deep Learning
8	Research On Tibetan Speech Recognition Based On Deep Convolutional Neural Network
9	Speech Recognition Of Hakka Dialect Based On Deep Learning
10	Application Of Tibetan Speech Recognition Based On Active Learning In Online Education