
Research On Speech Enhancement And Recognition Of Tibetan Amdo Dialect

Posted on: 2024-07-03 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X J Zhu
GTID: 1528307067964199 | Subject: Computer Science and Technology
Abstract/Summary:
The Amdo dialect is one of the three major Tibetan dialects. Research on speech enhancement and speech recognition for this dialect started relatively late, and existing results remain limited. In addition, Tibetan speech recognition systems that perform very well in quiet environments suffer a significant drop in recognition rate in practical application scenarios because of environmental noise, echo, and reverberation. This dissertation takes the Amdo dialect of Tibetan as its research object and studies speech enhancement and recognition for the Amdo dialect in complex acoustic environments. It aims to improve the quality and intelligibility of noisy Tibetan speech in low signal-to-noise ratio (SNR), non-stationary noise environments, and to ensure that the recognition rate of the Tibetan Amdo-dialect speech recognition model does not decline significantly in complex acoustic environments. The specific research contents are as follows:

(1) Tibetan Speech Enhancement Model Based on Gated Convolution Network
This dissertation proposes NGCRN, a speech enhancement model based on a gated convolutional network. It learns a mapping from the complex spectrum of noisy speech to the complex spectrum of clean speech. Its enhancement performance is improved by introducing gated linear unit modules, multi-objective joint optimization, and other techniques. Working in the complex domain better addresses the difficulty of learning the speech phase spectrum with supervised methods.

(2) Tibetan Speech Enhancement Model Based on Improved Wasserstein Generative Adversarial Network
This dissertation improves the Wasserstein GAN through multi-scale convolution, spectral normalization, and a mixed penalty term added to the objective function. On this basis, it proposes a Tibetan speech enhancement model built on the improved Wasserstein GAN, which effectively speeds up model convergence and improves the quality of the generated speech.

(3) Tibetan Speech Enhancement Model Based on Multi-dimensional Generative Adversarial Network
This dissertation proposes SEMGAN, a speech enhancement model based on a multi-dimensional generative adversarial network. A multi-output generator produces enhanced speech signals in several dimensions, and multiple sub-discriminators each distinguish real from generated speech signals in one of those dimensions. As a result, the model takes information from multiple dimensions into account when adjusting its parameters, so the generated speech signals are closer to real speech signals.

(4) A Tibetan Amdo-dialect Speech Recognition Model Based on Knowledge Transfer
This dissertation proposes MHLAS, a Tibetan Amdo-dialect speech recognition model. It converts Amdo-dialect speech sequences directly into the corresponding Tibetan character sequences, which greatly reduces the difficulty of building an Amdo-dialect speech recognition model. During training, knowledge transfer effectively addresses the shortage of Amdo-dialect training data.

Simulation results on two noisy datasets, constructed from the TIMIT corpus and a Tibetan corpus, show that the proposed speech enhancement methods further improve the quality and intelligibility of noisy Tibetan speech under low SNR and non-stationary noise. The proposed Tibetan Amdo-dialect speech recognition model achieves satisfactory recognition performance on the Tibetan corpus.
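Item (1) credits part of NGCRN's improvement to gated linear unit (GLU) modules. As a rough illustration of what such a gate does, here is a minimal sketch in Python, assuming the standard GLU formulation h = (x * W + b) ⊗ σ(x * V + c), where * is convolution, ⊗ is element-wise multiplication, and σ is the logistic sigmoid. It is a deliberately simplified, real-valued, one-dimensional toy: the dissertation's model operates on complex-domain spectra, and the function names, kernels, and biases below are hypothetical and not taken from NGCRN.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid, used as the gate."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return z / (1.0 + z)


def conv1d(signal: list[float], kernel: list[float], bias: float = 0.0) -> list[float]:
    """'Valid' 1-D cross-correlation (no padding, stride 1), as computed by typical conv layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def gated_conv1d(
    signal: list[float],
    w_lin: list[float],
    b_lin: float,
    w_gate: list[float],
    b_gate: float,
) -> list[float]:
    """Hypothetical GLU-style gated convolution: linear path times sigmoid(gate path)."""
    linear = conv1d(signal, w_lin, b_lin)  # x * W + b
    gate = conv1d(signal, w_gate, b_gate)  # x * V + c
    return [a * sigmoid(g) for a, g in zip(linear, gate)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    # Zero gate kernel and bias give sigmoid(0) = 0.5, so the linear output is halved.
    print(gated_conv1d(x, [1.0, 1.0], 0.0, [0.0, 0.0], 0.0))  # [1.5, 2.5, 3.5]
```

A large positive gate bias lets the linear path through almost unchanged, and a large negative one suppresses it almost entirely. In a trained network the gate path has its own learned kernel, so the model can decide, position by position, which features to pass and which to attenuate; this is the general intuition behind gating, not a description of NGCRN's internals.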
Keywords/Search Tags: Amdo-Tibetan, Speech Recognition, Speech Enhancement, Low SNR, Deep Learning