Research on Chinese-English Code-Switch Speech Recognition Based on the LAS Model

Posted on: 2021-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: D Ma
Full Text: PDF
GTID: 2415330623473166
Subject: Computer technology
Abstract/Summary:
Automatic speech recognition is an important research area in natural language processing. With the rapid development of deep neural networks in recent years, the automatic speech recognition community has begun to apply them to speech recognition tasks. Some researchers integrate deep neural networks directly into the HMM-GMM-based model to form HMM-DNN systems, while others borrow end-to-end ideas from machine translation to build end-to-end speech recognition systems. An end-to-end system is relatively simple to build, requires no complicated alignment or pronunciation dictionary construction, and shows good application prospects. For languages with rich data resources, such as Chinese and English, the performance of end-to-end speech recognition models is close to that of the HMM-DNN model. For the low-resource task of Chinese-English code-switching speech recognition, however, end-to-end systems have not yet achieved good performance.

This thesis studies end-to-end speech recognition modeling when Chinese-English mixed data is limited. It focuses on two end-to-end models: Connectionist Temporal Classification (CTC) and the attention-based encoder-decoder network (LAS). It investigates how to improve the encoder of the CTC model and the encoder of the attention-based model, and attempts to combine the two models so that their respective advantages improve the accuracy of end-to-end recognition on low-resource Chinese-English code-switching data.

The study takes Chinese-English code-switching speech from Singapore and Malaysia as its research object. First, a state-of-the-art HMM-DNN-based Chinese-English code-switching speech recognition system is built as a comparison system. Second, a character-level recurrent neural network language model is trained to assist the decoding process of the end-to-end model. For the end-to-end models, the thesis selects the CTC network and the attention-based encoder-decoder network. First, a convolutional neural network front end is added before the input of both models to improve model performance and reduce GPU memory usage. Second, in the hybrid structure in which the CTC network assists the attention-based encoder-decoder network, the model is tuned by adjusting the encoder structure, selecting the type of attention mechanism, selecting the scheduled sampling parameters, and adjusting the training hyperparameters. Finally, the system achieves 24.4% WER and 17.6% WER on the two test sets of the Chinese-English code-switching dataset SEAME, which is roughly on par with the traditional HMM-DNN-based system.
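The abstract reports its results as WER on code-switched text. As a rough illustration of how such a figure can be computed, the following is a minimal sketch and not the thesis's scoring script. It assumes that each Chinese character counts as one token and each English word counts as one token; the names `tokenize`, `edit_distance`, and `error_rate` are hypothetical and do not come from the thesis.

```python
import re

# Assumption (not from the thesis): score each CJK character as its own
# token and each English word as one token.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize(text):
    """Split mixed Chinese-English text into scoring tokens."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses):
    """Total edit errors divided by total reference tokens."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize(ref)
        errors += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("reference contains no tokens")
    return errors / total


if __name__ == "__main__":
    # Made-up example sentences, for illustration only.
    rate = error_rate(["我想去 shopping mall"], ["我想 shopping more"])
    print(f"{rate:.1%}")
```

In this made-up example the reference has five tokens and the hypothesis needs one deletion and one substitution, so the rate is 2/5. Published SEAME results may use a different tokenization or text normalization, so numbers from this sketch are not directly comparable with the figures in the abstract.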
Keywords/Search Tags: End-to-End, Hybrid Structure, Chinese-English Code-Switch, Speech Recognition, LAS