
Indonesian Speech Recognition Based On Deep Neural Network

Posted on: 2023-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: R L Yang
Full Text: PDF
GTID: 2555306617476454
Subject: Signal and Information Processing
Abstract/Summary:
Speech recognition systems based on deep neural networks and trained on large-scale datasets have shown strong performance. Indonesian is spoken by hundreds of millions of people, yet research on Indonesian speech recognition lags behind, owing to its late start and the lack of large-scale speech data. This thesis explores methods to improve the performance of Indonesian speech recognition models by applying deep neural networks and combining multi-task learning and transfer learning, taking into account the features of Indonesian speech and its low-resource character. The main work of the thesis includes:

(1) Normalizing Indonesian text; selecting phonemes as modeling units according to the phonological characteristics of Indonesian and the advice of language experts; and designing a suitable Indonesian pronunciation dictionary to map words to phonemes.

(2) Studying acoustic modeling methods for Indonesian speech recognition based on deep neural networks, and designing and implementing a DNN-HMM acoustic model for Indonesian. Given the large number of free parameters in a DNN, a Time-Delay Neural Network (TDNN) is used instead of a plain DNN to improve the performance of the acoustic model. Because the temporal constraints a TDNN imposes on the model lead to poor recognition of long utterances, a time-restricted multi-head attention mechanism is combined with the TDNN to improve the acoustic model, and the effectiveness of this combination is verified experimentally.

(3) To address the small amount of training data under low-resource conditions, a multi-task learning method is used to build a multilingual multi-task acoustic model by jointly training two tasks, with Indonesian as the primary task and English as the auxiliary task. Because of the particular structure of the multi-task learning model, differences between the tasks must be considered during joint training. To address this, two modeling approaches are designed: a conventional multi-task learning model and a multi-task learning model with attribute dependencies, and the two are compared experimentally. In addition, bottleneck features are extracted from the multilingual multi-task model and fused with FBank+Pitch features to further improve performance.

(4) To further improve Indonesian speech recognition under low-resource conditions, the recognition system is optimized with transfer learning, using English as the source language and Indonesian as the target language on the basis of the similarity between the two languages. Three transfer methods (fixed hidden-layer transfer, fine-tuning, and hierarchical weight transfer) are applied to the deep-neural-network acoustic model, and the effectiveness of each is verified experimentally.

Experimental results show that the word error rates (WER) of the DNN-HMM, TDNN-HMM, and TDNN-Attention-HMM models are 9.24%, 7.88%, and 7.77%, respectively. Among the multi-task learning approaches, the model with attribute dependencies achieves a WER of 7.69%, and training on the bottleneck features extracted from this model fused with FBank+Pitch features reduces the WER to 7.59%. The hierarchical weight transfer model based on TDNN-Attention-HMM achieves a WER of 6.79%. These results confirm that the three methods proposed and implemented in this thesis are feasible and effective for improving Indonesian speech recognition under low-resource conditions.
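All of the results above are reported as WER (word error rate), the edit distance between the recognized word sequence and the reference transcript divided by the reference length. A minimal Python sketch of the computation follows; the function name and the example sentences are illustrative, not taken from the thesis.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / len(reference)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table: d[i][j] is the edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of a four-word reference gives WER = 0.25.
print(wer("saya pergi ke pasar", "saya pergi pasar"))  # → 0.25
```

In practice, scoring toolkits also report the individual substitution, deletion, and insertion counts from the same alignment, which helps diagnose which error type a model change actually reduced.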
Keywords/Search Tags: Indonesian, Speech Recognition, Deep Neural Network, Multi-task Learning, Transfer Learning