| This paper studies speech recognition for Tibetan. We built speech recognition systems for the Amdo and U-Tsang dialects of Tibetan. For the Amdo dialect, the work covers five aspects. First, starting from an existing U-Tsang lexicon, we built a new Amdo lexicon that reflects the pronunciation characteristics of the Amdo dialect, omits tone tags, and has expanded content. Second, we augmented the Amdo audio data with speed perturbation, added noise, and reverberation; training the network on the augmented data improved the acoustic model. Third, based on the HMM-DNN architecture, we used a 16-layer time delay neural network (TDNN) as the Amdo acoustic model, which obtained a character error rate (CER) of 13.62% on the test set. We also evaluated a TDNN-F acoustic model for the Amdo dialect. The experimental results show that TDNN-F greatly reduces the number of acoustic model parameters while matching the performance of the TDNN model. Fourth, we used the single Tibetan character as the basic unit and chose a 5-gram Tibetan language model. We split the Tibetan text data into an oral domain and a news domain and trained a language model for each. We then built new language models by combining the oral and news models at ratios of 1:1 and 8:2. The experiments show that, under the low-resource conditions of Tibetan, well-chosen language model combination lets models from different domains complement each other. Finally, we carried out language model rescoring experiments. We first decoded the test set with the pruned general language model built by 1:1 interpolation. We then rescored
using a larger language model merged at the biased 8:2 ratio. This method improves recognition in oral test scenarios. The experiments show that rescoring makes it easy to migrate the Tibetan speech recognition system across application scenarios. For the U-Tsang dialect, we used the original U-Tsang lexicon to build the acoustic model. Because U-Tsang acoustic data is scarce, we applied transfer learning. We took the 16-layer TDNN acoustic model trained on Amdo acoustic data, replaced its output layer, and added two new hidden layers to form the initial acoustic model. We then adapted this model on augmented U-Tsang acoustic data to obtain the U-Tsang acoustic model. Combined with the previously trained language model, it forms the U-Tsang speech recognition system, which achieved a character error rate of 18.97% on the test set. Compared with a TDNN acoustic model trained on only a small amount of acoustic data, performance improved by 33.6%. |
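
The 1:1 and 8:2 combinations of the oral and news language models can be illustrated with a minimal sketch. It assumes the combination is a linear interpolation of the two models' probabilities, which the abstract does not spell out. The function name `interpolate`, the toy distributions, and the placeholder tokens are hypothetical and stand in for real 5-gram probabilities.

```python
def interpolate(p_oral, p_news, oral_weight):
    """Linearly interpolate two probability distributions over the same history.

    Assumption: the models are mixed as
        P(w) = oral_weight * P_oral(w) + (1 - oral_weight) * P_news(w).
    An event missing from one model is given probability 0.0 there.
    """
    if not 0.0 <= oral_weight <= 1.0:
        raise ValueError("oral_weight must lie in [0, 1]")
    news_weight = 1.0 - oral_weight
    vocab = set(p_oral) | set(p_news)
    return {
        w: oral_weight * p_oral.get(w, 0.0) + news_weight * p_news.get(w, 0.0)
        for w in vocab
    }


# Toy next-token distributions (hypothetical values, placeholder tokens).
p_oral = {"a": 0.5, "b": 0.5}
p_news = {"a": 0.1, "c": 0.9}

general_lm = interpolate(p_oral, p_news, oral_weight=0.5)  # 1:1 ratio
oral_biased_lm = interpolate(p_oral, p_news, oral_weight=0.8)  # 8:2 ratio
```

Because both weights sum to one, the mixture remains a valid probability distribution, and raising the oral weight shifts probability mass toward tokens that are frequent in oral text.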
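
The character error rates quoted above can be read against the usual definition: the edit distance between the reference and the recognized sequence, divided by the reference length. The sketch below assumes that definition and assumes each utterance is already split into a list of basic units; how Tibetan text is segmented into characters is not shown. The names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting the first i reference units
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level error rate: total edit distance divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("references must contain at least one unit")
    return errors / total


# One substitution in a four-unit reference.
example_cer = cer([list("abcd")], [list("abed")])  # 0.25
```

Summing errors over the whole test set before dividing gives longer utterances proportionally more weight than averaging per-utterance rates would.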