Tibetan Lhasa Acoustic Model Based On LSTM-CTC Speech Recognition System

Posted on: 2020-06-02  Degree: Master  Type: Thesis
Country: China  Candidate: S Wang  Full Text: PDF
GTID: 2415330572493902  Subject: Computer technology
Abstract/Summary:
With the development of science and technology, computers and other intelligent devices have become increasingly widespread. Speech is the most direct method of communication between people, so human-machine speech interaction has long been a hot topic for researchers. The performance of automatic speech recognition (ASR) has improved greatly with the application of deep neural networks (DNNs). At present, speech recognition for widely spoken languages has achieved good results, but work on less widely spoken languages such as Tibetan remains scarce. Moreover, building a speech recognition system is still a challenging task that requires various resources, several training stages, and professional knowledge. Compared with traditional speech recognition based on the hidden Markov model (HMM), the end-to-end speech recognition model has a single structure, does not need to separate the acoustic model from the language model, and does not need a pronunciation dictionary. There are currently two main types of end-to-end speech recognition systems: Connectionist Temporal Classification (CTC) and attention models. In this thesis, an end-to-end acoustic modeling method based on LSTM-CTC is used for speech recognition of Lhasa Tibetan. To eliminate the need to generate frame-level labels in advance, the method uses the CTC objective function to infer the alignment between speech and label sequences. Decoding uses weighted finite-state transducers (WFSTs), which effectively integrate the dictionary and language model into CTC decoding. Acoustic feature parameters serve as the input of the acoustic model, and the output is the probability of the phoneme sequence. On this basis, LSTM-CTC based Tibetan speech recognition is implemented. The experimental results show that, on the existing Tibetan data sets, the end-to-end approach outperforms the traditional DNN-HMM method. Unlike the traditional method, it also does not need GMM-HMM for alignment. According to the experimental results, the best syllable error rate (Syllable-ER) for CTC-based Tibetan speech recognition is 18.71%.
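As a rough illustration of two ideas mentioned in the abstract, the sketch below shows how a frame-level CTC label path collapses to an output sequence (merge repeated labels, then drop blanks) and how a syllable error rate can be computed. It is not the thesis's implementation: the thesis decodes with WFSTs rather than this simple collapse, the blank symbol `<b>` and all function names are hypothetical, and the error rate is assumed to be the usual edit distance divided by the reference length, which the abstract does not spell out.

```python
from itertools import groupby

BLANK = "<b>"  # hypothetical blank symbol; real systems usually use an integer index


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a frame-level label path: merge adjacent repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def syllable_error_rate(ref_syllables, hyp_syllables):
    """Assumed definition: edit distance divided by the number of reference syllables."""
    if not ref_syllables:
        raise ValueError("reference must contain at least one syllable")
    return edit_distance(ref_syllables, hyp_syllables) / len(ref_syllables)
```

Because a blank between two identical labels keeps them apart, the path `<b> a a <b> a b b` collapses to `a a b` rather than `a b`. This many-to-one mapping is what lets the CTC objective sum over all frame-level paths for a label sequence without pre-computed alignments.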
Keywords/Search Tags: Deep Neural Network, end-to-end ASR, Connectionist Temporal Classification (CTC), Tibetan Lhasa