Tibetan Lhasa Acoustic Model Based On LSTM-CTC Speech Recognition System

Posted on:2020-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:S Wang

Full Text:PDF

GTID:2415330572493902

Subject:Computer technology

Abstract/Summary:

As science and technology advance, computers and other intelligent devices are becoming widespread. Speech is the most direct means of communication between people, so human-machine speech interaction has long been an active research topic. The application of deep neural networks (DNNs) has greatly improved the performance of automatic speech recognition (ASR). Speech recognition for widely spoken languages now achieves good results, but research on low-resource languages such as Tibetan remains limited. Building a speech recognition system is still a challenging task that requires a variety of resources, several training stages, and expert knowledge.

Compared with traditional speech recognition based on hidden Markov models (HMMs), an end-to-end speech recognition model has a single structure, does not need to separate the acoustic model from the language model, and does not need a pronunciation dictionary. There are currently two main types of end-to-end speech recognition systems: Connectionist Temporal Classification (CTC) and attention-based models. This thesis applies an end-to-end acoustic modeling method based on LSTM-CTC to speech recognition for Lhasa Tibetan. To eliminate the need to generate frame-level labels in advance, the model uses the CTC objective function to infer the alignment between the speech and the label sequences. Decoding uses weighted finite-state transducers (WFSTs), which allow the dictionary and the language model to be combined effectively in CTC decoding. Acoustic feature parameters serve as the input to the acoustic model, and the output is the probability of the phoneme sequence.

On this basis, an LSTM-CTC based Tibetan speech recognition system is implemented. Experimental results on the existing Tibetan data sets show that the end-to-end system outperforms the traditional DNN-HMM method. Unlike the traditional method, it does not require a GMM-HMM system for alignment. The best syllable error rate obtained for CTC-based Tibetan speech recognition is 18.71%.
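The way the CTC objective removes the need for frame-level labels can be illustrated with a minimal sketch. It is not the thesis implementation: the blank index, the function names, and the toy probabilities are all assumptions made for illustration. CTC scores a label sequence by summing the probabilities of every frame-level path that collapses to it (merge repeated symbols, then drop blanks). The sketch computes that sum twice, once by brute-force enumeration and once with the standard forward recursion, so the two can be compared.

```python
from itertools import product

BLANK = 0  # assumed index of the CTC blank symbol


def collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)


def ctc_prob_bruteforce(probs, target):
    """Sum the probability of every path that collapses to target.

    probs[t][k] is the per-frame probability of symbol k at frame t.
    Exponential in the number of frames; only usable on toy inputs.
    """
    n_symbols = len(probs[0])
    total = 0.0
    for path in product(range(n_symbols), repeat=len(probs)):
        if collapse(path) == tuple(target):
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            total += p
    return total


def ctc_prob_forward(probs, target):
    """Compute the same quantity with the CTC forward recursion."""
    # Extended target: a blank before, between, and after every label.
    ext = [BLANK]
    for s in target:
        ext += [s, BLANK]
    size = len(ext)

    # A path may start in the leading blank or in the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for i in range(size):
            a = alpha[i]                      # stay on the same symbol
            if i >= 1:
                a += alpha[i - 1]             # advance by one position
            # Skip over a blank only between two different labels.
            if i >= 2 and ext[i] != BLANK and ext[i] != ext[i - 2]:
                a += alpha[i - 2]
            new[i] = a * probs[t][ext[i]]
        alpha = new

    # A valid path ends in the last label or in the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Toy input: 3 frames, 3 symbols (0 = blank, 1 and 2 are hypothetical labels).
toy_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
toy_target = [1, 2]
p_brute = ctc_prob_bruteforce(toy_probs, toy_target)
p_forward = ctc_prob_forward(toy_probs, toy_target)
```

In training, the negative logarithm of this probability would serve as the loss, and a practical implementation would work in log space for numerical stability. No alignment is supplied anywhere: the sum over paths accounts for every possible one.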

Keywords/Search Tags:

Deep Neural Network, end-to-end ASR, Connectionist Temporal Classification (CTC), Tibetan Lhasa

Related items

1	Research And Implementation Of Sequence To Sequence Tibetan Lhasa Dialect Speech Synthesis
2	Research On Music Audio Classification Based On Deep Learning
3	Text Analysis Of Speech Synthesis Based On Statistical Parameters Of Tibetan Language In Specific Fields
4	Research On Construction And Prediction Of Spanish Pronunciation Dictionary For Military Field
5	Automatic Labanotation Generation Of Continuous Movement Based On Deep Learning
6	Chinese Painting Image Classification Research Based On Deep Learning
7	Music Genre Recognition Research Based On Improved Deep Convolutional Neural Network
8	Research On Chinese Painting Classification Method Based On Deep Learning
9	Research On Continuous Language Translation Of Sign Video Based On Deep Learning
10	An Overview Of Automated Scoring Of Essays Based On Deep Neural Network