Research On Tibetan Speech Recognition Based On Deep Convolutional Neural Network

Posted on: 2021-04-28  Degree: Master  Type: Thesis
Country: China  Candidate: Z D Huang  Full Text: PDF
GTID: 2435330620475887  Subject: Computer application technology
Abstract/Summary:
Automatic speech recognition is a core technology in call centers, medical services, mobile applications, and other fields. For languages with rich corpora, such as English and Chinese, recognition accuracy has reached a satisfactory level. Tibetan speech recognition, however, has developed slowly because of the lack of a rich corpus and the particular characteristics of the language. Improving the performance of Tibetan speech recognition systems is therefore an important research topic in the field. This thesis studies the application of convolutional neural networks (CNNs) to Tibetan speech recognition. The main work is as follows:
1. Feature extraction. The speech signal is converted into a spectrogram, which retains the information in the signal and serves as the feature input to the deep convolutional neural network.
2. Acoustic modeling. The convolutional neural network, which performs well in image recognition, is introduced into Tibetan speech recognition to better capture local information in the spectrogram.
3. End-to-end speech recognition. By combining the convolutional neural network with CTC (connectionist temporal classification), an end-to-end Tibetan speech recognition system is designed.
4. Classifier structure optimization. The number of convolutional layers is further increased, and the feature extraction ability of the network is improved by stacking convolutional layers.

Comparative experiments with the above models were conducted on a Tibetan corpus established in the laboratory, and the following conclusions were drawn:
1. Transforming speech into a spectrogram as the feature extraction method better retains the information in the speech signal that is useful for recognition.
2. Using a convolutional neural network to extract speech features from spectrograms improves the performance of Tibetan speech recognition.
3. The end-to-end Tibetan speech recognition system is verified to be feasible, and its recognition results are better than those of the recognition model that uses cross entropy as the loss function.
4. Increasing the number of layers of the convolutional neural network and selecting an appropriate activation function further improves recognition performance.
5. Adding batch normalization and Dropout after the convolutional layers, where Dropout "discards" a fixed proportion of neurons during network training, improves recognition performance while reducing training time.
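The feature extraction step described in the abstract, converting a speech signal into a spectrogram, can be illustrated with a minimal NumPy sketch. The abstract does not state the framing parameters, so the 16 kHz sample rate, 25 ms frames, 10 ms hop, Hamming window, and the function name `log_spectrogram` are all assumptions chosen for illustration, not the thesis's actual configuration.

```python
import numpy as np

def log_spectrogram(signal, sample_rate=16000, frame_ms=25, hop_ms=10, eps=1e-10):
    """Return a (num_frames, num_bins) log-magnitude spectrogram.

    All default parameters are illustrative assumptions, not values
    taken from the thesis.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples between frame starts
    if len(signal) < frame_len:
        # Zero-pad very short signals so that one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(num_frames)
    ])
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression; eps avoids log(0).
    return np.log(magnitude + eps)

# Usage: a one-second synthetic 1 kHz tone stands in for real speech.
t = np.arange(16000) / 16000
tone = np.sin(2 * np.pi * 1000 * t)
spec = log_spectrogram(tone)
print(spec.shape)
```

The resulting two-dimensional array (time frames by frequency bins) can be treated like a single-channel image, which is what makes it a natural input for a convolutional network.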
Keywords/Search Tags:Tibetan, Speech Recognition, CNN, Dropout, Spectrogram