
Research On Chinese Speech Recognition System Based On Deep Learning

Posted on: 2023-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
GTID: 2568306812475524
Subject: Engineering
Abstract/Summary:
With the progress of society and the development of science and technology, artificial intelligence has achieved one breakthrough after another. Speech recognition, a branch of artificial intelligence, has also advanced considerably and has become an indispensable technology used throughout modern society. Improving the recognition rate of speech recognition systems has long been the main concern of researchers. From the earliest work on speech recognition until recently, traditional acoustic models have been the mainstream approach, but in the era of big data they struggle to handle large volumes of complex data. With the development of deep learning, deep neural networks have gradually replaced traditional acoustic models and have had a major impact on speech recognition. Building on the basic principles of speech recognition and deep learning, this thesis studies the application of deep neural networks to acoustic modeling. The main contents are as follows.

First, the thesis introduces the basic principles of speech recognition, the structure of a speech recognition system, and the preprocessing of speech data, including pre-emphasis and framing of the speech signal. It presents the MFCC and Fbank feature extraction methods and derives the extraction process. It analyzes the traditional models of speech recognition, namely the HMM-based acoustic model and the n-gram language model, then analyzes deep neural networks, recurrent neural networks (RNNs) and their long short-term memory (LSTM) variant, and describes the connectionist temporal classification (CTC) algorithm. Next, acoustic models based on LSTM-CTC and BiLSTM-CTC are built and their performance is verified on a dataset, after which the RNN-CTC, LSTM-CTC and BiLSTM-CTC models are compared experimentally. Fbank features are then proposed as the input features of the acoustic model in place of MFCC features, and experiments compare how the two kinds of features affect the recognition rate. In addition, because the ReLU activation function has zero gradient for negative inputs, some neurons can stop updating ("die") in the later stages of training; the thesis therefore replaces ReLU with the Leaky ReLU activation function and verifies experimentally how the two functions affect the system.

Finally, the research and experiments lead to the following conclusions: the BiLSTM-based acoustic model performs better than the LSTM-based one, and both are more accurate than the RNN-based model; Fbank features are better suited than MFCC features for training the LSTM acoustic model; and replacing the ReLU activation function with Leaky ReLU improves the recognition performance of the system.
Keywords/Search Tags: Speech recognition, Deep learning, Neural networks, Acoustic models, Acoustic features
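The preprocessing steps named in the abstract (pre-emphasis and framing) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the pre-emphasis coefficient 0.97, the 25 ms frame length, the 10 ms hop, the 16 kHz sample rate and the Hamming window are common choices assumed here, and all function names are hypothetical.

```python
import math
from typing import List

# Assumed settings (not taken from the thesis): alpha = 0.97, 25 ms frames,
# 10 ms hop, 16 kHz audio, Hamming window.


def pre_emphasis(signal: List[float], alpha: float = 0.97) -> List[float]:
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (first sample kept as is)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]


def hamming(frame_len: int) -> List[float]:
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]


def preprocess(signal: List[float], sample_rate: int = 16000, frame_ms: int = 25,
               hop_ms: int = 10, alpha: float = 0.97) -> List[List[float]]:
    """Pre-emphasis, then framing, then windowing of every frame."""
    frame_len = sample_rate * frame_ms // 1000  # 400 samples at 16 kHz
    hop = sample_rate * hop_ms // 1000          # 160 samples at 16 kHz
    window = hamming(frame_len)
    frames = frame_signal(pre_emphasis(signal, alpha), frame_len, hop)
    return [[s * w for s, w in zip(frame, window)] for frame in frames]
```

With these assumed settings, one second of 16 kHz audio yields 1 + (16000 - 400) // 160 = 98 windowed frames of 400 samples each.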
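The relationship between the two feature types compared in the thesis can be illustrated for a single frame: Fbank features are the log energies of a bank of mel-spaced triangular filters applied to the power spectrum, and MFCCs are obtained by applying a discrete cosine transform (DCT) to those log energies and keeping the first few coefficients. The sketch below assumes 26 filters, a 512-point DFT, 13 cepstral coefficients and the mel formula m = 2595 log10(1 + f/700); it uses a naive O(N^2) DFT and an unnormalized DCT-II so that it needs no third-party libraries. These parameters and function names are illustrative assumptions, not values reported in the thesis.

```python
import cmath
import math
from typing import List, Tuple

# Assumed settings (not taken from the thesis): 16 kHz audio, 512-point DFT,
# 26 mel filters, 13 MFCCs, unnormalized DCT-II.


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    """|X[k]|^2 / n_fft for k = 0..n_fft/2, using a naive DFT of the zero-padded frame."""
    x = frame[:n_fft] + [0.0] * max(0, n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append(abs(acc) ** 2 / n_fft)
    return spec


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters evenly spaced on the mel scale from 0 Hz to the Nyquist frequency."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    filters = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            filt[k] = (k - left) / (center - left)    # rising edge
        for k in range(center, right):
            filt[k] = (right - k) / (right - center)  # falling edge
        filters.append(filt)
    return filters


def fbank_and_mfcc(frame: List[float], sample_rate: int = 16000, n_fft: int = 512,
                   n_filters: int = 26, n_ceps: int = 13) -> Tuple[List[float], List[float]]:
    spec = power_spectrum(frame, n_fft)
    energies = [sum(w * p for w, p in zip(filt, spec))
                for filt in mel_filterbank(n_filters, n_fft, sample_rate)]
    # Fbank: log mel-filterbank energies (floored to avoid log(0))
    fbank = [math.log(max(e, 1e-10)) for e in energies]
    # MFCC: unnormalized DCT-II of the Fbank vector, keeping the first n_ceps terms
    mfcc = [sum(fbank[j] * math.cos(math.pi * i * (j + 0.5) / n_filters) for j in range(n_filters))
            for i in range(n_ceps)]
    return fbank, mfcc
```

Because MFCC is computed from Fbank by a DCT followed by truncation, it keeps less of the fine spectral detail than the Fbank vector itself; this is one commonly cited reason why Fbank inputs can suit neural acoustic models, which is in line with the thesis's experimental finding.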
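The n-gram language model listed among the traditional models estimates the probability of each word from the preceding n-1 words. The sketch below shows the simplest case, a bigram model (n = 2). The add-one (Laplace) smoothing, the `<s>`/`</s>` sentence markers, the class name and the toy two-sentence corpus are all assumptions for illustration; the abstract does not say which n or smoothing method the thesis analyzes.

```python
from collections import Counter
from typing import List


class BigramLM:
    """Hypothetical bigram LM with add-one smoothing (an assumed choice, not from the thesis)."""

    def __init__(self, sentences: List[List[str]]):
        self.history = Counter()   # how often each token appears as a history
        self.bigrams = Counter()   # counts of (previous token, next token) pairs
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # every token that can follow a history: the words plus the end marker
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) = (C(prev, word) + 1) / (C(prev) + |V|)."""
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def sentence_prob(self, words: List[str]) -> float:
        tokens = ["<s>"] + words + ["</s>"]
        p = 1.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p *= self.prob(prev, word)
        return p
```

For example, `BigramLM([["我", "爱", "北京"], ["我", "爱", "上海"]])` gives the word order seen in training a higher probability than its reversal, which is how a language model helps the decoder choose between acoustically similar hypotheses.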
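The difference between the LSTM and BiLSTM acoustic models compared in the thesis lies in the direction of recurrence: a BiLSTM runs one LSTM over the frames from left to right and another from right to left, then concatenates their hidden states at each time step, so every output sees both past and future context. The forward-pass sketch below (random, untrained weights; hypothetical names) assumes a standard LSTM cell with input, forget and output gates and no peephole connections; the abstract does not give the thesis's exact cell configuration or layer sizes.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
GATES = ("i", "f", "o", "c")  # input, forget, output gates and cell candidate


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(W: List[Vec], b: Vec, x: Vec) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


class LSTMCell:
    """Assumed standard LSTM cell without peepholes; weights are random and untrained."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # one weight matrix per gate, applied to the concatenation [x_t, h_{t-1}]
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                      for _ in range(n_hidden)] for g in GATES}
        self.b = {g: [0.0] * n_hidden for g in GATES}

    def step(self, x: Vec, h: Vec, c: Vec) -> Tuple[Vec, Vec]:
        z = x + h  # list concatenation [x_t, h_{t-1}]
        i = [sigmoid(v) for v in affine(self.W["i"], self.b["i"], z)]
        f = [sigmoid(v) for v in affine(self.W["f"], self.b["f"], z)]
        o = [sigmoid(v) for v in affine(self.W["o"], self.b["o"], z)]
        cand = [math.tanh(v) for v in affine(self.W["c"], self.b["c"], z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs: List[Vec]) -> List[Vec]:
        """Unroll over a sequence of frames from a zero initial state."""
        h, c = [0.0] * self.n_hidden, [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(fwd: LSTMCell, bwd: LSTMCell, xs: List[Vec]) -> List[Vec]:
    """Concatenate a left-to-right and a right-to-left pass at every time step."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # re-reverse so index t matches frame t
    return [hf + hb for hf, hb in zip(forward, backward)]
```

In an LSTM-CTC or BiLSTM-CTC model, each per-frame output vector would then be projected to a probability distribution over the output symbols plus a blank, which is the input that CTC consumes.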
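CTC lets an acoustic model be trained without frame-level alignments: the network emits a distribution over output symbols plus a special blank at every frame, and the probability of a label sequence is the sum over all frame-level paths that collapse to it once repeated symbols are merged and blanks removed. The sketch below implements that collapse rule, the standard forward (alpha) recursion for the sum, a brute-force check over all paths, and greedy best-path decoding. It assumes index 0 is the blank and works with raw probabilities rather than log-probabilities, which a practical implementation would use for numerical stability; the function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

BLANK = 0  # assumption: symbol index 0 is the CTC blank


def collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC mapping: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_forward(probs: List[List[float]], label: List[int], blank: int = BLANK) -> float:
    """P(label | x) via the alpha recursion; probs[t][k] is the output for symbol k at frame t."""
    ext = [blank]
    for sym in label:
        ext += [sym, blank]  # extended label: blank, l1, blank, l2, ..., blank
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # skipping the blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # valid paths end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_brute_force(probs: List[List[float]], label: List[int], blank: int = BLANK) -> float:
    """Sum path probabilities over every path that collapses to label (exponential cost)."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(label):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


def greedy_decode(probs: List[List[float]], blank: int = BLANK) -> List[int]:
    """Best-path decoding: take the most likely symbol per frame, then collapse."""
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return collapse(best, blank)
```

The forward recursion costs O(T x S) instead of enumerating every path, and repeated labels such as `[1, 1]` correctly require a blank between them.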
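The "dying" neuron problem that motivated the switch to Leaky ReLU can be shown with a single neuron. ReLU has zero gradient for negative pre-activations, so a neuron whose inputs all land there receives no weight updates, whereas Leaky ReLU keeps a small slope for negative inputs. In the hypothetical toy setup below the bias starts at -5 so that every pre-activation is negative; the leaky slope 0.01, the learning rate and the data are assumptions, not values reported in the thesis.

```python
from typing import Callable, Tuple


def relu(z: float) -> float:
    return z if z > 0 else 0.0


def relu_grad(z: float) -> float:
    return 1.0 if z > 0 else 0.0  # zero gradient for every z <= 0


def leaky_relu(z: float, slope: float = 0.01) -> float:  # slope 0.01 is an assumed default
    return z if z > 0 else slope * z


def leaky_relu_grad(z: float, slope: float = 0.01) -> float:
    return 1.0 if z > 0 else slope  # small but non-zero gradient for z <= 0


def train_neuron(act: Callable[[float], float], act_grad: Callable[[float], float],
                 steps: int = 200, lr: float = 0.5) -> Tuple[float, float]:
    """Gradient descent on 0.5 * (act(w*x + b) - t)^2 over a hypothetical toy dataset.

    The bias starts at -5, so every pre-activation w*x + b is negative at the start.
    """
    w, b = 1.0, -5.0
    data = [(0.5, 1.0), (1.0, 1.0)]  # (input x, target t)
    for _ in range(steps):
        gw = gb = 0.0
        for x, t in data:
            z = w * x + b
            dz = (act(z) - t) * act_grad(z)  # chain rule: dLoss/dz
            gw += dz * x
            gb += dz
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b
```

Under ReLU, `train_neuron(relu, relu_grad)` returns the initial parameters `(1.0, -5.0)` unchanged, while under Leaky ReLU both the weight and the bias move upward, so the neuron can eventually leave the negative region and recover.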