| With the continuous progress of modern network technology, the era of the Internet of Things is gradually entering public life, and one of its most critical technical foundations is speech recognition. Only with efficient and reliable speech recognition can people and objects interact freely and without obstacles. At the same time, the massive data accumulated through the progress of information technology has laid a solid foundation for efficient and reliable speech recognition. Using this massive data together with the deep learning techniques of artificial intelligence to achieve efficient and reliable speech recognition therefore has both theoretical and practical value. This paper covers three main aspects. First, starting from the basic principles of speech recognition, it describes in detail the key components of a recognition system: acoustic feature extraction, the acoustic model, the language model, and the decoder. It also points out the problems that remain, at the current stage of technical development, in acoustic feature extraction, acoustic modeling, and language modeling. Second, it introduces the basic principles of deep learning comprehensively from three perspectives: model categories, basic model components, and key techniques. Third, on this basis, it addresses two problems. For acoustic feature extraction, it discusses in detail the preprocessing of acoustic features, the network structure (including the number of hidden layers and the number of nodes), and parallel network training algorithms. For acoustic modeling, it analyzes in depth the similarities and differences between the deep neural network and the Gaussian mixture model in structure and training method, and presents the feasibility of using a DNN (Deep Neural Network) to describe the state output probability distribution of an HMM (Hidden Markov Model). Numerical simulation is then used to verify these analyses. First, using the HTK speech recognition framework, standard speech extracted from the 863 Chinese speech database is used for comparison and verification. Simulation results show that, for word recognition, the two new features, supervised and unsupervised, improve accuracy over the original feature extraction method by 2.01% and 4.15%, respectively. Second, using the Kaldi open-source speech recognition platform and the MATLAB simulation language, the GMM-HMM (Gaussian Mixture Model-Hidden Markov Model) and DNN-HMM (Deep Neural Network-Hidden Markov Model) acoustic models are simulated and verified. Simulation results show that the constructed GMM-HMM and DNN-HMM acoustic models can reduce the speech recognition error rate by 30%. |
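
The abstract mentions using a DNN to describe the HMM state output probability distribution. One common way to do this in hybrid DNN-HMM systems is to divide the DNN's state posteriors by the state priors, which gives emission scores proportional to the likelihood p(x|s). The sketch below illustrates that general idea only; it is not confirmed to be the exact procedure used in this paper, and the function names, array shapes, and toy numbers are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, computed stably by subtracting the row maximum."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def scaled_log_likelihoods(logits, state_priors):
    """Convert DNN outputs into HMM emission scores (hypothetical helper).

    logits:       (frames, states) pre-softmax DNN outputs
    state_priors: (states,) prior P(s), e.g. state frequencies in an alignment
    returns:      (frames, states) log P(s|x) - log P(s), which equals
                  log p(x|s) up to a per-frame constant log p(x)
    """
    return log_softmax(logits) - np.log(state_priors)


# Toy example: 2 frames, 3 HMM states (all numbers are made up).
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.2, 3.0]])
state_priors = np.array([0.5, 0.3, 0.2])
scores = scaled_log_likelihoods(logits, state_priors)
```

The per-frame constant log p(x) is the same for every state, so it does not change which state sequence a Viterbi decoder prefers. This is why the scaled scores can stand in for the GMM likelihoods in an otherwise unchanged HMM decoder.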