
End-to-End Chinese Speech Recognition Algorithm Research

Posted on: 2023-09-24    Degree: Master    Type: Thesis
Country: China    Candidate: Q Lin    Full Text: PDF
GTID: 2558307061962189    Subject: Electronic and communication engineering
Abstract/Summary:
With the development of science and technology, speech recognition has become one of the key technologies of human-computer interaction, and people's requirements for speech recognition keep rising. With the continuous improvement and deepening study of deep learning, speech recognition technology has made a qualitative leap, especially end-to-end speech recognition. Building on a survey of the current state of speech recognition technology, this paper studies and analyzes end-to-end speech recognition, experiments with a variety of mainstream speech recognition models, improves and optimizes the LAS model, and compares the improved model with the original experimentally. In summary, the main work of this paper includes:
(1) The development of speech recognition from its birth to the present is reviewed, together with the technical iterations at each stage, and the current application scenarios, technical progress, and remaining problems and difficulties of end-to-end speech recognition are discussed.
(2) The technical pipeline of speech recognition is studied and analyzed, mainly including speech signal preprocessing, speech feature extraction, acoustic modeling, language modeling, and their applicable scenarios.
(3) The two algorithm mechanisms most commonly used in end-to-end speech recognition are analyzed: connectionist temporal classification (CTC) and the attention mechanism (Attention). Four current mainstream end-to-end speech recognition models are built, and their advantages and disadvantages are analyzed and compared through experiments. In addition, to better explore the performance of the GRU and LSTM recurrent-network variants on the experimental dataset, experiments are conducted on both RNN variants. The results show that when the amount of training data is moderate, the GRU unit reduces the number of model parameters and the model complexity, improving training effectiveness, and its recognition rate is similar to or even slightly higher than that of LSTM (a parameter-count comparison is sketched after this list).
(4) An improved speech recognition model based on LAS is implemented, addressing several shortcomings of the traditional LAS model. First, because the LAS encoder has a large number of parameters and cannot be parallelized, a Conformer encoder with fewer parameters that supports parallel computation replaces the pBLSTM structure of the LAS encoder. Second, the Attention network of LAS is replaced with a location-aware Attention network that incorporates the alignment information of the previous step, making the attention mechanism more sensitive to the location of the information (see the attention sketch after this list).
(5) To speed up model training and improve recognition efficiency, improvements to the LAS decoder are proposed. First, the decoder adopts a BiGRU network in place of the original two-layer LSTM, which reduces the number of parameters and improves the decoder's contextual temporal feedback capability. Second, joint CTC/LAS decoding is used to speed up model training and improve recognition accuracy (a sketch of the joint objective also follows the list).
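As a rough illustration of the parameter reduction mentioned in (3), the following minimal sketch compares the parameter counts of LSTM and GRU layers of the same width. PyTorch is assumed, and the layer sizes are illustrative, not the actual configuration used in the thesis.

```python
# Minimal sketch (PyTorch assumed): GRU uses 3 gates per cell versus 4 for LSTM,
# so a GRU layer of the same width has roughly 25% fewer parameters.
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

# Illustrative sizes only: 80-dim acoustic features, 256 hidden units, 2 layers.
lstm = nn.LSTM(input_size=80, hidden_size=256, num_layers=2, batch_first=True)
gru = nn.GRU(input_size=80, hidden_size=256, num_layers=2, batch_first=True)

print("LSTM parameters:", count_params(lstm))
print("GRU parameters: ", count_params(gru))
```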
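The location-aware Attention network mentioned in (4) can be sketched as below. This is a hypothetical PyTorch implementation of the standard location-aware attention idea, not the thesis's exact network; the class name, dimensions, and convolution settings are all illustrative. The alignment weights of the previous decoding step are convolved and added into the attention energy, which is what makes the mechanism sensitive to location.

```python
# Hypothetical sketch of location-aware attention: the previous alignment is
# convolved and fed into the scoring function together with the decoder state
# (query) and the encoder outputs (keys).
import torch
import torch.nn as nn

class LocationAwareAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim, conv_channels=10, kernel_size=31):
        super().__init__()
        self.query_proj = nn.Linear(dec_dim, attn_dim, bias=False)
        self.key_proj = nn.Linear(enc_dim, attn_dim, bias=False)
        self.loc_conv = nn.Conv1d(1, conv_channels, kernel_size,
                                  padding=kernel_size // 2, bias=False)
        self.loc_proj = nn.Linear(conv_channels, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, query, keys, prev_align):
        # query: (N, dec_dim), keys: (N, T, enc_dim), prev_align: (N, T)
        loc_feat = self.loc_conv(prev_align.unsqueeze(1)).transpose(1, 2)  # (N, T, C)
        energy = self.score(torch.tanh(
            self.query_proj(query).unsqueeze(1)
            + self.key_proj(keys)
            + self.loc_proj(loc_feat))).squeeze(-1)                        # (N, T)
        align = torch.softmax(energy, dim=-1)
        context = torch.bmm(align.unsqueeze(1), keys).squeeze(1)           # (N, enc_dim)
        return context, align
```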
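The joint CTC/LAS training mentioned in (5) is commonly implemented as a weighted combination of the CTC loss on the encoder output and the cross-entropy loss of the attention decoder. A minimal sketch, assuming PyTorch; the function name, tensor shapes, and the interpolation weight lam are illustrative rather than the thesis's actual settings.

```python
# Hypothetical sketch of a joint CTC + attention (LAS) objective.
import torch.nn as nn

ctc_loss_fn = nn.CTCLoss(blank=0, zero_infinity=True)
ce_loss_fn = nn.CrossEntropyLoss()

def joint_loss(enc_log_probs, enc_lens, ctc_targets, ctc_target_lens,
               dec_logits, dec_targets, lam=0.3):
    # enc_log_probs: (T, N, V) log-probabilities from the encoder's CTC branch
    # dec_logits:    (N, L, V) logits from the attention decoder
    # dec_targets:   (N, L) label indices for the decoder outputs
    ctc = ctc_loss_fn(enc_log_probs, ctc_targets, enc_lens, ctc_target_lens)
    ce = ce_loss_fn(dec_logits.transpose(1, 2), dec_targets)
    return lam * ctc + (1.0 - lam) * ce
```

At decoding time the same idea is often used to rescore beam-search hypotheses with a weighted sum of the CTC and attention scores.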
Keywords/Search Tags:End-to-End Automatic Speech Recognition, LAS, CTC, Attention, Conformer