
Automatic Speech Recognition And Hotword Enhancement Algorithm Based On Transformer

Posted on: 2024-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Qian
Full Text: PDF
GTID: 2568307103973789
Subject: Control Science and Engineering
Abstract/Summary:
Speech recognition is an important component of artificial intelligence. End-to-end models such as the Transformer are now widely applied in speech recognition because of their efficient training and simple architecture. However, current research still has some problems. The performance of Transformer models built for general scenarios declines sharply when the dataset contains a large number of specialized terms, such as medical and financial terms, and Transformer-based automatic speech recognition models designed for specific domains are lacking. This thesis therefore takes the medical field as its entry point and establishes a medical terminology dataset covering 6 major departments and 225 subdepartments. Based on this dataset, the following work is carried out for both offline and online scenarios:

(1) A hotword enhancement algorithm based on soft beam pruning and a prefix word module is proposed for offline scenarios. First, a Transformer model based on a position convolution module and a label smoothing algorithm is constructed to improve performance in general scenarios. Second, a prefix word module is added to strengthen the connection between terms and their prefix word collocations, and hotword paths are generated through soft beam pruning and backtracking; these hotword paths are then weighted to improve the accuracy of hotword recognition. Compared with the baseline Transformer model, the improved Transformer model reduces the word error rate by 0.5% and 0.3% on the Chinese datasets Aishell1 and Aishell2, and by 2% on the recorded medical conversation test set once the hotword enhancement algorithm is added.

(2) A rescoring model based on CTC prefix decoding and beam search is proposed for online scenarios. It is combined with the hotword enhancement algorithm to improve the performance of the streaming speech recognition model through second-pass scoring and error correction. To address the slow decoding speed of streaming speech recognition models, dynamic threshold decoding acceleration is added in the decoding stage: low-confidence paths are pruned dynamically to speed up decoding with the hotword enhancement algorithm. At the same time, paths that are in a hotword-enhanced state are preserved so that excessive pruning does not harm hotword recognition. Experimental results on the medical dialogue test set show that the word error rates of the models trained on the Aishell1 and Aishell2 datasets decrease by 3.12% and 2.35%, respectively, and that decoding speed increases by 26%.

(3) A Transformer-based speech recognition hotword enhancement system is developed with the PyQt GUI toolkit, and the proposed hotword enhancement algorithm is applied to it. The system provides offline speech recognition, streaming speech recognition, medical hotword enhancement, custom hotword upload, and log export. With the GUI program as the front end and the trained hotword-enhanced speech recognition model as the back end, the system collects audio data and outputs recognition results, and the user adjusts the effect of hotword enhancement dynamically by selecting the hotword strength.
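The dynamic threshold pruning described in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the threshold rule, so the sketch assumes the threshold is a fixed margin below the best score at the current step. The names `Hypothesis`, `prune`, `margin`, and `hotword_bonus` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    tokens: tuple                # token sequence decoded so far
    log_prob: float              # accumulated model log-probability
    hotword_bonus: float = 0.0   # assumed additive weight for hotword paths
    in_hotword: bool = False     # True while the path partially matches a hotword

    @property
    def score(self) -> float:
        return self.log_prob + self.hotword_bonus


def prune(hyps, beam_size, margin):
    """Drop low-confidence paths, but keep paths in a hotword state.

    Assumption: the threshold is (best score at this step) - margin,
    so it moves with the best hypothesis at every decoding step.
    """
    if not hyps:
        return []
    threshold = max(h.score for h in hyps) - margin
    kept = [h for h in hyps if h.score >= threshold or h.in_hotword]
    kept.sort(key=lambda h: h.score, reverse=True)
    return kept[:beam_size]


hyps = [
    Hypothesis(("a",), -1.0),
    Hypothesis(("b",), -6.0),
    Hypothesis(("c",), -7.0, hotword_bonus=0.5, in_hotword=True),
]
kept = prune(hyps, beam_size=4, margin=3.0)
print([h.tokens for h in kept])
```

In this example the second hypothesis falls below the threshold and is pruned, while the third survives despite its lower score because it is still inside a hotword match.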
Keywords/Search Tags: Automatic Speech Recognition, Hotword Enhancement, Transformer, Soft Beam Pruning, Rescoring Model