Study On Speech Keyword Spotting Methods Based On Deep Learning

Posted on: 2023-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Deng
Full Text: PDF
GTID: 2568306827467484
Subject: Information and Communication Engineering
Abstract/Summary:
Speech keyword spotting uses speech signal processing methods to detect a set of predefined keywords in a user's speech stream. Research on speech keyword spotting has made important progress, and related software and hardware products have emerged and are widely used in human-computer interaction, mobile phone voice assistants, smart speakers, smart headphones, smart homes, and other fields. In recent years, with the rise of deep learning, keyword spotting based on deep neural networks has advanced further, but such models are difficult to deploy on terminal devices because of their large number of parameters and high computing power requirements. Targeting applications that require a low parameter count and low computational complexity, this thesis applies deep learning to speech keyword spotting. The main work is as follows.

(1) To meet the requirements of a low parameter count and low computational complexity, this thesis proposes a speech keyword spotting model based on the Ghost module and the coordinate attention mechanism. First, the model uses the Ghost module in place of conventional convolution, reducing the number of parameters and the amount of computation by adjusting the module's compression ratio. Second, it uses a coordinate attention mechanism instead of a self-attention-based Transformer network for global information fusion, which improves network performance. Finally, it wraps the Ghost, standard convolution, coordinate attention, and other modules in residual connections to prevent network degradation during training. Simulation experiments on the Google dataset and a Chinese dataset show that the model effectively reduces the number of parameters and achieves a recognition rate of 94.53% with 26K parameters on the Google dataset.

(2) To address the poor accuracy and overfitting observed when training the network based on the Ghost module and coordinate attention mechanism, a speech keyword spotting model incorporating MobileViT and spatial pyramid pooling modules is proposed. First, the depthwise separable convolution of the MobileNet module is used instead of the Ghost module; the MobileNet structure first expands the number of feature channels and then reduces it, which improves network performance. Second, the MobileViT module combines local and global information by stacking a self-attention-based Transformer network with a convolutional network, which enhances the fitting ability of the network. Finally, to address overfitting, spatial pyramid pooling is used to fuse the features after convolution, which improves network performance. Simulation experiments on the Google dataset and the Chinese dataset show that the model has a small number of parameters and high recognition accuracy, reaching 96.87% on the Google dataset.

In summary, this thesis proposes two speech keyword spotting methods based on deep learning. Both have simple network structures and a small number of parameters, and both achieve good recognition results on the Google dataset and the Chinese dataset. In addition, both methods need only fine-tuning to complete recognition tasks on different datasets, which indicates that the networks generalise well.
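
The abstract attributes the parameter savings to replacing conventional convolution with the Ghost module, and later with depthwise separable convolution. The sketch below illustrates that arithmetic for a single layer. It is not the thesis's configuration: the channel counts, kernel sizes, and the `ratio` and `cheap_k` arguments are hypothetical values chosen for illustration, `ratio` is assumed to correspond to the compression ratio mentioned in the abstract, and bias and normalisation parameters are ignored.

```python
import math


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in: int, c_out: int, k: int,
                 ratio: int = 2, cheap_k: int = 3) -> int:
    """Weights in a Ghost-style layer (bias ignored).

    A primary k x k convolution produces ceil(c_out / ratio) "intrinsic"
    channels; the remaining channels come from cheap depthwise
    cheap_k x cheap_k convolutions applied to the intrinsic channels.
    `ratio` and `cheap_k` are illustrative assumptions.
    """
    intrinsic = math.ceil(c_out / ratio)
    primary = conv_params(c_in, intrinsic, k)
    # Each intrinsic channel yields (ratio - 1) extra maps, one filter each.
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out


if __name__ == "__main__":
    c_in, c_out, k = 64, 64, 3  # hypothetical layer size
    print("standard conv      :", conv_params(c_in, c_out, k))            # 36864
    print("ghost, ratio=2     :", ghost_params(c_in, c_out, k, ratio=2))  # 18720
    print("ghost, ratio=4     :", ghost_params(c_in, c_out, k, ratio=4))  # 9648
    print("depthwise separable:", depthwise_separable_params(c_in, c_out, k))  # 4672
```

For this hypothetical layer, a ratio of 2 roughly halves the weight count and a ratio of 4 roughly quarters it, which is the kind of trade-off that adjusting the compression ratio gives. The depthwise separable variant is smaller still for this layer size, although parameter count alone says nothing about the accuracy differences reported in the abstract.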
Keywords/Search Tags: Keyword Spotting, Speech Recognition, Attention Mechanism, Ghost Module, Transformer Network