Research On Convolutional Network And Its Variants In Keyword Spotting

Posted on: 2022-09-01  Degree: Master  Type: Thesis
Country: China  Candidate: C Y Mei  Full Text: PDF
GTID: 2518306551453964  Subject: Master of Engineering
Abstract/Summary:
Voice wake-up is the entry point for human-machine voice interaction. High accuracy and a low false wake-up rate are the basis of a good experience. At the same time, to suit the computing conditions of mobile devices, memory and compute requirements must be kept as low as possible. In response to these two requirements, wake-up performance and resource occupancy, the focus of research in the field has shifted from methods based on hidden Markov models to neural network methods that use simple post-processing. Deep neural networks (DNN) and convolutional neural networks (CNN) are widely used in cross-entropy systems trained on frame-level aligned data. Because cross-channel operations in ordinary convolutional networks are computationally expensive, depthwise separable convolution is used to separate the convolution operations that capture cross-channel correlation from those that capture spatial correlation. Building on dilated convolution and depthwise separable convolution, this thesis proposes an efficient model that improves the performance of the cross-entropy training system. The structure achieves higher power efficiency and accuracy because dilation expands the receptive field and because the depthwise and pointwise convolutions are separated. In addition, inspired by the application of WaveNet to text-to-speech (TTS), a general wake-up system with selectable keywords, based on CTC loss, is proposed. After being fed domain data, it performs better than the previous structure based on the cross-entropy loss function.
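The two savings described in the abstract can be illustrated with simple arithmetic. The sketch below is not the thesis model. It assumes 1-D convolutions over time with stride 1 and no bias, and the channel counts, kernel size, and dilation schedule are hypothetical values chosen only for illustration. It compares the weight count of an ordinary convolution with that of a depthwise separable one, and the receptive field of a stack of plain layers with that of a dilated stack.

```python
def standard_conv1d_params(c_in, c_out, k):
    """Weights in an ordinary 1-D convolution (bias omitted)."""
    return c_in * c_out * k


def separable_conv1d_params(c_in, c_out, k):
    """Depthwise weights (one k-tap filter per input channel)
    plus pointwise weights (a 1x1 convolution that mixes channels)."""
    return c_in * k + c_in * c_out


def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical layer sizes, not taken from the thesis.
c_in, c_out, k = 64, 64, 3

std = standard_conv1d_params(c_in, c_out, k)
sep = separable_conv1d_params(c_in, c_out, k)

rf_plain = receptive_field(k, [1, 1, 1, 1])    # four undilated layers
rf_dilated = receptive_field(k, [1, 2, 4, 8])  # assumed doubling schedule

print(f"standard: {std} weights, separable: {sep} weights")
print(f"receptive field: plain {rf_plain} frames, dilated {rf_dilated} frames")
```

With these assumed sizes the separable layer needs 4288 weights instead of 12288, and four dilated layers cover 31 frames instead of 9 with the same number of weights per layer.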
Keywords/Search Tags: Keyword Spotting, Voice Wake-up, Dilated Convolution, Depthwise Separable Convolution, WaveNet, CTC