| With the widespread adoption of portable terminals such as smart devices, sound is becoming increasingly important as a key means of human-computer interaction. Keyword spotting is well suited as the wake-up mechanism for this kind of interaction, but implementing it still requires a relatively complex architecture. First, there is a lack of appropriate methods for spotting keywords from speakers at different distances. Second, because a keyword spotting algorithm must run continuously for long periods, achieving both low complexity and high accuracy is a challenge. In addition, although existing algorithms achieve high accuracy on a specified set of keywords, they lack a custom keyword function that lets users personalize them. To address these problems, this thesis studies sound keyword spotting algorithms. The thesis first surveys the state of research at home and abroad, analyzes current sound keyword spotting systems and reviews their history, summarizes the problems these systems face, and outlines the structure of the full text. After introducing the basics of the algorithm, it presents the innovations of this thesis. First, to handle keyword spotting at different distances and improve robustness, a new feature extraction method is proposed that maintains or improves spotting accuracy without increasing, and in some cases while reducing, the amount of computation, and that achieves consistent accuracy at different distances. Second, to address the high complexity of neural networks, this thesis proposes a design that combines depthwise separable convolution with a traditional residual network. While preserving accuracy, the method reduces the number of parameters and the amount of computation, making it feasible to deploy on devices with limited
performance. Third, to support custom keyword spotting, this thesis uses an improved triplet loss function to learn feature representations of sound keywords and classifies these representations with the K-nearest neighbor method. Together, these optimizations yield a high-accuracy, low-complexity algorithm that supports custom keyword spotting; the design combining depthwise separable convolution with a residual network not only reduces the number of parameters and the amount of computation but also improves accuracy. The thesis then describes in detail the deployment of the complete keyword spotting system on a PC platform and an STM32 platform. The settings and module functions are introduced, and a 12-category sound keyword spotting system that includes custom keywords is constructed. Finally, the proposed algorithm is tested using res15 as the baseline system. The DS_res15_KNN scheme based on the log-Mel spectrum proposed in this thesis achieves 97.04% accuracy on specified keywords and close to 90% accuracy on custom keywords. With its low complexity and high accuracy, this scheme can provide technical support for deploying offline sound keyword spotting systems at the edge on smart devices. |
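
The custom keyword step described above (an embedding learned with a triplet loss, then classified with K-nearest neighbors) can be illustrated with a minimal sketch. It uses the standard triplet loss, not the improved variant proposed in the thesis, and the 2-D embeddings, keyword labels, and function names (`euclidean`, `triplet_loss`, `knn_predict`) are hypothetical values chosen only for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    This is the textbook form, assumed here for illustration; the
    thesis uses an improved variant that is not reproduced.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def knn_predict(query, enrolled, k=3):
    """Label a query embedding by majority vote among its k nearest
    enrolled embeddings. `enrolled` is a list of (embedding, label)."""
    nearest = sorted(enrolled, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical enrolled embeddings for two user-defined keywords.
enrolled = [
    ([0.0, 0.1], "hello"),
    ([0.1, 0.0], "hello"),
    ([0.1, 0.1], "hello"),
    ([1.0, 0.9], "lights"),
    ([0.9, 1.0], "lights"),
    ([1.0, 1.0], "lights"),
]

print(knn_predict([0.05, 0.05], enrolled))  # nearest neighbors are all "hello"
print(triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0]))  # negative is far away
```

Under this scheme, adding a custom keyword only requires enrolling a few new embeddings; the embedding network itself does not need to be retrained, which is one plausible reason for pairing a metric-learning loss with a nearest-neighbor classifier.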