
Design And Implementation Of Speaker Recognition System Based On Neural Network Architecture Search

Posted on: 2021-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Wu
Full Text: PDF
GTID: 2518306104495714
Subject: Software engineering
Abstract/Summary:
Speaker recognition has gradually become an important means of identity verification because of its security and convenience. It is widely used in public security, e-commerce, the internet, finance, and other industries. Speaker recognition draws on fields such as acoustics, signal processing, and machine learning. Traditional speaker recognition is mainly based on statistical learning methods, in which a Gaussian mixture model is used to classify the spectral characteristics of a speaker's voice. Recently, state-of-the-art speaker recognition combined with deep learning has greatly improved recognition accuracy. However, deep learning techniques rely heavily on networks hand-designed by deep learning experts or engineers. To address this, this thesis studies and designs a speaker recognition technique based on neural architecture search. Neural architecture search explores a huge search space and can find neural network architectures that outperform manually designed ones, but it generally consumes a lot of computing resources. To improve search efficiency, this thesis first proposes a novel search strategy based on an evolutionary algorithm, namely a hierarchical evolutionary search strategy. It then uses a heterogeneity-aware scheduling algorithm to assign promising candidate architectures to powerful GPU nodes. Finally, a sparse software acceleration strategy is designed for large-scale supernetwork training and sub-network retraining. To incorporate neural architecture search into speaker recognition, Mel filter bank features are used instead of Mel-frequency cepstral coefficients as the frame-level features of the speaker's voice, and a neural network then aggregates the frame-level features into utterance-level features. To accelerate training and improve accuracy, the generalized end-to-end loss is chosen as the loss function. Combining these techniques, the speaker recognition system was tested on a cluster of 11 GPU servers. The experimental results show that the neural architecture search-based speaker recognition technique in this thesis outperforms recent LSTM-based and x-vector-based end-to-end systems on the small public VCTK data set and on large private data sets.
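The abstract names the generalized end-to-end (GE2E) loss but gives no formula or configuration. The following is a minimal sketch of the softmax variant of that loss as it is commonly defined, not the thesis's implementation. The function name `ge2e_softmax_loss`, the helper names, and the fixed values `w=10.0` and `b=-5.0` are assumptions for illustration. In practice `w` and `b` are usually learned along with the network, and the embeddings would come from the searched architecture.

```python
import math


def _normalize(v):
    # L2-normalize a vector (assumes it is not all zeros).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _mean(vectors):
    # Element-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _cosine(a, b):
    a, b = _normalize(a), _normalize(b)
    return sum(x * y for x, y in zip(a, b))


def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """Mean GE2E softmax loss over a batch.

    embeddings[j][i] is the utterance-level embedding of utterance i
    from speaker j. Each speaker needs at least two utterances.
    w and b are a fixed scale and bias here (illustrative assumption).
    """
    embeddings = [[_normalize(e) for e in utts] for utts in embeddings]
    centroids = [_mean(utts) for utts in embeddings]
    total, count = 0.0, 0
    for j, utts in enumerate(embeddings):
        for i, e in enumerate(utts):
            scores = []
            for k in range(len(embeddings)):
                if k == j:
                    # Exclude the utterance itself from its own centroid.
                    c = _mean(utts[:i] + utts[i + 1:])
                else:
                    c = centroids[k]
                scores.append(w * _cosine(e, c) + b)
            # Numerically stable log-sum-exp over all speakers.
            m = max(scores)
            log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_sum_exp - scores[j]
            count += 1
    return total / count


# Toy batch: 2 speakers, 2 utterances each, 2-dimensional embeddings.
separated = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
mixed = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]
loss_separated = ge2e_softmax_loss(separated)
loss_mixed = ge2e_softmax_loss(mixed)
```

The loss is small when each utterance lies close to its own speaker's centroid and far from the others, so `loss_separated` comes out lower than `loss_mixed` for the toy batches above.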
Keywords/Search Tags: Speaker recognition, Deep learning, Neural network, Neural architecture search, Automated machine learning