Research On Speaker Verification And Its Lightweight Method Based On Deep Neural Network

Posted on: 2023-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Jiang
Full Text: PDF
GTID: 2568306830486244
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of intelligent speech technology, speaker verification has gradually entered daily life and work. Two research hotspots in intelligent speech processing are how to further reduce the error rate of speaker verification and how to deploy speaker verification models on terminals with limited computing resources. This thesis studies speaker verification based on deep neural networks and a lightweight method for it. The main contributions are as follows.

First, this thesis proposes a speaker verification method based on the Attentive Dilated Res2Net Recurrent Network (ADRRN). The ADRRN consists of a convolutional initial layer, a one-dimensional (1-D) dilated Res2Net block, a residual bidirectional long short-term memory (RBLSTM) block, a channel attentive statistical pooling layer and an additive angular margin Softmax (AAM-Softmax) classifier. The input of the ADRRN is the logarithmic Mel spectrum (LMS) extracted from the input speech sample. The ADRRN is trained to learn from the LMS a speaker embedding (SE) that effectively characterizes local spatial information and global temporal information. The speaker embeddings are then passed to the back-end classifier, which scores their similarity with the cosine similarity metric (CSM) or probabilistic linear discriminant analysis (PLDA). Equal error rate (EER) and minimum detection cost function (minDCF) are used to evaluate speaker verification performance on three speech datasets selected from VoxCeleb1 and VoxCeleb2. The experimental results show that the proposed method is superior to state-of-the-art speaker verification methods. It also outperforms most baseline methods in terms of computational complexity and storage space. When evaluated on experimental data of different lengths, the proposed method shows strong generalization ability.

Second, the proposed ADRRN still has high computational complexity and occupies a large amount of storage, so it cannot be readily deployed on terminals with limited computing resources. To overcome these shortcomings, this thesis proposes a lightweight method for speaker verification based on deep representations grouping and interaction. The proposed grouping and interaction module consists of an initial layer, a group mean pooling layer, an interaction layer, a fusion layer and a group normalization layer. The module is embedded in the convolutional initial layer, the 1-D dilated Res2Net block and the RBLSTM block to reduce model complexity. Multiply-accumulate operations (MACs) and model parameters (MP) are used to evaluate model complexity, again on three speech datasets selected from VoxCeleb1 and VoxCeleb2. The experimental results show that the proposed method greatly reduces computational complexity and the number of model parameters at the cost of a slight increase in EER and minDCF. The proposed method is superior to other state-of-the-art lightweight methods in both model compactness and speaker verification performance. In addition, the proposed lightweight method can be applied to speaker embedding extraction networks with different structures.

In conclusion, this thesis addresses speaker verification based on deep neural networks and a lightweight method for it. It proposes a speaker verification method based on the ADRRN and a lightweight method based on deep representations grouping and interaction, and it carries out multiple experiments comparing the proposed methods with other state-of-the-art methods to demonstrate their effectiveness.
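
The cosine scoring and EER evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the function names `cosine_similarity` and `equal_error_rate` and the toy scores in the usage note are hypothetical, and the EER is approximated by sweeping the decision threshold over the observed scores rather than by interpolating the error curves.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    Returns (eer, threshold) at the point where the false-acceptance rate
    and the false-rejection rate are closest to each other.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]
```

For example, with hypothetical same-speaker scores `[0.9, 0.8, 0.7, 0.3]` and different-speaker scores `[0.6, 0.4, 0.2, 0.1]`, `equal_error_rate` returns an EER of 0.25 at threshold 0.6, where one trial of each kind is misclassified.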
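
The two complexity measures named in the abstract, MACs and model parameters, can be made concrete for a single 1-D convolution layer. The sketch below uses a standard grouped convolution purely as a generic stand-in for channel grouping. It is an assumption for illustration and not the thesis's grouping and interaction module, and the helper `conv1d_cost` and the layer sizes in the usage note are hypothetical.

```python
def conv1d_cost(in_channels, out_channels, kernel_size, out_length, groups=1):
    """Return (parameters, MACs) for a bias-free 1-D convolution layer."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels.
    params = (in_channels // groups) * out_channels * kernel_size
    # Every weight is used once per output time step.
    macs = params * out_length
    return params, macs
```

With these assumed sizes, `conv1d_cost(512, 512, 3, 200)` gives 786432 parameters and 157286400 MACs, while `conv1d_cost(512, 512, 3, 200, groups=4)` gives 196608 parameters and 39321600 MACs. Splitting the channels into four groups cuts both measures by a factor of four, which is the kind of saving a grouping-based design aims for before any cross-group interaction is added back.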
Keywords/Search Tags: Deep neural network, Speaker representation, Speaker verification, Lightweight