Study On Voice Spoofing Detection Based On Deep Learning

Posted on: 2021-05-20    Degree: Master    Type: Thesis
Country: China    Candidate: Z Y Su    Full Text: PDF
GTID: 2428330602986117    Subject: Electronic and communication engineering
Abstract/Summary:
Speech spoofing is the production of artificial speech by some disguising method for the purpose of deception. It falls into two categories: 1) voice transformation (VT), which changes a voice without impersonating a specific target, and 2) target impersonation, which includes voice conversion (VC), speech synthesis (SS), and replay attacks. For currently prevailing automatic speaker recognition (ASR) systems, the first category causes an extremely high false rejection rate and the second an extremely high false acceptance rate, so both pose challenges to social security. It is therefore of great significance to study speech spoofing detection. Most reported efforts adopt traditional machine learning frameworks that combine manually designed features with a classifier, and the designed features are subtle and to some extent fragile. Because deep learning frameworks can extract deep features automatically, this thesis studies speech spoofing detection algorithms based on deep learning. The main contributions are as follows.

1. We propose a VT speech detection algorithm based on a Dense Convolutional Network (DenseNet). The network structure is designed by refining a convolutional neural network baseline according to the joint time-frequency characteristics of speech signals. It contains 135 layers in total, which extract deep features and improve detection accuracy. Experimental results show that the detection accuracy is over 98% across various spoofing factors. Moreover, the accuracy remains over 90% under both noise and compression, indicating good robustness to both.

2. We propose an end-to-end spoofing speech detection algorithm based on a Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM) structure. The network consists of convolution layers and LSTM layers. Data is fed directly into the network without any prior knowledge, which maximizes the effective information retained. Experimental results show that the detection accuracy is over 95% for both long and short clips.

The methods proposed in this thesis can be deployed as a pre- or post-processing module for ASR systems to distinguish spoofed speech from genuine speech and to enhance system robustness. This work is of significance to both theory and applications in the field of social security.
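
The two error rates named in the abstract can be illustrated with a small worked example. This is only a sketch of the standard definitions, not the thesis's evaluation code: the function name `error_rates`, the score lists, and the threshold are hypothetical, and the verifier is assumed to accept a trial when its score is at or above the threshold. A disguised genuine speaker (VT) lowers target-trial scores and so raises the false rejection rate, while an impersonation attack (VC, SS, replay) raises impostor-trial scores and so raises the false acceptance rate.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a verifier that accepts scores >= threshold.

    target_scores:   scores of trials where the claimed speaker is genuine
    impostor_scores: scores of trials where the claim is false (e.g. spoofed)
    All names and values here are illustrative assumptions.
    """
    # A false acceptance is an impostor trial scored at or above the threshold.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    # A false rejection is a genuine trial scored below the threshold.
    false_rejects = sum(1 for s in target_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(target_scores)
    return far, frr


# Hypothetical scores, chosen only to make the arithmetic easy to follow.
target = [0.91, 0.85, 0.40, 0.77]
impostor = [0.10, 0.65, 0.30, 0.72]

far, frr = error_rates(target, impostor, threshold=0.6)
print(f"FAR={far:.2f}, FRR={frr:.2f}")  # prints "FAR=0.50, FRR=0.25"
```

With these made-up scores, two of four impostor trials pass the threshold and one of four genuine trials fails it, which gives the printed rates.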
Keywords/Search Tags:Voice Transformation, Voice Conversion, Spoofing Detection, Convolutional Network, Long Short-Term Memory