Study On Voice Spoofing Detection Based On Deep Learning

Posted on:2021-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Su

Full Text:PDF

GTID:2428330602986117

Subject:Electronic and communication engineering

Abstract/Summary:

Speech spoofing refers to producing artificial speech by some disguising method for the purpose of deception. It falls into two categories: 1) voice transformation (VT), which changes a voice without impersonating a target, and 2) impersonation of a target, which includes voice conversion (VC), speech synthesis (SS), and replay attacks. These two kinds of spoofing cause an extremely high false rejection rate and an extremely high false acceptance rate, respectively, in currently prevailing automatic speaker recognition (ASR) systems, and thus pose challenges to social security. It is therefore of great significance to study speech spoofing detection. Most reported efforts adopt traditional machine learning frameworks that consist of manually designed features followed by classification, and the designed features are subtle and fragile to some extent. Because deep learning frameworks can extract deep features automatically, this thesis studies speech spoofing detection algorithms based on deep learning. The main contributions are as follows.

1. We propose a VT speech detection algorithm based on the Dense Convolutional Network (DenseNet). The network structure is designed by refining a convolutional neural network baseline according to the joint time-frequency characteristics of speech signals. It contains a total of 135 layers to extract deep features and improve detection accuracy. Experimental results show that the detection accuracy under various spoofing factors is over 98%. Moreover, the accuracy under noise and under compression is over 90% in both cases, indicating good robustness to noise and compression.

2. We propose an end-to-end spoofing speech detection algorithm based on a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) network. The proposed structure consists of convolution layers and LSTM layers. Data is fed directly into the network without any prior knowledge, which maximizes the effective information available to it. Experimental results show that the detection accuracy is over 95% for both long and short clips.

The methods proposed in this thesis can be deployed as a pre- or post-module for ASR systems to distinguish spoofed speech from genuine speech and to enhance system robustness. The work is significant for both theory and applications in the field of social security.
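The abstract ties the two spoofing categories to two different error rates of a speaker recognition system. As a minimal sketch of how those rates are commonly computed, the Python snippet below counts false acceptances and false rejections over a handful of trials. The `Trial` class, the `error_rates` function, and the toy data are all hypothetical and invented for illustration; they are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    is_target: bool  # True if the speaker really is the claimed identity
    accepted: bool   # decision of a hypothetical speaker recognition system


def error_rates(trials):
    """Return (false_acceptance_rate, false_rejection_rate) for a list of trials."""
    targets = [t for t in trials if t.is_target]
    nontargets = [t for t in trials if not t.is_target]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")
    # False acceptance: a non-target (e.g. a VC, SS, or replay attack) is accepted.
    far = sum(t.accepted for t in nontargets) / len(nontargets)
    # False rejection: a target (e.g. a known speaker disguised by VT) is rejected.
    frr = sum(not t.accepted for t in targets) / len(targets)
    return far, frr


# Toy decisions, made up for illustration only.
trials = [
    Trial(is_target=True, accepted=True),
    Trial(is_target=True, accepted=False),
    Trial(is_target=True, accepted=True),
    Trial(is_target=True, accepted=True),
    Trial(is_target=False, accepted=False),
    Trial(is_target=False, accepted=True),
    Trial(is_target=False, accepted=False),
    Trial(is_target=False, accepted=False),
    Trial(is_target=False, accepted=False),
]

far, frr = error_rates(trials)
print(f"FAR = {far:.2f}, FRR = {frr:.2f}")
```

With these toy trials, one of five non-target trials is accepted and one of four target trials is rejected. A spoofing detector placed before or after the recognition system is meant to reduce both kinds of error by filtering out spoofed inputs.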

Keywords/Search Tags:

Voice Transformation, Voice Conversion, Spoofing Detection, Convolutional Network, Long Short-Term Memory

Related items

1	Research On Voice Transformation Spoofing Detection Algorithm And Implementation Of Robust ASR System
2	Research On Network Intrusion Detection Method Based On Bi-LSTM
3	Investigation On Deep Learning Based Voice Conversion
4	Speech Spoofing Detection Based On Dense Neural Network
5	2D-3D Image Conversion Method Base On Saliency Detection
6	Face Anti-spoofing Based On Deep Learning
7	Voice Conversion Based On ANN
8	Research And Application Of The Short-term Memory Network For Adjusting Gate Length
9	The Research Of Voice Activity Detection Based On Long-term Features
10	Research On Fall Detection Based On Long Short-term Memory Artificial Neural Network And Wrist Sensor