Research on Universal Fake Speech Detection Scheme Based on Deep Neural Network

Posted on: 2021-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: D K Liu
GTID: 2568306290494664
Subject: Cyberspace security

Abstract/Summary:
In recent years, deep-learning-based forgery technology (Deepfake) has emerged rapidly and aroused widespread concern across society. Voice carries human language information, and voice forgery is one of Deepfake's core technologies for manipulating public opinion. Research on Deepfake detection has therefore become an important field of information security. Existing voice forgery methods are diverse and evolve quickly, which poses great challenges for countermeasure research. Targeting a variety of existing speech synthesis and voice conversion methods, this thesis builds universal forged speech detection schemes from multiple perspectives, including time-domain signals and frequency-domain features, using temporal convolutional networks and depthwise separable convolutional networks. The main research contents and innovations are as follows:

1) Spectral feature extraction and design require extensive hyperparameter tuning, and the resulting neural networks are poorly interpretable. To address this, the framed waveform is first preprocessed by multi-band filtering with a heuristic multi-channel Sinc band-pass filter convolution, which extracts frame-level multi-band features at different frequencies; a squeeze-and-excitation (SE) module then adaptively reweights the frequency bands to produce sequence features. A temporal convolutional network (TCN) is used, for the first time, to analyze these sequence features, yielding an end-to-end universal forged speech detection scheme. Experiments show that the model achieves an equal error rate (EER) as low as 7.23% on test-set samples produced by unknown forgery algorithms. For hard-to-detect methods such as voice conversion, the EER can also be as low as 9.79%, a significant improvement over existing time-domain signal-based detection algorithms.

2) Existing forgery algorithms introduce phase distortion into voiceprint features. Exploiting this, the thesis proposes, for the first time, a universal forged speech detection scheme based on phase and power spectrum features and a TCN. Because a 1D temporal convolution can only analyze inter-frame correlations between the energy and phase of the same frequency band, the power spectrum features are split into patches along the frequency axis, and a 2D temporal convolution analyzes energy and phase correlations both across frames and across frequency bands within a frame. Experiments show a low EER on development-set samples from known forgery algorithms, and on evaluation-set samples from unknown forgery algorithms the EER is lower than that of existing deep learning models based on spectral features.

3) Existing forged speech detection models based on deep neural networks (DNNs) are highly complex. To address this, the thesis proposes, for the first time, a lightweight detection framework based on raw complex spectrum features and a depthwise separable convolutional network. A complex convolution preprocesses the raw complex spectrum matrix by operating separately on its real and imaginary parts, extracting features with richer time-frequency and phase information; a depthwise separable convolutional network then extracts detection features for classification. Experiments show that, compared with other existing DNN-based forged speech detection frameworks, this model has fewer trainable parameters while achieving essentially the same EER.

In summary, this thesis implements three universal forged speech detection algorithms based on deep-learning feature representations. Their input features are, respectively, time-frequency transform features emulated from the waveform, phase and power spectrum features, and raw complex spectrum features. The results show that low-level features rich in spectral information, combined with carefully constructed deep neural networks, can effectively improve the ability of forged speech detection to generalize to unknown forgery algorithms. This work can provide technical support for curbing the proliferation of forged information on the Internet.
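All three schemes are evaluated by equal error rate (EER), the operating point where the false acceptance rate (spoofed speech accepted as genuine) equals the false rejection rate (genuine speech rejected). The abstract does not say how EER is computed; the following is a minimal sketch, assuming label 1 means bona fide, 0 means spoofed, and higher scores mean "more likely bona fide". It sweeps each observed score as a threshold and takes the point where the two error rates are closest, which only approximates the interpolated EER that official evaluation toolkits report. The function name `compute_eer` is hypothetical.

```python
def compute_eer(scores, labels):
    """Approximate EER by trying every observed score as the decision threshold.

    Assumes label 1 = bona fide, 0 = spoofed, and higher score = more likely bona fide.
    Returns (eer, threshold) at the threshold where FRR and FAR are closest.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both bona fide and spoofed samples")
    best = None
    for t in sorted(set(scores)):
        # False rejection: bona fide samples scored below the threshold.
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_pos
        # False acceptance: spoofed samples scored at or above the threshold.
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]
```

For example, `compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])` returns `(0.5, 0.6)`: at threshold 0.6 one of the two genuine samples and one of the two spoofed samples are misclassified.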
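Contribution 1) replaces hand-designed spectral features with a bank of Sinc band-pass filters applied to each waveform frame. The sketch below shows one way such a bank can be built, following the common SincNet-style construction: each band-pass kernel is the difference of two windowed-sinc low-pass filters, so it is fully described by a low and a high cutoff. The Hamming window, kernel length, sample rate, band edges, and the names `sinc_bandpass_kernel`, `filter_frame`, and `multiband_features` are illustrative assumptions; the abstract does not give the thesis's filter count or its heuristic initialization, and in a trainable layer the two cutoffs per channel would be learned rather than fixed.

```python
import math


def sinc_bandpass_kernel(f_low, f_high, length, sample_rate):
    """Windowed-sinc band-pass FIR kernel: low-pass(f_high) minus low-pass(f_low).

    f_low and f_high are cutoffs in Hz; length must be odd so the kernel is symmetric.
    """
    if length % 2 == 0:
        raise ValueError("kernel length must be odd")
    if not 0 <= f_low < f_high <= sample_rate / 2:
        raise ValueError("need 0 <= f_low < f_high <= Nyquist")
    fl, fh = f_low / sample_rate, f_high / sample_rate  # cycles per sample
    half = length // 2
    kernel = []
    for i in range(length):
        n = i - half
        if n == 0:
            ideal = 2 * (fh - fl)  # limit of the sinc difference at n = 0
        else:
            ideal = (math.sin(2 * math.pi * fh * n) - math.sin(2 * math.pi * fl * n)) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))  # Hamming
        kernel.append(ideal * window)
    return kernel


def filter_frame(frame, kernel):
    """Same-length filtering of one frame with zero padding at the edges.

    Because the kernel is symmetric, this correlation equals a convolution.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(frame)):
        acc = 0.0
        for k, h in enumerate(kernel):
            j = t + k - half
            if 0 <= j < len(frame):
                acc += h * frame[j]
        out.append(acc)
    return out


def multiband_features(frame, bands, length=101, sample_rate=16000):
    """One filtered copy of the frame per (f_low, f_high) band: channels x samples."""
    return [filter_frame(frame, sinc_bandpass_kernel(lo, hi, length, sample_rate))
            for lo, hi in bands]
```

Parameterizing each channel by two cutoffs instead of a full set of free FIR taps is what keeps such a front end compact and easy to inspect: each channel's passband can be read directly from its cutoffs.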
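The SE module in contribution 1) adaptively reweights the frequency bands. Below is a minimal pure-Python sketch of the standard squeeze-and-excitation computation, treating each Sinc band as one channel: average each band over time (squeeze), pass the band descriptors through a bottleneck layer with ReLU and an output layer with sigmoid (excitation), and scale each band by its gate. The weight shapes, the reduction ratio r, and the function name `se_reweight` are assumptions for illustration, not the thesis's configuration.

```python
import math


def se_reweight(features, w1, b1, w2, b2):
    """Squeeze-and-excitation over frequency-band channels.

    features: C channels, each a list of T frame values.
    w1: (C // r) x C, b1: C // r, w2: C x (C // r), b2: C, for reduction ratio r.
    Returns (reweighted features, per-channel gates).
    """
    # Squeeze: global average over time gives one descriptor per band.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b) for row, b in zip(w1, b1)]
    # Excitation, step 2: fully connected layer with sigmoid, one gate in (0, 1) per band.
    gates = [1.0 / (1.0 + math.exp(-(sum(w * hi for w, hi in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    # Scale: multiply every frame of band c by its gate.
    out = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return out, gates
```

With all weights and biases set to zero every gate is sigmoid(0) = 0.5, so each band is simply halved; a large positive bias on one output unit drives that band's gate toward 1, and a large negative bias suppresses it.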
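The TCN used in contributions 1) and 2) is built from dilated causal convolutions: output frame t depends only on frames t, t - d, t - 2d, and so on, so stacking layers with growing dilations covers long contexts with few layers. The sketch below shows only that core operation and the resulting receptive field; the residual connections, normalization, and nonlinearities that TCN blocks usually add are omitted, and the doubling dilation schedule in the example is an assumption. Function names are hypothetical.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], zero-padded on the left (causal)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = t - k * dilation
            if j >= 0:  # frames before the start count as zeros
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by the last output of a stack of such layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With kernel size 3 and dilations 1, 2, 4, 8, the last output sees 1 + 2 * (1 + 2 + 4 + 8) = 31 frames, while a stack of four undilated layers would see only 9.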
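Contribution 2) works on phase and power spectrum features split into sub-band patches along the frequency axis, so that a 2D convolution can relate energy and phase across neighboring bands as well as across frames. The sketch below assumes the input is already a complex spectrogram (T frames by F bins, for example from an STFT that is not shown) and uses a fixed patch size that drops leftover bins; the patch size, the handling of leftovers, and the function names are hypothetical choices, not values from the thesis.

```python
import cmath


def power_and_phase(spectrum):
    """Split a complex spectrogram (T frames x F bins) into power and phase maps."""
    power = [[abs(c) ** 2 for c in frame] for frame in spectrum]
    phase = [[cmath.phase(c) for c in frame] for frame in spectrum]  # radians in [-pi, pi]
    return power, phase


def split_frequency(feature, patch_size):
    """Cut a T x F map into consecutive sub-band patches of patch_size bins each.

    Returns a list of patches, each T x patch_size. Trailing bins that do not fill
    a whole patch are dropped (an assumption of this sketch).
    """
    n_bins = len(feature[0])
    return [[frame[start:start + patch_size] for frame in feature]
            for start in range(0, n_bins - patch_size + 1, patch_size)]
```

Each power patch could then be stacked with the matching phase patch as two input channels of a 2D convolution; that stacking is likewise an illustrative assumption.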
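Contribution 3) preprocesses the raw complex spectrum with a complex convolution that operates on its real and imaginary parts. A common way to implement this with real-valued layers, assumed here rather than confirmed by the abstract, expands the complex product into four real convolutions: (W_re + iW_im) * (X_re + iX_im) = (W_re*X_re - W_im*X_im) + i(W_re*X_im + W_im*X_re). The 1D "valid" cross-correlation below stands in for the 2D layers a real model would use, and the function names are hypothetical.

```python
def conv1d_valid(x, w):
    """Real 'valid' cross-correlation (what deep learning frameworks call convolution)."""
    n = len(x) - len(w) + 1
    return [sum(w[k] * x[t + k] for k in range(len(w))) for t in range(n)]


def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution built from four real convolutions.

    Real part: W_re*X_re - W_im*X_im; imaginary part: W_re*X_im + W_im*X_re.
    """
    rr = conv1d_valid(x_re, w_re)
    ii = conv1d_valid(x_im, w_im)
    ri = conv1d_valid(x_im, w_re)
    ir = conv1d_valid(x_re, w_im)
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ri, ir)]
    return out_re, out_im
```

Because the four real convolutions reproduce complex multiplication term by term, the result matches computing `sum(w[k] * x[t + k])` directly on Python complex numbers, while keeping the phase relationship between real and imaginary parts that separate real-valued channels would otherwise ignore.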
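The parameter savings claimed for contribution 3) follow from how a depthwise separable convolution factorizes a standard convolution: a per-channel k x k depthwise filter followed by a 1 x 1 pointwise convolution that mixes channels. The counts below use the standard formulas for a square kernel; the layer sizes in the example are illustrative and not taken from the thesis, and the function names are hypothetical.

```python
def conv2d_params(k, c_in, c_out, bias=True):
    """Trainable parameters of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=True):
    """Trainable parameters of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1 x 1 conv mixing channels
    return depthwise + pointwise
```

For a 3 x 3 layer mapping 64 to 128 channels without bias, a standard convolution has 3 * 3 * 64 * 128 = 73,728 weights, while the separable version has 3 * 3 * 64 + 64 * 128 = 8,768, roughly 8.4 times fewer.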
Keywords/Search Tags: Deepfake, Forged Speech, Time-Frequency Transform, Temporal Convolutional Network, Depthwise Separable Convolution