Research And Implementation Of Deep Learning-Based Detection Of Fake Audio

Posted on: 2024-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Shan
GTID: 2568307136988939
Subject: Computer Science and Technology
Abstract/Summary:
With the development of audio generation technologies such as deep generative networks and waveform transformation, it is now possible to synthesize clear, contextually relevant audio with a specific timbre. Such generated audio is useful for human-computer interaction and for applications such as audiobooks. However, the technology also poses significant social security risks. Malicious actors may collect the voices of influential individuals and train deceptive audio generation models to spread rumors, incite social panic, or attack identity recognition systems to steal personal information. Detecting synthesized audio has therefore become an important research area in social security.

Detecting forged audio involves two main steps. The first is voiceprint feature extraction: speech is converted into matrix form, and features suitable for model classification and for exposing forgery information are extracted. The second is detection with a deep learning model, which distinguishes natural human speech from machine-generated audio by extracting deep forgery features from the speech. This thesis focuses on deep learning-based detection of forged audio, and its main work is as follows:

(1) To address the weak anti-interference ability of current single-feature extraction, we propose an anti-interference feature based on attention-mechanism fusion. First, because forgery traces differ across forgery scenarios, inverse Mel frequency cepstral coefficients, Gammatone frequency cepstral coefficients, and linear frequency cepstral coefficients are extracted separately. Second, because multi-forgery environments involve many forgery methods and added noise, and because higher-order features are harder to forge and more robust to interference, we propose segmented cepstral coefficients to enhance the anti-interference ability of the inverse Mel frequency cepstral coefficients and the linear frequency cepstral coefficients. The three features are then fused with a scaled dot-product attention mechanism whose attention weights are learned during training. Finally, experiments verify that the proposed feature extraction method has more comprehensive characterization ability and a degree of anti-interference ability.

(2) To address the weak generalization of current models and their limited ability to detect audio forgery across multiple scenarios, and building on the use of Res2Net for audio forgery detection, we propose an Audio Attention Residual Network. First, the anti-interference features based on attention-mechanism fusion are used as the model input. Then channel and spatial attention mechanisms are added to the Res2Net backbone to assign initial channel importance, relate the channels to one another, and reduce the impact of irrelevant feature points caused by noise. Next, multi-head attention layers are added at the bottom of the backbone to extract frame-related features that are difficult for audio forgery generation models to forge and that previous models neglected. Finally, the experimental results show that: (a) the accuracy is better than that of most single models in the ASVspoof competition, and although the detection rate is slightly lower than that of the multi-model systems in the competition, the detection efficiency and training cost are better; (b) the model maintains a good detection rate under different forgery scenarios, and its generalization and multi-scenario detection abilities are better than those of most models.

(3) The anti-interference feature based on attention-mechanism fusion and the Audio Attention Residual Network build on one another. On this basis, the functional modules of a deep learning-based fake audio detection system were implemented. Finally, extensive testing verified the practicality of the system.
Keywords/Search Tags:Audio forgery detection, Audio feature, Res2Net, Attention mechanism