Research And Implementation Of Deep Learning-Based Detection Of Fake Audio

Posted on:2024-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Shan

Full Text:PDF

GTID:2568307136988939

Subject:Computer Science and Technology

Abstract/Summary:

With the development of audio generation technologies such as deep generative networks and waveform transformation, it has become possible to synthesize clear, contextually relevant audio with a specific timbre. Such generated audio is useful for human-computer interaction and for applications such as audiobooks. However, the technology also poses significant social security risks: malicious actors may collect the voices of influential individuals and train deceptive audio generation models to spread rumors, incite social panic, or attack identity recognition systems to steal personal information. Detecting synthesized audio has therefore become an important research area in social security.

Detecting forged audio involves two main steps. The first is voiceprint feature extraction: speech is converted into matrix form, and features suitable for classification by a learning model and for exposing forgery traces are extracted. The second is detection with a deep learning model, which distinguishes natural human speech from machine-generated audio by extracting deep forgery features from the speech. The main work of this thesis centers on deep learning-based detection of forged audio, as follows:

(1) To address the problem that existing methods rely on a single extracted feature with weak anti-interference ability, an anti-interference feature based on attention mechanism fusion is proposed. First, because forgery traces differ across forgery scenarios, inverse Mel frequency cepstral coefficients (IMFCC), Gammatone frequency cepstral coefficients (GFCC), and linear frequency cepstral coefficients (LFCC) are extracted separately. Second, because multi-forgery environments involve many forgery methods and added noise, and because higher-order features are harder to forge and more robust to interference, segmented cepstral coefficients are proposed to enhance the anti-interference ability of IMFCC and LFCC. The three features are then fused using a scaled dot-product attention mechanism whose attention weights are learned during training. Finally, experiments verify that the proposed feature extraction method has more comprehensive characterization ability and a degree of anti-interference ability.

(2) To address the weak generalization of current models and their limited ability to detect audio forgery across multiple scenarios, and building on the use of Res2Net for audio forgery detection, an Audio Attention Residual Network is proposed. First, the anti-interference features based on attention mechanism fusion are used as the model input. Then, channel and spatial attention mechanisms are added to the Res2Net backbone to assign initial channel importance, associate channels with one another, and reduce the impact of irrelevant feature points caused by noise. Next, multi-head attention layers are added at the bottom of the backbone to extract inter-frame features that audio forgery generation models find difficult to forge and that previous detection models neglected. Finally, experimental results show that: (a) the accuracy is better than that of most single models in the ASVspoof competition, and although the detection rate is slightly lower than that of the multi-model systems in the competition, the detection efficiency and training cost are superior to theirs; (b) the model maintains a good detection rate under different forgery scenarios, and its generalization and multi-scenario detection abilities are better than those of most models.

(3) The anti-interference features based on attention mechanism fusion and the Audio Attention Residual Network build on each other progressively. On this basis, the functional modules of a deep learning-based fake audio detection system were implemented. Finally, extensive testing verified the practicality of the system.

Keywords/Search Tags:

Audio forgery detection, Audio feature, Res2Net, Attention mechanism

Related items

1	Study On The Two Kinds Of Digital Audio Forgery Detection
2	Digital Forensic Technology For Audio Forgery Detection
3	Research On The Violent Detection Of Audio And Video Based On Attention Mechanism
4	Study On Double Compression Detection Of AMR And AAC Audio
5	Forgery Detection Of Digital Audio
6	Forgery Detection Of Digital Audio
7	Study On The Audio Forgery Detection Under Three Cases
8	Audio Signal Forgery Detection Methods Based On Spectrogram And Pitch Synchronous
9	Research On Intelligent Audio Detection And Enhancement Method In Strong Noise Background
10	Research On ENF Signal-based Blind Detection Of Digital Audio Forgery