An automatic speaker verification (ASV) system performs identity verification based on a speaker's voice and is now widely used in everyday scenarios such as mobile phone unlocking, intelligent access control, and bank identity verification. With the application of deep learning models in recent years, ASV systems have made significant progress and demonstrated strong performance. However, they remain susceptible to spoofing attacks using synthesized or converted speech, and synthetic speech deepfake detection systems are dedicated to solving this problem. In this study, a multi-scale GMM-ResNet model is proposed for synthetic speech deepfake detection. The model consists of two main parts: multi-scale Log Gaussian Probability (LGP) feature fusion and a Multi-scale Feature Aggregation ResNet (MFA-ResNet). The main contributions are as follows:

(1) To address the correlation between Gaussian components of GMMs of different orders, this thesis proposes a synthetic speech deepfake detection method based on multi-scale LGP feature fusion. A GMM describes the distribution of speech features in feature space, and GMMs of different orders have different descriptive abilities; LGP features computed from GMMs of different orders therefore reflect the information contained in speech at different scales. Multi-scale LGP feature fusion performs a weighted combination of the LGP features obtained from GMMs of three different orders and feeds the fused features to a subsequent ResNet classifier, so that information can be exchanged across scales. The multi-scale LGP feature fusion + ResNet model achieves min t-DCF = 0.2488 and EER = 2.62% in the ASVspoof 2021 logical access scenario.

(2) To address the problem that residual blocks at different depths of a ResNet output features at different levels, this thesis proposes an MFA-ResNet model for synthetic speech deepfake detection. When training deep neural networks, the feature information obtained in early or intermediate layers is also useful for the classification task. Based on this observation, the MFA-ResNet model improves the feature extraction capability of the network by aggregating the features output by each ResNet residual block, fully fusing feature information from different layers within the network. Multi-scale LGP feature fusion and the MFA-ResNet model are integrated to obtain the multi-scale GMM-ResNet model, which further improves the effectiveness of synthetic speech deepfake detection. The multi-scale GMM-ResNet model achieves min t-DCF = 0.2442 and EER = 2.43% in the ASVspoof 2021 logical access scenario.
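To make the two ideas above concrete, the sketch below shows, in PyTorch-style pseudocode, (a) a weighted fusion of LGP feature maps computed from GMMs of different orders and (b) a small 1-D ResNet that aggregates the output of every residual block before classification. All module and parameter names (LGPFusion, MFAResNet1D, the GMM orders 256/512/1024, channel widths) are hypothetical illustrations, not the thesis implementation; it is a minimal sketch assuming each LGP input is a (batch, gmm_order, time) log-probability map.

```python
# Illustrative sketch only; names and dimensions are assumptions, not the thesis code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LGPFusion(nn.Module):
    """Weighted fusion of LGP feature maps from GMMs of different orders."""

    def __init__(self, gmm_orders=(256, 512, 1024), out_channels=512):
        super().__init__()
        # Project each scale to a common channel size so the maps can be summed.
        self.proj = nn.ModuleList(
            nn.Conv1d(order, out_channels, kernel_size=1) for order in gmm_orders
        )
        self.scale_weights = nn.Parameter(torch.ones(len(gmm_orders)))

    def forward(self, lgp_maps):
        # lgp_maps: list of (batch, gmm_order, time) tensors, one per GMM order.
        weights = torch.softmax(self.scale_weights, dim=0)
        fused = sum(w * proj(x) for w, proj, x in zip(weights, self.proj, lgp_maps))
        return fused  # (batch, out_channels, time)


class MFAResNet1D(nn.Module):
    """Toy 1-D ResNet whose per-block outputs are aggregated for classification."""

    def __init__(self, in_channels=512, width=64, num_blocks=4, num_classes=2):
        super().__init__()
        self.stem = nn.Conv1d(in_channels, width, kernel_size=3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(width, width, kernel_size=3, padding=1),
                nn.BatchNorm1d(width),
                nn.ReLU(),
                nn.Conv1d(width, width, kernel_size=3, padding=1),
                nn.BatchNorm1d(width),
            )
            for _ in range(num_blocks)
        )
        self.classifier = nn.Linear(width * num_blocks, num_classes)

    def forward(self, x):
        x = self.stem(x)
        block_feats = []
        for block in self.blocks:
            x = F.relu(x + block(x))             # residual connection
            block_feats.append(x.mean(dim=-1))   # temporal average pooling per block
        aggregated = torch.cat(block_feats, dim=1)  # multi-scale feature aggregation
        return self.classifier(aggregated)


if __name__ == "__main__":
    # Dummy LGP maps from three GMMs of different orders over 200 frames.
    lgp_maps = [torch.randn(8, order, 200) for order in (256, 512, 1024)]
    fusion = LGPFusion()
    classifier = MFAResNet1D()
    scores = classifier(fusion(lgp_maps))
    print(scores.shape)  # torch.Size([8, 2])
```

In this sketch the fusion weights are learnable and normalized with a softmax, which is one possible reading of "weighted" fusion; fixed or heuristic weights would follow the same structure. Likewise, concatenating pooled per-block features is one simple way to aggregate multi-level ResNet outputs before the final classifier.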