Research on Fake Voice Detection Methods Based on Ensemble Learning

Posted on: 2023-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: J C Fu
Full Text: PDF
GTID: 2568307031454914
Subject: Computer application technology
Abstract/Summary:
Fake voice attacks can impersonate a target user and then take control of that user's smart devices and secure accounts, which poses a serious threat to voice authentication systems. Using the public speech datasets POCO, VoxCeleb1, and ADD2022, this thesis detects fake voice with deep learning and ensemble learning methods and studies detection methods for three different types of fake voice. The main research contents are as follows:

1) To address the problem that short speech commands carry little audio information and are poorly suited to sentence-level replay voice detection, a ResNet-LightGBM model based on a self-attention mechanism is proposed for word-level replay voice detection. First, a new audio frame selection model is proposed. Then, the GFCC acoustic features of the selected frames are computed. Next, a self-attention ResNet extracts more specific information from the GFCC features. Finally, the extracted features are used to train a LightGBM classifier, which achieves better detection results. The results on the POCO dataset verify the effectiveness of this scheme.

2) To address the poor robustness of existing adversarial fake voice detection methods to unknown adversarial fake voice, a defense strategy based on adding noise and a voting method is proposed. First, a speaker recognition system is built on the Fast ResNet model. Second, random Gaussian noise is added to the original test speech in batches to obtain a new group of scores. Third, the speaker recognition system computes a new score as the average of the scores of the original test speech and the noise-added speech, and compares it with the threshold to determine whether the input is adversarial fake speech. Experimental results on the VoxCeleb1 dataset show that the proposed method can effectively defend against adversarial speech attacks.

3) To address the large number of model parameters and long training time of synthetic voice detection methods, a lightweight SE-ResNet model based on an attention mechanism is proposed. First, the traditional SE-ResNet module is replaced with a Small SE-ResNet model, which speeds up training without losing accuracy. Second, the accuracy of three pooling layers is compared for synthetic voice detection. Finally, the snapshot ensemble training method is used to integrate the locally optimal models obtained during training, which improves accuracy. On the ADD2022 dataset, synthetic speech is detected effectively with all of these strategies.

Figure 14; Table 21; Reference 53...
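The noise-and-averaging defense described in item 2) can be illustrated with a minimal sketch. This is not the thesis implementation: `toy_embed` is a hypothetical stand-in for the Fast ResNet speaker-embedding model, cosine similarity is an assumed scoring function, and the parameter names `n_copies` and `sigma` are illustrative. The abstract does not state the direction of the threshold comparison, so the sketch assumes a trial is flagged when the averaged score falls below the threshold.

```python
import numpy as np


def toy_embed(wave):
    """Hypothetical stand-in for a speaker-embedding network (e.g. Fast ResNet).

    Returns a 64-dimensional magnitude spectrum; a real system would call a
    trained model here.
    """
    return np.abs(np.fft.rfft(wave, n=512))[:64]


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors (assumed scoring rule)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def noisy_average_score(embed, enroll_emb, test_wave, n_copies=8, sigma=0.01, rng=None):
    """Average the score of the original test speech and n_copies noisy copies."""
    rng = np.random.default_rng() if rng is None else rng
    # Score of the unmodified test utterance against the enrolled speaker.
    scores = [cosine_score(enroll_emb, embed(test_wave))]
    for _ in range(n_copies):
        # Add zero-mean Gaussian noise with standard deviation sigma.
        noisy = test_wave + rng.normal(0.0, sigma, size=test_wave.shape)
        scores.append(cosine_score(enroll_emb, embed(noisy)))
    return float(np.mean(scores))


def flag_adversarial(avg_score, threshold):
    """Assumed decision rule: flag the trial when the averaged score is below the threshold."""
    return avg_score < threshold
```

In use, one would compute `noisy_average_score(embed, enroll_emb, test_wave)` for each verification trial and pass the result to `flag_adversarial` with a threshold tuned on held-out data. The intuition is that small random noise tends to disrupt a carefully optimized adversarial perturbation more than it disrupts genuine speech.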
Keywords/Search Tags: fake speech detection, ResNet, ensemble learning, LightGBM, voting method, snapshot ensemble