Detection Of Disguised Voice Based On Deep Residual Network

Posted on:2021-05-16

Degree:Master

Type:Thesis

Country:China

Candidate:M G Zhang

GTID:2428330602986101

Subject:Electronic and communication engineering

Abstract/Summary:

Disguised voice can attack automatic speaker verification (ASV) systems either by hiding the speaker's identity or by impersonating a target speaker. Among disguising operations, voice transformation (VT) alters a speaker's voice while preserving acoustic naturalness, thereby hiding the speaker's identity, and it can be carried out easily with many existing audio editing tools. Recaptured voice is another disguising operation, which attacks ASV by recording the target's voice. Previous work has shown that these two operations can deceive current ASV systems by drastically raising the false rejection rate and the false acceptance rate, respectively, which poses a challenge to public security. Research on detecting these two operations is therefore of great significance. This thesis studies VT and recaptured-voice detection methods based on deep residual networks, which automatically extract deep features with strong detection capability. The main contributions are as follows:

1. For VT detection, we construct a deep residual convolutional neural network consisting of 16 specially designed residual blocks, each with three layers. The structure learns deep acoustic features, and gradients do not explode as layers are added, so the network shows no degradation. Experiments were conducted on three corpora. In intra-database evaluation, all accuracy rates were above 96.4%; in cross-database evaluation, accuracy was above 96.43%. For the minimum disguising factor (±4), all accuracy rates were higher than 96.1%. The proposed method outperforms previously reported methods.

2. For recaptured-voice detection, we construct a deep residual network consisting of 15 residual blocks, each with two layers. This network can extract features from very short speech segments. The experiments take into account various factors, including recording devices, recording distances and recording environments. When the data from all devices, distances and environments are pooled, the accuracy exceeds 99.8%.

The proposed VT and recaptured-voice detection methods can enhance the robustness of ASV systems, which is of great significance to public security.
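The abstract states that VT mainly raises the false rejection rate (FRR) and recaptured voice mainly raises the false acceptance rate (FAR). The short sketch below uses the standard definitions of the two rates for a verifier with a single score threshold. The scores and the threshold are hypothetical values chosen only to show which rate each attack moves; they are not data from the thesis.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float], threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) for a verifier that accepts a trial when score >= threshold.

    FAR: fraction of impostor trials wrongly accepted.
    FRR: fraction of genuine trials wrongly rejected.
    """
    false_accepts = sum(1 for s in impostor if s >= threshold)
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


THRESHOLD = 0.5  # hypothetical decision threshold
# Hypothetical similarity scores, invented for illustration only.
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.1, 0.2, 0.3, 0.4]
vt_disguised_genuine = [0.9, 0.4, 0.3, 0.2]  # VT hides identity: genuine scores drop
recaptured_impostor = [0.1, 0.8, 0.7, 0.6]   # recordings of the target: impostor scores rise

baseline = far_frr(genuine, impostor, THRESHOLD)
under_vt = far_frr(vt_disguised_genuine, impostor, THRESHOLD)
under_recapture = far_frr(genuine, recaptured_impostor, THRESHOLD)
```

With these made-up scores, `baseline` is `(0.0, 0.0)`, `under_vt` is `(0.0, 0.75)` and `under_recapture` is `(0.75, 0.0)`: the VT-style disguise moves only FRR, and the recapture-style attack moves only FAR.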
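Both detectors are built by stacking residual blocks: 16 blocks of three layers for VT detection and 15 blocks of two layers for recaptured-voice detection. The pure-Python sketch below illustrates only the identity-shortcut idea behind such blocks, y = ReLU(F(x) + x). It is an assumption-laden toy, not the thesis's network: the real models presumably use learned convolutions over acoustic features, whereas here `conv1d_same`, `build_network` and the fixed kernels are hypothetical stand-ins chosen so the behaviour is easy to check by hand.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def conv1d_same(kernel: List[float]) -> Layer:
    """Hypothetical stand-in for a conv layer: 1-D cross-correlation (as in
    deep-learning libraries) with zero padding so output length == input length.
    Assumes an odd kernel length."""
    k = len(kernel)
    pad = k // 2

    def apply(x: Vector) -> Vector:
        padded = [0.0] * pad + list(x) + [0.0] * pad
        return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]

    return apply


def residual_block(layers: List[Layer]) -> Layer:
    """y = ReLU(F(x) + x): F applies the layers in order with ReLU between them,
    and the identity shortcut adds the block input back before the final ReLU."""

    def apply(x: Vector) -> Vector:
        h = x
        for idx, layer in enumerate(layers):
            h = layer(h)
            if idx < len(layers) - 1:  # no ReLU after the last layer of F
                h = relu(h)
        return relu([a + b for a, b in zip(h, x)])

    return apply


def build_network(num_blocks: int, layers_per_block: int, kernel: List[float]) -> Layer:
    """Hypothetical helper: stack num_blocks residual blocks, each holding
    layers_per_block convolutions that share one fixed kernel (illustration only)."""
    blocks = [
        residual_block([conv1d_same(kernel) for _ in range(layers_per_block)])
        for _ in range(num_blocks)
    ]

    def apply(x: Vector) -> Vector:
        for block in blocks:
            x = block(x)
        return x

    return apply


# Block counts and layers per block follow the abstract; the kernels are assumed.
vt_net = build_network(num_blocks=16, layers_per_block=3, kernel=[0.0, 0.0, 0.0])
recapture_net = build_network(num_blocks=15, layers_per_block=2, kernel=[0.0, 0.0, 0.0])
```

With all-zero kernels every residual branch outputs zero, so `vt_net([1.0, 2.0, 3.0])` returns its input unchanged even though it contains 48 convolution layers. This is the usual intuition for why adding residual blocks need not degrade a network: a deeper stack can always fall back to the identity mapping of a shallower one. Because of the final ReLU, the identity holds only for non-negative activations.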

Keywords/Search Tags:

voice transformation, recaptured voice, ASV, residual network, convolution

Related items

1	Study On Voice Spoofing Detection Based On Deep Learning
2	Application And Implementation Of Voice Wake-up Technology In Voice Assistant System
3	Voice Conversion Based On ANN
4	Wenzhou Telecom Local Network Voice Platform Integration Transformation
5	High-resolution voice transformation
6	Group Voice Communication Performance Research And Improvement
7	CNN-based Short-Voice Recognition Technology And Its Application Research
8	Application And Optimal Design Of Digit Voice Processor In The Next Generation Network
9	With Voicexml Voice Browser
10	Research And Implementation Of VoIP Voice Gateway Base On A Low-Cost DSP