
Research On Key Issues Of Digital Audio Forensics With Channel Information

Posted on: 2014-02-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z F Wang
Full Text: PDF
GTID: 1228330401460223
Subject: Signal and Information Processing
Abstract/Summary:
Digital audio forensics detects and verifies the authenticity, integrity, originality and reliability of digital audio by direct analysis of the audio signal, and it is an important part of multimedia information security. This thesis focuses on three key issues of digital audio forensics: (1) open-set recording device identification; (2) speaker recognition forensics under channel mismatch; (3) playback attack forensics in speaker recognition systems. The main idea of the thesis rests on the unique correlation between channel information and digital audio. For the purpose of digital audio forensics, two concepts, “channel fingerprint” and “channel pattern noise”, are proposed, and the channel information is modeled in different domains: signal space, feature space and model space. The main contributions of this thesis are as follows:

(1) Since there is currently no database for digital audio forensics, this thesis builds a speech database named the Multi-Devices and Playback Speech Database, which involves 41 speakers (21 male and 20 female), 25 recording devices and 3 playback devices. The corpus includes phrases, digit strings, sentences and paragraphs, among others. A digital audio forensics platform named SCUT-AudForensic is developed, consisting of a feature extraction component, a model training component and a decision making component. Three groups of models (GMM, HMM, SVM) and features (LPC, LPCC, MFCC) are used in this system, and speaker recognition, recording device identification and playback attack forensics can be implemented on the platform. Experiments on the statistical analysis of features are carried out to explore the extraction of channel information.
The statistical frame analysis method is proposed to analyze the frequency response of the channel information.

(2) For open-set recording device identification, an algorithm based on channel information and a device universal background model (DUBM) is proposed. First, open-set recording device identification is modeled as a two-step decision making process. The channel information is extracted from silent (mute) speech segments through MFCC and LPC, since these segments contain the channel information and are not affected by text content and speaker information. The speech data of 8 microphones are used to train the DUBM, and the in-set models (DGMMs) are obtained by adapting the DUBM. Experiments show that the accuracy of the proposed algorithm on 36 recording devices is improved by 9.22% compared with the algorithm based on GSV and SVM. For 18 in-set and 18 out-of-set recording devices, the EER is 15.37%, and the accuracy of in-set recording device identification is 90.07%.

(3) A second recording device identification algorithm, based on an improved PNCC feature and two-step discriminative training, is proposed to address the disadvantages of the algorithm in part (2). Long-term frame analysis of PNCC is used to reduce the background noise, and a two-step training algorithm is used to adapt the DGMMs and the DUBM to improve the discriminative ability of the models. The optimal decision threshold is obtained by discriminative training. For short-term training and testing samples of 36 recording devices, the accuracy is improved by 8.86% compared with the algorithm in part (2). For 18 in-set and 18 out-of-set recording devices, the EER is 15.17%, and the accuracy of in-set recording device identification is 96.65%.

(4) For the channel mismatch problem in speaker recognition forensics, channel mapping algorithms are proposed in signal space, feature space and model space to reduce the effect of channel mismatch.
In signal space, a chirp signal and an inverse filter are used to obtain the channel mapping parameters; in feature space, the EM algorithm is used to derive the normal expression of the feature mapping function, and the optimal parameters of the linear mapping function and the channel bias are discussed in detail; in model space, the EM algorithm is used to derive the normal expression of the feature mapping function, and the optimal model parameters for the single Gaussian model and the Gaussian mixture model are discussed. Experiments show that channel mapping in signal space gives the best performance, but it requires the channel response. Channel mapping in feature space performs better than channel mapping in model space.

(5) For playback attack forensics in speaker verification systems, an algorithm based on channel pattern noise is proposed. The discriminative information between authentic and playback speech is modeled by generative models. The concept of channel pattern noise is proposed in this thesis. A denoising filter is used to extract the channel pattern noise, and 6 Legendre coefficients and 6 statistical features are extracted by the statistical frame analysis method. A GMM is used to model the authentic speech. Experiments show that the EER is reduced by 9.91% compared with the algorithm based on channel similarity. The EER is reduced by 28.92% when the speaker verification system is combined with the playback attack forensics module.

(6) A second playback attack forensics algorithm, based on Empirical Mode Decomposition-based Filtering (EMDF), is proposed to address the disadvantages of the algorithm in part (5). EMDF is used as the denoising filter to filter the channel noise in the low frequency band, after which the channel pattern noise can be extracted. Several playback samples are selected to train the playback universal background model, which may represent all playback speech and can be used as a priori knowledge for decision making.
Experiments show that the EER is reduced by 4.23% compared with the algorithm in part (5). The EER is reduced by 31.94% when the speaker verification system is combined with the playback attack forensics module.
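
The two-step open-set decision described in contribution (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: each device model and the background model is reduced to a single diagonal Gaussian instead of an adapted GMM, the order of the two steps (reject out-of-set first, then name the device) is an assumption, and the names `avg_loglik`, `two_step_decision`, `device_a`, `device_b` and the threshold value are all hypothetical. The feature frames are synthetic stand-ins for MFCC or LPC vectors.

```python
import numpy as np


def avg_loglik(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    frames = np.atleast_2d(np.asarray(frames, dtype=float))
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    dim = frames.shape[1]
    quad = np.sum((frames - mean) ** 2 / var, axis=1)
    const = dim * np.log(2.0 * np.pi) + np.sum(np.log(var))
    return float(np.mean(-0.5 * (const + quad)))


def two_step_decision(frames, device_models, background, threshold=0.0):
    """Return the best in-set device name, or None for an out-of-set recording.

    Assumed procedure: score every in-set model against the background
    model with a log-likelihood ratio, reject the recording if even the
    best ratio is below the threshold, otherwise report the best device.
    """
    bg_score = avg_loglik(frames, *background)
    ratios = {
        name: avg_loglik(frames, *model) - bg_score
        for name, model in device_models.items()
    }
    best = max(ratios, key=ratios.get)
    # Step 1: in-set versus out-of-set verification.
    if ratios[best] < threshold:
        return None
    # Step 2: closed-set identification among the in-set devices.
    return best


# Synthetic stand-ins for channel features (hypothetical values).
rng = np.random.default_rng(0)
dim = 3
device_models = {
    "device_a": (np.zeros(dim), np.ones(dim)),
    "device_b": (np.full(dim, 3.0), np.ones(dim)),
}
# A broad background model covering the region between the devices.
background = (np.full(dim, 1.5), np.full(dim, 4.0))

known = rng.normal(0.0, 1.0, size=(200, dim))      # drawn near device_a
unknown = rng.normal(-6.0, 1.0, size=(200, dim))   # far from every model

print(two_step_decision(known, device_models, background))
print(two_step_decision(unknown, device_models, background))
```

In this toy setting the threshold is fixed at zero; the abstract states that the thesis obtains the optimal decision threshold by discriminative training, which the sketch does not attempt.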
Keywords/Search Tags: speech signal processing, digital multimedia forensics, digital audio forensics, recording device identification, speaker recognition, channel mismatch, playback attack forensics