| With the development of deep neural networks and easy access to computational resources and massive data, Automatic Speaker Verification (ASV) technology has flourished in applications such as voice authorization, forensics, and voice control of Internet of Things (IoT) devices. However, advanced ASV systems remain vulnerable to a variety of malicious spoofing attacks. The playback attack has become the most threatening attack against ASV systems because it is simple to carry out and highly effective. It is therefore both necessary and practical to study how to improve the robustness of ASV systems against record-and-playback attacks. Studies have shown that acoustic features play a key role in Playback Attack Detection (PAD) systems. This paper therefore focuses on acoustic features that improve the performance of speaker verification systems under playback attacks. The main work and contributions of this paper are as follows. (1) To improve the performance of ASV systems against playback attacks, this paper studies how graph frequency cepstral features, extracted with graph signal processing techniques, perform in a Spoofing-Aware Speaker Verification (SASV) system. Compared with acoustic features based on traditional digital signal processing, graph frequency cepstral features pay more attention to the relationships between speech sampling points at different levels and mine the essential features underlying those relationships. This paper proposes an SASV system based on graph frequency cepstral features, in which the ASV subsystem uses ECAPA-TDNN as the back end and the countermeasure (CM) subsystem combines graph frequency cepstral features with a Gaussian Mixture Model (GMM) classifier. The system was evaluated on the ASV development and evaluation sets of ASVspoof 2019 LA, and the experimental results show that the proposed SASV system achieves high recognition accuracy on both sets.
(2) To further improve the performance of PAD systems, this paper uses mathematical analysis to show that the essential difference between genuine speech and the corresponding playback speech is device information. Starting from this observation, factor analysis is used to remove the components shared by genuine and playback speech, leaving their difference, namely the device information. The expectation-maximization (EM) algorithm is then used to train the parameters of a linear transformation that extracts device features. Applying this procedure to traditional acoustic features yields three device features: LFDCC, MFDCC, and CQDCC. Finally, experiments on the parallel training dataset ASVspoof 2017 version 2.0 evaluate the proposed features, using a GMM as the back-end classifier and investigating different coefficient configurations of the three traditional features and their corresponding device features. The results show that the device features perform better under dynamic coefficient configurations than under static ones, and that each of the three traditional features performs worse than its corresponding device feature. In addition, CQDCC outperforms MFDCC and LFDCC in replay attack detection. Compared with existing single systems, the proposed CQDCC-SDA feature is highly effective for replay attack detection. |
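As a rough illustration of contribution (1), the sketch below computes a cepstrum in the graph frequency domain. The samples of each frame are treated as a signal on a graph, transformed with the graph Fourier transform (GFT, the eigenvectors of the graph Laplacian), converted to log power, and decorrelated with a DCT. The abstract does not say how the thesis builds its graph, so the unweighted path graph, the frame and hop sizes, the Hamming window, and the function names `path_graph_laplacian`, `dct2_matrix`, and `graph_frequency_cepstrum` are all hypothetical choices made for this sketch.

```python
import numpy as np


def path_graph_laplacian(n: int) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an unweighted path graph whose
    n nodes are consecutive samples of one frame (hypothetical graph choice)."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0  # edge between sample t and sample t + 1
    adj[idx + 1, idx] = 1.0
    return np.diag(adj.sum(axis=1)) - adj


def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix whose rows are the basis vectors."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    mat[0] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return mat


def graph_frequency_cepstrum(signal: np.ndarray, frame_len: int = 256,
                             hop: int = 128, n_ceps: int = 20) -> np.ndarray:
    """Return an (n_frames, n_ceps) matrix of graph frequency cepstral coefficients."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # GFT basis: Laplacian eigenvectors, ordered by ascending graph frequency
    _, gft_basis = np.linalg.eigh(path_graph_laplacian(frame_len))
    dct = dct2_matrix(frame_len)[:n_ceps]
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, n_ceps))
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        coeffs = gft_basis.T @ frame              # graph Fourier transform
        log_power = np.log(coeffs ** 2 + 1e-10)   # log energy per graph frequency
        feats[i] = dct @ log_power                # cepstral decorrelation
    return feats
```

For example, `graph_frequency_cepstrum(np.random.default_rng(0).standard_normal(16000))` returns one 20-dimensional vector per frame. For an unweighted path graph the Laplacian eigenvectors coincide with the DCT-II basis up to sign, so this particular graph reduces to a DCT-domain cepstrum; a graph whose edges encode richer relations between sampling points is where a graph-based feature would actually differ. In a GMM countermeasure, frames from genuine and spoofed training speech would typically train one GMM each, and a test utterance would be scored by the average log-likelihood ratio between the two models.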
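Contribution (2) compares static and dynamic coefficient configurations, but the abstract does not define them. The sketch below assumes the common regression formula for dynamic (delta) coefficients over a window of plus or minus `width` frames, applied a second time to obtain acceleration coefficients. The letters `S`, `D`, and `A` and the functions `deltas` and `configure` are hypothetical, and reading the thesis's "CQDCC-SDA" label as static + delta + acceleration is an assumption rather than something the abstract states.

```python
import numpy as np


def deltas(feats: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based dynamic coefficients over +/- `width` frames:

        d_t = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)

    Edge frames are replicated so the output has as many frames as the input.
    """
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n:width + n + n_frames]   # c[t + n]
        behind = padded[width - n:width - n + n_frames]  # c[t - n]
        out += n * (ahead - behind)
    return out / denom


def configure(static: np.ndarray, config: str) -> np.ndarray:
    """Concatenate coefficient blocks named by `config` along the feature axis.

    Hypothetical labels: "S" = static, "D" = delta, "A" = acceleration
    (delta of delta), e.g. "S", "D", "DA" or "SDA".
    """
    delta = deltas(static)
    blocks = {"S": static, "D": delta, "A": deltas(delta)}
    return np.hstack([blocks[c] for c in config])
```

Stacking blocks multiplies the feature dimension: 20 static coefficients become 60 under the full static + delta + acceleration configuration, and the GMM back-end then models this larger vector directly.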
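The device-feature extraction in contribution (2) relies on factor analysis whose linear transformation is trained with EM. The abstract does not give the model or say how the shared component is separated from the device component, so the sketch below shows only the EM machinery for generic probabilistic factor analysis, `x = mu + W z + e` with `z ~ N(0, I)` and diagonal Gaussian noise `e`. The names `factor_analysis_em` and `project`, the random initialization, and the iteration count are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np


def factor_analysis_em(x: np.ndarray, n_factors: int, n_iter: int = 100,
                       seed: int = 0):
    """Fit x_i = mu + W z_i + e_i, z_i ~ N(0, I), e_i ~ N(0, diag(psi)) by EM.

    Returns (mu, W, psi). Generic factor analysis only (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    mu = x.mean(axis=0)
    xc = x - mu
    cov = xc.T @ xc / n                      # sample covariance S
    w = 0.1 * rng.standard_normal((x.shape[1], n_factors))
    psi = np.diag(cov).copy()
    for _ in range(n_iter):
        # E-step: posterior of z_i is N(G W^T Psi^-1 xc_i, G)
        w_psi = w.T / psi                    # W^T Psi^-1, shape (k, d)
        g = np.linalg.inv(np.eye(n_factors) + w_psi @ w)
        ez = xc @ (g @ w_psi).T              # E[z_i], shape (n, k)
        ezz = n * g + ez.T @ ez              # sum_i E[z_i z_i^T]
        # M-step: closed-form updates of the loading matrix and noise variances
        w = (xc.T @ ez) @ np.linalg.inv(ezz)
        psi = np.maximum(np.diag(cov - w @ (ez.T @ xc) / n), 1e-6)
    return mu, w, psi


def project(x: np.ndarray, mu: np.ndarray, w: np.ndarray,
            psi: np.ndarray) -> np.ndarray:
    """Posterior-mean factors E[z | x] under a fitted model."""
    w_psi = w.T / psi
    g = np.linalg.inv(np.eye(w.shape[1]) + w_psi @ w)
    return (x - mu) @ (g @ w_psi).T
```

Each EM iteration does not decrease the data likelihood, and the fitted covariance `W W^T + diag(psi)` approaches the sample covariance when the data follow the model. Deciding which fitted component corresponds to device information would require the thesis's own model of genuine and playback speech.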