Research On Speech Deception Detection Algorithm Based On Semi-Supervised Learning

Posted on: 2023-09-12 | Degree: Master | Type: Thesis
Country: China | Candidate: M Liu
GTID: 2568307037981769 | Subject: Signal and Information Processing
Abstract/Summary:
Lying is common in information transmission and plays a significant role in interpersonal communication, so the efficient detection of lies has attracted the attention of many scholars. Traditional lie detection methods usually require direct contact with the subject, for example by measuring blood pressure, pulse, brain waves, or skin temperature to determine whether the subject is lying. Speech-based lie detection avoids direct contact with the subject and is therefore easy to conceal. It also reduces the influence of the subject's resistance, so the test results are more objective. As an important branch of emotion recognition, speech lie detection has research significance in medical treatment, criminal investigation, and national security. Although current research on speech lie detection has achieved some results, many problems remain to be solved. In practical scenarios, speech lie data is relatively easy to obtain, but labels for that data are difficult to obtain, which limits the development of related research. In view of this, this thesis studies corpus construction and semi-supervised speech lie detection. The main research contents are as follows:

(1) An interview-based Chinese lie corpus (the Interview corpus) is constructed. A lie corpus is one of the most important resources in speech lie detection research. However, very few Chinese lie corpora have been published, and in particular the procedures for producing such corpora are incomplete. This seriously limits the development of speech lie detection. In response, this thesis first designs a complete production plan for an interview-type lie corpus and records all the speech in a simulated interview scenario; then audio processing software is used to segment, screen, and label the speech; finally, a Chinese lie corpus containing 1368 speech samples is constructed. A preliminary exploration of speech lie detection is also carried out: acoustic statistical features are processed by a deep neural network (DNN), and classification experiments are performed. The results show that the recognition rates on two corpora are higher than random guessing.

(2) A semi-supervised speech deception detection algorithm based on dual-channel Auto-Encoders is proposed. To address the difficulty of labeling real speech lie data, this thesis constructs a semi-supervised speech lie detection algorithm based on a dual-channel Auto-Encoder. First, in a low-dimensional feature space, a clean Auto-Encoder is used to obtain speech features with higher representation ability, and a Denoising Auto-Encoder is used to suppress model overfitting. Then, the output features of the two encoders are merged to obtain speech features that carry richer lie information, and these merged features are used for classification. Finally, the classification loss and the reconstruction losses of the two Auto-Encoders are combined to optimize the model. Experimental results show that on the self-built Interview corpus, with 300 labeled samples, the recognition accuracy reaches 74.51%. On the CSC corpus, with 1000 labeled samples, the recognition accuracy reaches 64.21%.

(3) A semi-supervised speech lie detection algorithm with high-quality pseudo-labels based on multiple constraints is proposed. To address the problem that traditional semi-supervised learning models cannot make full use of the label information in the data, a semi-supervised speech lie detection method based on pseudo-labels is proposed. The model can fully learn the label information of the data during training, which improves its classification performance. First, noised features of the unlabeled data are obtained by adding random noise, which improves the generalization of the model. Then, the trained model is used to generate pseudo-labels for the original unlabeled data, and the pseudo-labels are paired with the noised features of the unlabeled data to form new labeled data for model retraining. Finally, multiple constraints are combined to improve model performance: pseudo-label threshold selection, supervised classification loss, unsupervised reconstruction loss, and unsupervised classification loss. The experimental results show that on the Interview corpus, with 300 labeled samples, the recognition accuracy of the model reaches 80.77%. On the CSC corpus, with 1000 labeled samples, the recognition accuracy reaches 67.78%.
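The pseudo-label step described in (3) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the Gaussian noise model, the 0.9 confidence threshold, and the loss weights `w_recon` and `w_unsup` are all assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows the general pattern: predict on the original unlabeled data, keep confident predictions as pseudo-labels, pair them with the noised features, and sum the three loss terms.

```python
import math
import random


def add_noise(features, sigma=0.1, rng=None):
    """Return a noised copy of each feature vector.

    Gaussian noise with standard deviation sigma is an assumption;
    the abstract only says "random noise".
    """
    rng = rng or random.Random(0)
    return [[x + rng.gauss(0.0, sigma) for x in vec] for vec in features]


def select_pseudo_labels(probs, threshold=0.9):
    """Keep samples whose highest class probability reaches the threshold.

    probs holds per-sample class probabilities predicted on the ORIGINAL
    (un-noised) unlabeled data. Returns (sample_index, pseudo_label) pairs.
    """
    selected = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=lambda k: p[k])
        if p[label] >= threshold:
            selected.append((i, label))
    return selected


def build_retraining_set(noised_features, selected):
    """Pair each pseudo-label with the NOISED features of the same sample."""
    return [(noised_features[i], label) for i, label in selected]


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the given labels (0.0 if empty)."""
    if not labels:
        return 0.0
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def total_loss(sup_ce, recon, unsup_ce, w_recon=1.0, w_unsup=1.0):
    """Weighted sum of supervised classification loss, unsupervised
    reconstruction loss, and unsupervised (pseudo-label) classification loss.
    The weights are hypothetical hyperparameters."""
    return sup_ce + w_recon * recon + w_unsup * unsup_ce


# Toy usage: three unlabeled samples with two classes (truth / lie).
unlabeled = [[0.2, 0.7], [0.5, 0.5], [0.9, 0.1]]
predicted = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]

noised = add_noise(unlabeled, sigma=0.1)
selected = select_pseudo_labels(predicted, threshold=0.9)  # [(0, 0), (2, 1)]
retrain = build_retraining_set(noised, selected)
```

In this toy run the second sample is dropped because its top probability (0.60) is below the threshold, so only confident predictions feed back into retraining. A higher threshold gives fewer but cleaner pseudo-labels; a lower one gives more data at the cost of more label noise.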
Keywords: Semi-Supervised, Speech Lie Detection, Lie Corpus, Auto-Encoder, Pseudo-Labels