
Research On Cross-corpus Speech Emotion Recognition Based On Feature Processing And Transferring

Posted on: 2024-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2568307097469304
Subject: Information and Communication Engineering
Abstract/Summary:
Speech emotion recognition is currently a research hotspot in the field of affective computing. By analyzing speech signals to extract emotional information, it can improve the usability of human-computer interaction systems. Speech emotion recognition also has important application value in scenarios such as criminal investigation and interrogation, infant cry analysis, and emotion monitoring in customer service centers. Researchers have proposed numerous algorithms to address the challenges this technology faces, but these algorithms mainly focus on a single corpus. In practical applications, training speech and testing speech often come from different corpora. Influenced by factors such as language, recording environment, culture, and voice production style, the recognition performance of many speech emotion recognition algorithms declines significantly. This thesis therefore focuses on speech feature representation and data distribution discrepancy in cross-corpus speech emotion recognition. Deep learning is used for feature processing to mine the complex nonlinear mapping between speech features and emotion labels, thereby improving the emotional representation ability of the features. At the same time, domain adaptation methods are optimized to perform feature transfer, alleviating the loss of model generalization caused by data distribution discrepancy in real scenarios. Together, these measures improve the cross-corpus recognition performance of the overall model. The specific contributions are as follows:

(1) A cross-corpus speech emotion recognition method based on decision-boundary-optimized domain adaptation is proposed. Domain adaptation methods are widely used in cross-corpus speech emotion recognition, but many of them reduce domain discrepancy at the cost of the discriminability of target-domain samples, leaving those samples densely clustered at the model's decision boundaries and degrading performance. To address this, a convolutional neural network is first used for feature processing, and the resulting features are fed to a maximum mean discrepancy (MMD) module. While inter-domain differences are reduced, the nuclear norm of the target-domain emotion prediction probability matrix is maximized to improve the discriminability of target-domain samples and optimize the decision boundaries. In six cross-corpus experiments on the Berlin, eNTERFACE, and CASIA corpora, the average recognition accuracy of the proposed method is 1.68% to 11.01% higher than that of competing algorithms, indicating that the model effectively reduces the sample density at the decision boundaries and improves prediction accuracy.

(2) To improve the consistency between the feature processing and feature transfer stages in cross-corpus speech emotion recognition, a method based on a convolutional autoencoder and adversarial domain adaptation is proposed. The framework first constructs a one-dimensional convolutional autoencoder for feature processing, exploring the correlation between adjacent one-dimensional statistical features; the encoder-decoder architecture enhances the feature representation. Adversarial domain adaptation is then used to alleviate the feature distribution discrepancy between the source and target domains by confusing a domain discriminator, and maximum mean discrepancy is incorporated to align feature statistics, completing the feature transfer while constraining the training of the feature processing network. To evaluate the proposed model, visualization experiments and recognition performance comparisons are conducted on the Berlin, eNTERFACE, and CASIA corpora. The results show that the proposed method outperforms other state-of-the-art algorithms in the field.

(3) To further improve the interpretability of speech feature representations in cross-corpus speech emotion recognition, a method based on causal representation domain adaptation is proposed. First, an autoencoder network learns the key emotional representations closely related to the emotion labels in the speech features, removing non-causal redundant features. Then, based on the causal attributes in causal representation learning, a feature correlation matrix is modeled and a causal decomposition loss is constructed to make each feature dimension independent; learning such causal representations improves the generalization ability of the network. Finally, a maximum mean discrepancy algorithm is used for feature transfer, facilitating model migration to new speech data. Extensive cross-corpus emotion recognition experiments are conducted on the three corpora, and in most tasks the recognition results are ahead of those of traditional methods and deep domain adaptation methods, demonstrating the effectiveness of the proposed algorithm in cross-corpus speech emotion recognition tasks.
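To make the mechanism in contribution (1) concrete, the NumPy sketch below shows a squared-MMD estimate with a Gaussian kernel and the nuclear norm of a batch of target-domain prediction probabilities. The function names, the single-bandwidth Gaussian kernel, and the biased MMD estimator are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2(xs, xt, sigma=1.0):
    # Squared maximum mean discrepancy between source and target features
    # (biased estimator): small when the two feature distributions match.
    k_ss = gaussian_kernel(xs, xs, sigma)
    k_tt = gaussian_kernel(xt, xt, sigma)
    k_st = gaussian_kernel(xs, xt, sigma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

def nuclear_norm(probs):
    # Nuclear norm (sum of singular values) of the batch prediction matrix;
    # maximizing it encourages confident and class-diverse target predictions,
    # pushing samples away from the decision boundaries.
    return np.linalg.svd(probs, compute_uv=False).sum()
```

In a training loop, a combined objective of the form `classification_loss + alpha * mmd2(src_feats, tgt_feats) - beta * nuclear_norm(tgt_probs)` (weights `alpha`, `beta` hypothetical) would reduce inter-domain differences while keeping target samples discriminable.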
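The adversarial component in contribution (2) is commonly realized with a gradient reversal layer placed between the feature extractor and the domain discriminator (as in Ganin and Lempitsky's DANN); whether the thesis uses exactly this mechanism is not stated in the abstract, so the sketch below is a generic illustration of the idea: identity in the forward pass, negated and scaled gradient in the backward pass.

```python
class GradReverse:
    """Gradient reversal layer: the discriminator is trained to classify the
    domain, while the reversed gradient trains the feature extractor to
    confuse it, aligning source- and target-domain feature distributions."""

    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off weight for the adversarial signal

    def forward(self, x):
        # Forward pass: pass features through unchanged.
        return x

    def backward(self, grad_out):
        # Backward pass: flip the sign (and scale) of the incoming gradient.
        return -self.lam * grad_out
```

Inserted before the domain discriminator, this single sign flip turns the discriminator's minimization into a maximization for the feature extractor, which is what "confusing the domain discriminator" amounts to in practice.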
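The causal decomposition loss in contribution (3) models a feature correlation matrix and pushes feature dimensions toward independence. A minimal sketch, assuming the loss penalizes the squared off-diagonal entries of the batch correlation matrix (the exact formulation in the thesis may differ):

```python
import numpy as np

def decorrelation_loss(feats, eps=1e-8):
    # feats: (batch, dims) learned representations.
    # Build the empirical correlation matrix of the batch.
    centered = feats - feats.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(feats) - 1)
    std = np.sqrt(np.diag(cov)) + eps
    corr = cov / np.outer(std, std)
    # Penalize off-diagonal correlations so that each feature dimension
    # carries independent information, as causal decomposition intends.
    off_diag = corr - np.diag(np.diag(corr))
    return np.sum(off_diag**2)
```

Minimizing this term alongside the reconstruction and classification losses would drive redundant (correlated) dimensions toward zero correlation, which is one simple way to encourage the dimension-wise independence the method describes.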
Keywords/Search Tags: Cross-corpus speech emotion recognition, Deep learning, Domain adaptation, Causal representation learning