Cross-Corpus Speech Emotion Recognition Based On Subspace Learning

Posted on: 2024-01-15; Degree: Master; Type: Thesis
Country: China; Candidate: K K Zhao
GTID: 2568307055970819; Subject: Electronic information
Abstract/Summary:
Speech emotion recognition is an integral component of human-computer interaction. It involves identifying and classifying emotional states such as anger, sadness, happiness, disgust, surprise, and neutral from speech signals. Although this technology is widely used in artificial intelligence and has advanced the development of the intelligent information age, existing methods for recognizing speech emotions have limitations. First, traditional methods assume that the training and testing data have similar distributions. However, due to differences in language application areas and cultural backgrounds, the two distributions may differ, which limits recognition performance. Second, existing speech emotion datasets are high-dimensional, and specific statistical features are easily lost during dimensionality reduction. To overcome these challenges, this study employs transfer learning and subspace learning to improve the recognition rate of cross-domain speech emotion recognition. The thesis centers on three interrelated pieces of work.

First, a discriminative sparse subspace learning method is proposed. It combines transfer learning and subspace learning to obtain a projection that maps the training and testing data into a new low-dimensional common subspace. The method incorporates a sparse reconstruction strategy that uses information from the source data to enhance the linear representation of the target data. In addition, a domain-invariant projection is learned to align the features in the common subspace. To avoid trivial solutions, l2,1-norm constraints are applied to the projection matrix and the reconstruction matrix, which also enhances the compactness of related samples from different domains. Experiments on three classical speech emotion datasets show that the proposed method achieves superior recognition performance.

Second, an adaptive weighted transfer subspace learning approach is proposed for speech emotion recognition. The method uses transfer learning to transfer knowledge and simultaneously learns an invariant projection. An adaptive weighted matrix is introduced to enhance the feature representation and minimize the contribution of redundant features. Sparse constraints are also employed to improve the fusion of related samples from different domains. Evaluated on four well-known datasets, the approach outperforms the existing methods it is compared with and yields the highest accuracy.

Finally, a joint instance reconstruction and feature subspace alignment method is proposed for speech emotion recognition. The method learns projection matrices for the source and target domains and minimizes the difference between them, so that the features are aligned in their respective feature subspaces, which improves recognition performance. Performance is further enhanced by an adaptive weighted matrix, which emphasizes the representation of significant features and reduces the impact of redundant features. Experiments on four classical speech emotion datasets show that the proposed method achieves the best recognition performance.
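As a small illustration of the l2,1-norm constraint mentioned for the first method, the sketch below computes the norm under its common definition: the sum of the Euclidean norms of a matrix's rows. The abstract does not give the thesis's objective function, so the function name `l21_norm` and the example matrix `P` are hypothetical and serve only to show why penalizing this quantity tends to zero out entire rows.

```python
import math


def l21_norm(matrix):
    """Return the l2,1-norm: the sum of the Euclidean (l2) norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


# Hypothetical 3x2 projection matrix (assumption, not taken from the thesis):
# rows index input features, columns index subspace dimensions.
P = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # an all-zero row contributes nothing: that feature is dropped
    [6.0, 8.0],  # row norm 10
]

print(l21_norm(P))  # 15.0
```

Because each row is charged by its l2 norm and the row norms are then summed (an l1-style penalty), minimizing this quantity favors solutions in which whole rows vanish, which is the row-sparsity effect usually sought when the norm is applied to a projection or reconstruction matrix.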
Keywords/Search Tags: Speech emotion recognition, Transfer learning, Subspace learning, Sparse reconstruction, Adaptive weighted matrix