In the era of information diversity, the continuous accumulation of big data directly promotes the development of artificial intelligence and, at the same time, brings new challenges to affective computing. Speech emotion recognition, an important research branch at the intersection of affective computing, machine learning, human-computer interaction, and speech signal processing, aims to identify human emotions from speech signals, such as disgust, anger, sadness, neutrality, boredom, fear, and happiness. Traditional speech emotion recognition methods are conducted on a single emotional corpus. In practical applications, however, training and testing data are often collected from different scenarios, languages, scales, and speakers, resulting in a large discrepancy between the training and testing datasets and a significant decline in recognition performance. To address this challenging problem, this thesis uses transfer learning strategies to extend subspace learning and linear regression methods to cross-corpus scenarios, so as to improve the generalization ability of the emotion classification model and its adaptability to different data distributions. The main contents of this thesis are as follows:

First, the theoretical basis of speech emotion recognition is presented, including a brief introduction to its pipeline, several popular speech emotional corpora, and three widely used speech signal features, with particular focus on the linear regression, subspace learning, and transfer learning methods closely related to the research in this thesis.

Second, a transferable discriminant linear regression algorithm is proposed based on slack label regression and transfer learning. A slack label regression strategy is first introduced; compared with binary label regression, it offers stronger generalization ability and discrimination, yielding a more robust regression model. A dual metric strategy is then proposed to learn the local structure information across domains while narrowing the global data distribution gap. Finally, the linear relationship between the category spaces of the two domains is learned to obtain more discriminative power and improve transfer performance.

Third, a coupled discriminant subspace alignment algorithm is proposed based on subspace learning and transfer learning. It aims to learn a latent common discriminant subspace while preserving the individual discriminant information of each domain. Coupled discriminant subspaces are first obtained by conducting linear discriminant analysis in the source and target domains respectively. Linear reconstruction and projection alignment strategies are then used to learn the structural information of cross-domain samples while narrowing the feature distribution gap between domains, thereby aligning the coupled discriminant subspaces and finally obtaining a latent common discriminant subspace.

Fourth, the proposed coupled discriminant subspace alignment algorithm is extended to the multi-source cross-corpus scenario, named multi-source discriminant subspace alignment. Linear discriminant analysis is first conducted in each source domain to obtain the multi-source discriminant subspaces. The samples in these subspaces are then linearly reconstructed into the target subspace, with the contribution of each source determined by adaptive weights. Finally, the multi-source projections are aligned to obtain a common discriminant subspace with better generalization ability. This algorithm provides a reference scheme for extending single-source transfer learning to multi-source transfer scenarios, and it is among the first attempts at multi-source cross-corpus speech emotion recognition.

Finally, the conclusions of this thesis are presented, together with feasible directions for future work.
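The slack label regression idea in the second contribution can be illustrated with a minimal sketch of the classic epsilon-dragging formulation, on which such strategies are typically built. This is a generic illustration, not the thesis's exact algorithm: the targets, the nonnegative slack matrix `M`, and the ridge parameter `lam` are assumptions for the sketch.

```python
import numpy as np

def slack_label_regression(X, y, n_classes, lam=1.0, n_iter=10):
    """Slack (epsilon-dragging) label regression: relax the binary
    one-hot targets Y to Y + B * M with M >= 0, so class margins are
    enlarged instead of being fixed at the 0/1 targets."""
    n, d = X.shape
    Y = np.eye(n_classes)[y]               # binary one-hot targets
    B = np.where(Y == 1, 1.0, -1.0)        # dragging directions
    M = np.zeros_like(Y)                   # nonnegative slack matrix
    Xb = np.hstack([X, np.ones((n, 1))])   # absorb the bias term
    for _ in range(n_iter):
        T = Y + B * M                      # relaxed regression targets
        W = np.linalg.solve(Xb.T @ Xb + lam * np.eye(d + 1), Xb.T @ T)
        M = np.maximum(B * (Xb @ W - Y), 0.0)  # slack update, kept >= 0
    return W

def predict(W, X):
    """Assign each sample to the class with the largest response."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(Xb @ W, axis=1)
```

The alternating updates are both closed-form: a ridge regression for `W` given the relaxed targets, and an elementwise clipped residual for `M` given `W`, which is what makes the relaxed model as cheap to train as plain label regression.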
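The per-domain linear discriminant analysis and projection alignment in the third contribution can be sketched as follows. This is a simplified stand-in, assuming orthonormalized LDA bases and the closed-form subspace-alignment map `M = Ps.T @ Pt`; the thesis's joint formulation with linear reconstruction is richer than this.

```python
import numpy as np

def lda_basis(X, y, k, reg=1e-3):
    """Top-k LDA directions from the eigenvectors of
    (Sw + reg*I)^{-1} Sb, orthonormalized via QR so the
    closed-form alignment below is exact."""
    mu, d = X.mean(0), X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(0)
        Sw += (Xc - mc).T @ (Xc - mc)              # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)  # between-class scatter
    w, V = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(d), Sb))
    top = V.real[:, np.argsort(-w.real)[:k]]
    return np.linalg.qr(top)[0]                    # orthonormal basis

def align(Ps, Pt):
    """Closed-form alignment M = argmin_M ||Ps @ M - Pt||_F = Ps.T @ Pt
    (exact because Ps has orthonormal columns)."""
    return Ps.T @ Pt
```

Source data are then projected as `Xs @ Ps @ align(Ps, Pt)` and target data as `Xt @ Pt`, placing both domains in comparable discriminant coordinates. In a real cross-corpus setting the target-domain LDA would have to rely on pseudo-labels, since target emotion labels are unavailable.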
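The adaptive weighting of source contributions in the fourth contribution can also be sketched. Here the weights are a simple hypothetical heuristic, inversely proportional to each source's post-alignment residual against the target basis; the thesis learns the weights jointly with the reconstruction rather than by this rule.

```python
import numpy as np

def multi_source_weights(P_list, Pt):
    """Adaptive source weights: each orthonormal source basis Ps is
    weighted inversely to its residual after closed-form alignment
    to the target basis Pt (a heuristic stand-in for learned weights)."""
    res = np.array([np.linalg.norm(Ps @ (Ps.T @ Pt) - Pt) for Ps in P_list])
    w = 1.0 / (res + 1e-8)
    return w / w.sum()

def common_basis(P_list, Pt):
    """Weighted combination of aligned source bases: sources whose
    discriminant subspaces match the target contribute more to the
    common subspace."""
    w = multi_source_weights(P_list, Pt)
    return sum(wi * (Ps @ (Ps.T @ Pt)) for wi, Ps in zip(w, P_list))
```

The intuition is the one stated in the abstract: a source corpus whose discriminant subspace already resembles the target's should dominate the reconstruction, while a mismatched source is down-weighted instead of corrupting the common subspace.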