Speech emotion recognition is a research hotspot in affective computing, with important practical value in applications such as patient emotion monitoring, detection of drivers' negative emotions, and companion robots. Because corpora differ in recording environment, speaker gender and age, and language and cultural background, a model trained on a single corpus rarely achieves good recognition performance on a new corpus, so cross-corpus speech emotion recognition is of significant research and application value. Its key problem is how to transfer emotional features effectively and improve the model's generalization on the test corpus. This thesis therefore focuses on the transfer of emotional features; the specific research contents are as follows:

1. To address the problem that global domain adaptation can confuse the alignment of emotion features between domains, a cross-corpus speech emotion recognition model based on a deep auto-encoder and subdomain adaptation is proposed to achieve finer-grained alignment of emotion features. In this model, a deep auto-encoder extracts low-dimensional emotional features, and a subdomain adaptation algorithm reduces the distribution distance between the source and target domains. The subdomain adaptation algorithm partitions the global feature space into independent per-class emotion subdomains according to the labels, enabling finer-grained feature alignment. Averaged over six cross-corpus experiments, the weighted average recall improves by 0.91 to 8.41 percentage points over the compared models.

2. To address gender differences in emotional information, this thesis proposes a cross-corpus speech emotion recognition model based on multi-task learning and subdomain adaptation, which alleviates the influence of gender differences on emotion recognition. In this model, a deep denoising auto-encoder serves as the shared feature extraction network for multi-task learning, and a fully connected layer and a Softmax layer are added as task-specific layers for each recognition task. At the same time, subdomain adaptation layers for emotion and for gender are appended after the shared network to obtain the emotion features and gender features shared between the source and target domains, which effectively mitigates the influence of gender differences on the emotion features. Averaged over six cross-corpus experiments, the weighted average recall improves by 1.89 to 10.07 percentage points over the compared models.

3. To address the negative transfer of emotion features in subdomain adaptation, a cross-corpus speech emotion recognition model based on joint domain adaptation and dynamic adjustment of multiple losses is proposed to strengthen the positive transfer of emotion features. First, a deep denoising auto-encoder connected to a deep neural network is constructed to extract effective emotional representations. Then, the global and subdomain adaptation algorithms are combined to reduce the inter-domain feature distribution distance from both perspectives. Finally, during training, dynamic weight factors are designed to balance the constraints of the different loss functions, preventing the optimization from being biased toward any single objective and helping the model reach a globally better solution. Averaged over six cross-corpus experiments, the weighted average recall improves by 2.24 to 12.31 percentage points over the compared models.
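The subdomain adaptation underlying all three contributions can be illustrated with a minimal class-wise MMD sketch (a simplified form of local MMD; the function and variable names below are illustrative, not from the thesis). Source features are grouped by their labels, target features by pseudo-labels, and an RBF-kernel MMD is computed per emotion subdomain and averaged:

```python
import numpy as np

def rbf_mmd(x, y, gamma=1.0):
    """Squared MMD estimate between sample sets x and y under an RBF kernel."""
    def k(a, b):
        # Pairwise squared Euclidean distances, then RBF kernel values.
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def subdomain_mmd(src_feat, src_lab, tgt_feat, tgt_lab, n_classes, gamma=1.0):
    """Average MMD over per-class (subdomain) feature spaces.

    In a real cross-corpus setting tgt_lab would be pseudo-labels
    predicted for the unlabeled target corpus.
    """
    losses = []
    for c in range(n_classes):
        xs = src_feat[src_lab == c]
        xt = tgt_feat[tgt_lab == c]
        if len(xs) and len(xt):  # skip empty subdomains
            losses.append(rbf_mmd(xs, xt, gamma))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (40, 8))      # source-domain features
tgt = rng.normal(0.5, 1.0, (40, 8))      # shifted target-domain features
labels = np.repeat(np.arange(4), 10)     # 4 emotion classes
print(subdomain_mmd(src, labels, tgt, labels, n_classes=4))
```

Minimizing this quantity jointly with the recognition loss pulls each emotion subdomain of the target corpus toward the corresponding source subdomain, rather than aligning the two corpora only at the global level.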
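One simple way to realize the dynamic weight factors of the third contribution, similar in spirit to dynamic weight averaging in multi-task learning (the thesis does not specify this exact scheme, so the formula below is an illustrative assumption), is to give larger weights to losses that are descending more slowly, so that no single objective dominates training:

```python
import numpy as np

def dynamic_weights(curr_losses, prev_losses, temperature=2.0):
    """Illustrative dynamic weighting: a softmax over descent ratios.
    A loss that dropped quickly (small curr/prev ratio) gets a smaller
    weight; a stagnating loss gets a larger one. Weights are rescaled
    to sum to the number of losses."""
    ratios = np.asarray(curr_losses, dtype=float) / np.asarray(prev_losses, dtype=float)
    e = np.exp(ratios / temperature)
    return len(curr_losses) * e / e.sum()

# Reconstruction, classification and adaptation losses at steps t-1 and t.
prev = [1.00, 0.80, 0.50]
curr = [0.90, 0.40, 0.49]            # the second loss dropped fastest
w = dynamic_weights(curr, prev)
total = float((w * np.asarray(curr)).sum())  # weighted total training loss
```

Rebalancing the weights at every step keeps the auto-encoder reconstruction, emotion classification, and domain adaptation terms in play throughout training instead of letting the fastest-improving term steer the optimization alone.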