Speech Emotion Recognition (SER) aims to identify human emotions from a given speech signal and plays a critical role in Human-Computer Interaction (HCI), helping to bridge the communication gap between humans and computers. Traditional SER systems are trained and tested on a single corpus. However, due to differences in recording equipment quality, spoken language, environmental noise, and speaker demographics, the feature distributions of different corpora differ. When training (source domain) and testing (target domain) are performed on different corpora, the performance of traditional SER methods degrades significantly. To overcome these problems, this thesis studies cross-corpus SER based on deep learning. The principal contributions of this thesis are summarized as follows:

(1) We propose an unsupervised domain adaptation method based on Transformers and a domain adversarial neural network. The method first extracts the IS09 and IS10 feature sets defined for the INTERSPEECH challenges, and then uses a Transformer encoder to learn contextual information from the extracted hand-crafted features, yielding time-series features for each utterance. To obtain domain-invariant features suitable for cross-corpus SER, a domain discriminator is trained to classify whether a speech sample comes from the source or the target domain. An adversarial objective encourages domain confusion, so that the network learns feature representations shared across domains.

(2) We propose an unsupervised feature-decomposition domain adaptation method based on Transformers and mutual information. The method uses a pre-trained deep audio model to extract audio features, and then builds a domain-invariant feature extractor from Transformer encoder layers. A Max-Min Mutual Information strategy is designed to learn domain-invariant features from the input deep features and to perform the final emotion classification. Finally, to minimize the impact of speaker variation on model performance, we design a speaker discriminator that forces the domain-invariant feature extractor to discard speaker information.

(3) Cross-corpus experiments on three public speech emotion databases (IEMOCAP, MSP-Improv, and CASIA) indicate that the proposed methods effectively improve cross-corpus SER performance and classification accuracy. The proposed methods are also compared with a baseline model and other state-of-the-art domain adaptation methods; the results show that they achieve the best performance, verifying their effectiveness for cross-corpus SER research.

In summary, this thesis focuses on deep-learning-based modeling and optimization of cross-corpus speech samples to improve cross-corpus SER performance. In future research, building models with stronger generalization capability and fusing additional modalities for cross-corpus SER are important research directions.
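As a concrete illustration of how the Transformer encoder in contribution (1) can turn a sequence of frame-level acoustic features into a single utterance-level representation, the following PyTorch sketch mean-pools the encoder outputs. The feature dimension (384, as in the IS09 functional set), model width, and layer counts are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
import torch.nn as nn

class UtteranceEncoder(nn.Module):
    """Minimal sketch: contextualize frame-level features, pool to one embedding."""
    def __init__(self, feat_dim=384, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)  # map acoustic features to model width
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x):               # x: (batch, time, feat_dim)
        h = self.encoder(self.proj(x))  # contextualized frame representations
        return h.mean(dim=1)            # mean-pool to an utterance-level embedding
```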
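The domain adversarial training in contribution (1) is typically realized with a gradient reversal layer, as in DANN: the discriminator learns to separate source from target, while the reversed gradient pushes the feature extractor toward domain confusion. Below is a minimal sketch under that assumption; the layer sizes and the reversal weight lambda are placeholders.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainDiscriminator(nn.Module):
    """Binary classifier: does an utterance embedding come from source or target?"""
    def __init__(self, d_model=256, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))

    def forward(self, z, lambd=1.0):
        # Gradients flowing back through this branch are negated, so the
        # feature extractor is trained to make the two domains indistinguishable.
        return self.net(GradReverse.apply(z, lambd))
```

The speaker discriminator in contribution (2) can reuse the same pattern with speaker identities as the classification targets, so that the reversed gradients strip speaker information from the domain-invariant features.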
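One plausible reading of the Max-Min Mutual Information strategy in contribution (2) is a MINE-style estimator trained in two alternating steps: a max step that tightens a lower bound on the mutual information between the learned embedding and a domain code, and a min step that updates the feature extractor to shrink that estimate. The sketch below is written under this assumption; the thesis may define the objective differently, and all dimensions are placeholders.

```python
import math
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Statistics network T(z, d) estimating a lower bound on I(Z; D)."""
    def __init__(self, z_dim=256, d_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + d_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def mi_lower_bound(self, z, d):
        # d: one-hot source/target code, shape (batch, d_dim).
        joint = self.net(torch.cat([z, d], dim=1)).mean()
        perm = d[torch.randperm(d.size(0))]           # break the pairing -> marginals
        marg = self.net(torch.cat([z, perm], dim=1))
        # Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]
        return joint - (torch.logsumexp(marg, dim=0) - math.log(marg.size(0))).squeeze()

# Max step: update the estimator to tighten the bound.
#   mi = mine.mi_lower_bound(z.detach(), d); (-mi).backward(); opt_mine.step()
# Min step: update the feature extractor to reduce the estimated MI.
#   mi = mine.mi_lower_bound(z, d); (mi_weight * mi).backward(); opt_enc.step()
```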