
Improving Speech Emotion Recognition Based On Phonological Representation

Posted on: 2019-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: L J Shen
Full Text: PDF
GTID: 2437330548481248
Subject: Education Technology
Abstract/Summary:
Background: Speech is the most natural and efficient modality for human-machine interaction, and it carries the speaker's emotional state. Speech emotion recognition (SER) is therefore a key technique for realizing more natural and more intelligent human-machine interaction. Progress in SER depends on both classifiers and features. In terms of feature selection, most research to date uses only large sets of acoustic features, which shed little light on the relationship between emotion and prosody.

Goal: (1) To improve SER by combining phonological representations with acoustic features using deep learning methods; (2) to explore the relationship between prosody and specific emotions in order to uncover the prosodic patterns characteristic of each emotion.

Method: Our experiments comprise two parts. (1) We improve SER on the public Interactive Emotional Dyadic Motion Capture (IEMOCAP) database by combining acoustic and phonological features under a leave-one-speaker-out cross-validation framework. Experiments are conducted at the utterance level and at the level of clustered acoustic words. A support vector machine, logistic regression, and a convolutional neural network (CNN) are used as classifiers. (2) We analyze the discriminative power of phonological and acoustic features for emotion recognition with logistic regression models.

Results: (1) With phonological representations, the CNN achieves an unweighted average recall (UAR) of 60.02% on categorical emotion recognition, a state-of-the-art result. Compared to a baseline system based on acoustic features alone, the proposed system with combined features improves UAR by 3.1% in categorical emotion classification, and by 4.08%, 3.51%, and 3.9% in activation, dominance, and valence dimensional emotion classification, respectively. (2) In the experiments based on clustered acoustic words with combined acoustic and phonological features, a long short-term memory recurrent neural network performs best on almost all tasks, achieving UARs of 44.98%, 52.94%, 44.79%, and 38.05%, respectively, whereas the plain recurrent neural network performs worst. (3) Pitch accents and break indices are discriminative for dominance and activation: emotions with high activation show more pitch accents and a fluent speaking rate, while emotions with high dominance show more long pauses. (4) Log-Mel frequency bands and loudness are the two most discriminative acoustic features, and loudness is highly predictive of activation.

Conclusions: Because the experiments are conducted on a public database, the results are objective. We identify salient phonological features that help distinguish specific emotions, and the research clarifies the relationship between phonology and emotion. Combining phonological representations with acoustic features will serve as a strong baseline for future SER systems.

Novelty: (1) Exploring the relationship between emotions and prosody and unveiling the prosodic patterns of speech specific to different emotions; (2) improving speech emotion recognition by combining acoustic features with phonological representations; (3) proposing a new approach to emotion recognition based on acoustic words, inspired by the application of deep learning in natural language processing.
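For illustration only, below is a minimal sketch of the leave-one-speaker-out evaluation with combined features, written with scikit-learn. The random placeholder data, the feature dimensions, and the use of logistic regression (standing in for the thesis's full set of SVM, logistic regression, and CNN classifiers) are assumptions of the sketch, not the thesis's actual pipeline.

    # Sketch: leave-one-speaker-out SER with concatenated acoustic and
    # phonological features. All data below is random placeholder data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import recall_score
    from sklearn.model_selection import LeaveOneGroupOut
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n_utt = 500                                  # placeholder utterance count
    acoustic = rng.normal(size=(n_utt, 88))      # e.g., acoustic functionals (assumed dim)
    phonological = rng.random(size=(n_utt, 20))  # e.g., phonological posteriors (assumed dim)
    labels = rng.integers(0, 4, size=n_utt)      # 4 emotion categories
    speakers = rng.integers(0, 10, size=n_utt)   # 10 speakers -> 10 CV folds

    # Combine the two feature views by simple concatenation.
    X = np.hstack([acoustic, phonological])

    # Leave-one-speaker-out cross-validation: each fold holds out one speaker.
    uars = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, labels, groups=speakers):
        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        clf.fit(X[train_idx], labels[train_idx])
        pred = clf.predict(X[test_idx])
        # Unweighted average recall (UAR) = recall macro-averaged over classes.
        uars.append(recall_score(labels[test_idx], pred, average="macro"))

    print(f"Mean UAR over speaker folds: {np.mean(uars):.4f}")

UAR (macro-averaged recall) is the metric cited throughout the abstract; concatenating the two feature views before a shared classifier mirrors the combined-feature setup described above.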
Keywords/Search Tags:speech emotion recognition, acoustic features, phonology, feature analysis, deep learning