Recognizing emotions from spoken dialogs: A signal processing approach

Posted on: 2005-10-09
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Lee, Chul Min
Full Text: PDF
GTID: 1455390008480262
Subject: Engineering
Abstract/Summary:
The importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. This study addresses the design of an automatic emotion recognition system that uses spoken language information, applying signal processing and pattern recognition techniques to four databases: two contain emotions expressed by actors and the other two contain naturally expressed emotions. The specific focus is a case study on detecting negative and non-negative emotions in data obtained from a call center application. The appropriateness of these emotional classes is analyzed by comparing spoken language data from real interactions against acted speech data. The acoustic correlates of emotions are optimized with respect to classification error by investigating different feature sets obtained from feature selection followed by principal component analysis. Emotions are uncertain in both their definition and their everyday usage; a fuzzy inference system is proposed to deal with this uncertainty, and its classification results are compared with those of other statistical pattern classifiers. Short-term spectral features also convey the emotional coloring of utterances, so this study investigates the effect of emotion on speech using hidden Markov models based on five phoneme classes. Most previous studies in emotion recognition have used only the acoustic information in speech. This study also explores the detection of domain-specific emotions using language and discourse information in conjunction with the acoustic correlates of emotions in the speech signal. The three sources of information (acoustic, lexical, and discourse) are combined for emotion recognition.
To capture emotional information at the language level, an information-theoretic notion of emotional salience is introduced. Discourse information is represented by the users' dialog act labels, which capture the variation in utterance form and content within dialogs. Experimental results on the call center data and the movie data show that combining the information sources improves classification performance.
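
The abstract does not state how emotional salience is computed. As a minimal sketch, assuming one common information-theoretic formulation, the salience of a word can be taken as the Kullback-Leibler divergence between the emotion distribution given that word, P(e | w), and the prior emotion distribution, P(e). The function name `emotional_salience`, the toy utterances, and the two class labels below are hypothetical and chosen only for illustration; they are not taken from the dissertation.

```python
import math
from collections import Counter, defaultdict


def emotional_salience(utterances):
    """Return a {word: salience} dict.

    utterances: list of (words, emotion_label) pairs.
    Assumed formulation (not confirmed by the abstract):
        sal(w) = sum_e P(e | w) * log(P(e | w) / P(e))
    estimated from utterance-level counts.
    """
    # Prior P(e): fraction of utterances carrying each emotion label.
    class_counts = Counter(label for _, label in utterances)
    total = sum(class_counts.values())
    prior = {e: c / total for e, c in class_counts.items()}

    # Count, per word, how many utterances of each class contain it.
    word_class = defaultdict(Counter)
    for words, label in utterances:
        for w in set(words):
            word_class[w][label] += 1

    salience = {}
    for w, counts in word_class.items():
        n = sum(counts.values())
        # Classes never seen with w contribute 0 (the 0 * log 0 convention).
        salience[w] = sum(
            (c / n) * math.log((c / n) / prior[e]) for e, c in counts.items()
        )
    return salience


# Hypothetical toy data with the two classes used in the case study.
toy = [
    (["this", "is", "terrible"], "negative"),
    (["thank", "you"], "non-negative"),
    (["this", "is", "fine"], "non-negative"),
    (["terrible", "service"], "negative"),
]
scores = emotional_salience(toy)
```

Under this assumption, a word that occurs equally often in both classes (here "this") scores zero, while a word seen in only one class (here "terrible") scores highest, so salient words could serve as lexical cues alongside the acoustic and discourse features.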
Keywords/Search Tags:Emotions, Spoken, Information, Data, Signal, Speech