
Research And Application Of Speech Emotion Recognition Technology Based On Feature Fusion

Posted on: 2022-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: P F Xia
GTID: 2518306338990719
Subject: Control Engineering
Abstract/Summary:
With the rapid development of computer-related technologies, the requirements on human-computer interaction have kept increasing: computers are expected not only to communicate with humans but also to accurately identify human emotions and emotional changes. Speech emotion recognition, one of the research hotspots in applied artificial intelligence, is therefore of great importance to human-computer interaction, and this paper carries out research on it. Based on two speech emotion databases, EMO-DB and RAVDESS, this paper proposes a Stacking model that combines the global and time-sequence characteristics of speech, and experiments verify the accuracy and effectiveness of this ensemble learning model. In addition, the model is applied to the online classroom, where it analyzes teachers' emotional changes during class. This notifies teachers of their emotional changes in time and also serves as a way to evaluate teaching quality. The main tasks of this paper are as follows:

(1) The experimental design based on global feature fusion is completed. Two fusion methods, average weighted fusion and Softmax weighted fusion, are proposed, and LGBM and AdaBoost are used to build the speech emotion recognition models. The feature weights are determined from the Pearson correlation coefficient between the predicted P, A, D values and the real P, A, D values, and the weighted fusion yields the final P, A, D values. Because neither fusion method brings a clear improvement, a Stacking model is introduced to fuse the global characteristics. With LGBM and AdaBoost as the base learners of Stacking, the weights of the different base learners are obtained by training, and the meta learner then fuses the global characteristics learned by the different base learners. Experimental results show that Stacking-based fusion of global characteristics improves recognition accuracy.

(2) The experimental design based on Stacking fusion of global and time-sequence characteristics is completed. Individual utterances contain a great deal of time-sequence information, that is, the feature information of each frame of speech. This paper proposes a deep learning model based on a CNN-LSTM-Attention mechanism to extract temporal features from speech and to establish the mapping between time-sequence features and speech emotion. CNN-LSTM-Attention is also used as the third base learner of Stacking, and experiments fusing global and time-sequence characteristics are completed. The results show that Stacking fusion of global and time-sequence characteristics significantly improves recognition accuracy.

(3) The online classroom application based on speech emotion recognition is completed. After the online classroom audio is collected, a high-quality database of teachers' speech is obtained through preprocessing and speech enhancement. Feature extraction, feature fusion and emotion recognition are then performed on this speech data, yielding a curve of the variation in the teacher's emotional pleasure. This curve helps teachers notice their emotional changes in time and adjust their teaching accordingly, and it can also serve as one of the indicators for evaluating teaching quality.
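The Pearson-based weighted fusion in task (1) can be sketched as follows. The abstract does not give the exact formulas, so this is a minimal sketch under stated assumptions: each prediction source (for example LGBM or AdaBoost on a given global feature set) receives a weight derived from the Pearson correlation r_i between its predictions and the true values on one PAD dimension. "Average weighted fusion" is assumed to mean w_i = r_i / sum_j r_j, and "Softmax weighted fusion" w_i = exp(r_i) / sum_j exp(r_j). The functions `pearson`, `fusion_weights` and `fuse` are hypothetical names used only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fusion_weights(preds_per_source, truth, mode="average"):
    """One weight per prediction source, derived from its Pearson r vs. truth.

    Assumed interpretations (not confirmed by the abstract):
      mode="average": w_i = r_i / sum_j r_j   (requires all r_j > 0)
      mode="softmax": w_i = exp(r_i) / sum_j exp(r_j)
    """
    rs = [pearson(p, truth) for p in preds_per_source]
    if mode == "average":
        total = sum(rs)
        return [r / total for r in rs]
    if mode == "softmax":
        exps = [math.exp(r) for r in rs]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError(f"unknown mode: {mode}")


def fuse(preds_per_source, weights):
    """Weighted sum of the sources' predictions, computed sample by sample."""
    n_samples = len(preds_per_source[0])
    return [sum(w * p[i] for w, p in zip(weights, preds_per_source))
            for i in range(n_samples)]
```

Under this reading, the procedure would be run separately for the P, A and D dimensions, with the weights estimated on held-out data. The proportional variant only makes sense when every correlation is positive; the softmax variant always gives positive weights, but because r lies in [-1, 1] the resulting weights stay within a factor of e² of each other.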
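For the Stacking step, the abstract states only that the weights of the base learners are obtained by training and that a meta learner fuses their outputs. The sketch below assumes a linear least-squares meta learner (the abstract does not name the meta learner) fitted on out-of-fold predictions of the base learners. It is written in plain Python to avoid library dependencies, and `solve`, `fit_meta_learner` and `meta_predict` are hypothetical helpers, not the author's code.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_meta_learner(oof_preds, targets):
    """Least-squares linear meta learner: target ~ b + sum_j w_j * pred_j.

    oof_preds: one row per training utterance, holding the out-of-fold
    prediction of every base learner for that utterance.
    Returns [b, w_1, ..., w_m].
    """
    rows = [[1.0] + list(p) for p in oof_preds]  # prepend an intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def meta_predict(coef, base_preds):
    """Combine one utterance's base-learner predictions with the fitted coefficients."""
    return coef[0] + sum(w * p for w, p in zip(coef[1:], base_preds))
```

The out-of-fold predictions would typically come from k-fold cross-validation of each base learner (LGBM and AdaBoost, plus CNN-LSTM-Attention in task (2)), so the meta learner is never fitted on predictions a base learner made for its own training data. Adding a third base learner just adds one column to each row, and the fitted coefficients w_j play the role of the learned base-learner weights.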
Keywords/Search Tags:Speech Emotion Recognition, Feature Fusion, Stacking, CNN-LSTM-Attention, PAD 3-Dimensional Space, Teacher Emotion