
Research On Deep Learning Algorithms For Speech And EEG Emotion Recognition

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: F Gao
Full Text: PDF
GTID: 2480306113950029
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speech emotion recognition and deep learning are important research directions and methods in the field of artificial intelligence. As a typical physiological signal, the electroencephalogram (EEG) directly reflects a person's cognitive processing of emotion. Research on deep learning algorithms for speech and EEG emotion recognition therefore has both practical significance and theoretical value. This interdisciplinary work collects EEG signals from subjects while they listen to emotional speech, introduces deep learning models, and proposes several improved algorithms to analyze the two emotional carriers (speech and EEG). The main research content of the thesis is summarized as follows.

Based on signal processing methods, combined with theories of feature extraction, data analysis, and deep learning, the thesis analyzes emotional EEG and emotional speech signals. First, emotional speech data are selected; second, emotional EEG data are induced using the TYUT2.0 emotional speech corpus; third, multi-class features are extracted from both kinds of signals as two-dimensional image sequences and one-dimensional time sequences, and principal component analysis is used for data preprocessing; finally, EEG-assisted emotional cognition research on speech signals is realized with the help of deep learning methods.

Traditional emotional speech features lack effective emotional information connecting the time and frequency domains. Therefore, the Mel spectrogram generated from the emotional speech signal is taken as a feature and analyzed from the perspective of a two-dimensional image sequence. An improved convolutional neural network (CNN) performs deep feature extraction, with the computation accelerated on a graphics processing unit (GPU), and a linear-kernel support vector machine (SVM) is then introduced for emotion recognition. The CNN is optimized with batch normalization (BN) and global average pooling (GAP), which not only enhances the learning ability of the model but also remedies the high dimensionality and inadequate emotional expressiveness of traditional hand-crafted features. Replacing the original classification layer, the linear-kernel SVM effectively saves computing resources and shortens training time.

A single feature cannot fully express emotional information, while the linear superposition of multiple feature classes ignores the correlations between different features. Therefore, multiple time-frequency-domain features of the emotional speech signal are fused and analyzed from the perspective of a one-dimensional time sequence. A multi-layer stack of restricted Boltzmann machines (RBMs) is selected for feature reconstruction, and a long short-term memory (LSTM) network is then introduced to classify the new features by emotion. The deep Boltzmann machine (DBM) not only effectively reduces the linear correlation among the "additively" combined features but also avoids the drop in recognition rate caused by high feature dimensionality. The LSTM learns well from the reconstructed features, which retain high temporal resolution, and can fully capture the emotional information shared among them.

Emotion analysis of the speech signal alone ignores the influence of key physiological signals. Therefore, EEG is taken as an auxiliary signal, and multi-class features of emotional speech and EEG are extracted and analyzed from the perspective of a one-dimensional time sequence. The multi-layer RBM stack is again selected for feature reconstruction, and an improved LSTM is introduced to classify the multi-signal features by emotion. With the help of a multi-label classification cost function, the separation between different emotion categories is increased, which effectively reduces misclassification. Experiments verify the effectiveness of both the EEG signal and the new cost function in improving emotion recognition performance.
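The CNN pipeline described above combines three ideas: batch normalization, global average pooling in place of large fully connected layers, and a linear SVM replacing the softmax classification layer. The following is a minimal numpy sketch of those three building blocks, not the thesis's actual implementation; all function names and hyperparameters are illustrative.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch dimension
    (inference-style sketch: no learned scale/shift parameters)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def global_average_pool(feature_maps):
    """Collapse each (H, W) feature map to one scalar: (N, C, H, W) -> (N, C).
    This replaces a high-dimensional flatten + fully connected layer."""
    return feature_maps.mean(axis=(2, 3))

def train_linear_svm(X, y, lr=0.01, reg=0.01, epochs=200):
    """Fit a binary linear SVM (hinge loss, labels in {-1, +1}) by
    sub-gradient descent on the pooled CNN features."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # samples violating the margin
        if mask.any():
            grad_w = reg * w - (y[mask, None] * X[mask]).mean(axis=0)
            grad_b = -y[mask].mean()
        else:
            grad_w, grad_b = reg * w, 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

In the full pipeline, `global_average_pool` would be applied to the last convolutional layer's output, and the pooled vectors would be fed to `train_linear_svm` instead of a softmax layer.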
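The feature-reconstruction step uses stacked RBMs: each layer maps its visible units to hidden activation probabilities, and the top-layer activations serve as the fused feature vector for the LSTM. A minimal deterministic sketch of that forward reconstruction pass (omitting contrastive-divergence training; all names and dimensions are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_pass(v, W, b_hidden, b_visible):
    """One up-down pass of an RBM: visible units -> hidden activation
    probabilities -> reconstructed visible units."""
    h_prob = sigmoid(v @ W + b_hidden)           # bottom-up: encode
    v_recon = sigmoid(h_prob @ W.T + b_visible)  # top-down: reconstruct
    return h_prob, v_recon

def stack_reconstruct(v, layers):
    """Propagate the concatenated speech/EEG features through a stack of
    RBM layers; the top hidden activations are the fused feature vector."""
    h = v
    for W, b_h, b_v in layers:
        h, _ = rbm_pass(h, W, b_h, b_v)
    return h
```

The lower-dimensional output of `stack_reconstruct` is what the abstract calls the "reconstructed" feature, which still forms a time sequence suitable for LSTM input.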
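The abstract does not give the exact form of the multi-label classification cost function. One common choice that matches its description, scoring each emotion label independently so that confident activation of wrong categories is penalized and category separation grows, is per-label sigmoid cross-entropy; the sketch below is that generic form, not necessarily the thesis's function.

```python
import numpy as np

def multilabel_cost(logits, targets, eps=1e-12):
    """Per-label sigmoid cross-entropy: each emotion label is treated as an
    independent binary decision, so high logits on wrong labels are
    penalized directly, widening the gap between categories."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(targets * np.log(p + eps)
                    + (1 - targets) * np.log(1 - p + eps))
```

During training, this cost would replace the usual single-label softmax cross-entropy on the improved LSTM's output layer.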
Keywords/Search Tags:Speech, Electroencephalogram, Emotion Recognition, Deep Learning, Multi-label Classification Cost Function