
Speech Emotion Recognition Based On Graph Convolution Neural Network

Posted on: 2024-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: F F Xu
Full Text: PDF
GTID: 2558307136992949
Subject: Electronic information
Abstract/Summary:
Speech Emotion Recognition (SER) is a technology that studies the emotional information carried in human speech: it identifies the emotional state of a speaker by analyzing acoustic features and emotional cues in the speech signal. Previous research has mainly relied on traditional machine learning methods, convolutional neural networks (CNNs), and their variants. However, the diversity of emotional expression, the variability of emotion categories, and the differing lengths of speech samples make it difficult for manually designed features to cover most of the emotional information in a sample. In recent years, with the rise of graph-based models, Graph Convolutional Networks (GCNs) have shown excellent performance in deep learning. This thesis therefore first studies deep feature extraction from speech samples, then introduces the GCN to handle the speech emotion recognition task, and finally improves the adjacency matrix. The main research work is as follows:

(1) To extract emotional features that cover more of the emotional information in speech samples, and to capture their temporal information, this thesis proposes a speech emotion recognition method based on Bidirectional Long Short-Term Memory and Graph Convolutional Networks (BLSTM-GCN). The method first uses the openSMILE toolkit to extract frame-level speech emotion features. A Bidirectional Long Short-Term Memory (Bi-LSTM) network then extracts deeper frame-level emotion features, which are split into two paths for the subsequent networks. Next, the frame-level deep emotion feature vectors are assembled into a graph structure and trained with a GCN, and each speech sample is modeled globally using sum pooling. Finally, a softmax function performs prediction and classification. By extracting emotional features at different levels, the method enhances the feature representation; by introducing the GCN as the baseline network for speech emotion recognition in place of common CNNs and their variants, it can effectively optimize node features using the topological structure among them. Experimental results show that this method achieves weighted accuracies of 66.04% and 57.5% on the IEMOCAP and MSP-IMPROV databases, respectively.

(2) To address the limited node-information interaction of predefined adjacency matrices in GCNs, this thesis proposes a decayed-connection adjacency matrix and an adaptive adjacency matrix. The decayed-connection method adds the predefined adjacency matrix element-wise to a relation matrix and then applies a decay hyperparameter to the resulting matrix. The adaptive method first randomly initializes a learnable node-embedding matrix, then infers the spatial dependencies between node pairs from the similarity of their embeddings; during training, the loss computed from the objective function is used by the optimizer to update the node-embedding matrix, which gradually moves from its random initial value toward the optimum. Compared with a predefined adjacency matrix, both proposed matrices enhance the information interaction between nodes. Experimental results show that the proposed methods achieve weighted accuracies of 66.82% and 58.35% on the IEMOCAP and MSP-IMPROV databases, respectively.
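As a rough illustration of the GCN stage described in (1), the sketch below builds a chain graph over frame-level feature vectors, applies one graph-convolution layer, sum-pools the node features into an utterance-level vector, and classifies it with softmax. The dimensions, the chain-graph adjacency, and the single-layer network are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gcn_layer(A_norm, H, W):
    # One graph-convolution layer: aggregate neighbor features, project, ReLU
    return np.maximum(A_norm @ H @ W, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

T, F, HID, C = 5, 8, 16, 4  # frames, feature dim, hidden dim, emotion classes
rng = np.random.default_rng(0)
X = rng.standard_normal((T, F))  # deep frame-level features (e.g. Bi-LSTM output)

# Chain graph: each frame node connected to its temporal neighbors
A = np.diag(np.ones(T - 1), 1) + np.diag(np.ones(T - 1), -1)
A_norm = normalize_adj(A)

W1 = rng.standard_normal((F, HID)) * 0.1
W_out = rng.standard_normal((HID, C)) * 0.1

H1 = gcn_layer(A_norm, X, W1)   # node-level emotion features
g = H1.sum(axis=0)              # sum pooling -> utterance-level vector
probs = softmax(g @ W_out)      # emotion-class probabilities
```

In a trained model, `W1` and `W_out` would be learned and the graph would cover a whole utterance, but the propagation rule is the standard GCN form shown here.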
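The two adjacency-matrix variants in (2) can be sketched as follows. The abstract does not give the exact formulas, so the similarity function used here (row-softmax over ReLU of the embedding Gram matrix, a common formulation for adaptive adjacency) and the decayed-connection form are hedged assumptions for illustration only.

```python
import numpy as np

def row_softmax(M):
    # Normalize each row into a probability distribution
    e = np.exp(M - M.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def adaptive_adjacency(E):
    # Infer pairwise dependencies from node-embedding similarity:
    # ReLU thresholds negative scores, row-softmax normalizes.
    # (Assumed form; the thesis may use a different similarity.)
    return row_softmax(np.maximum(E @ E.T, 0.0))

def decayed_adjacency(A_pre, R, alpha):
    # Element-wise sum of the predefined adjacency A_pre and a
    # relation matrix R, scaled by a decay hyperparameter alpha
    # (illustrative reading of the decayed-connection method).
    return alpha * (A_pre + R)

N, D = 5, 3
rng = np.random.default_rng(1)
# Learnable node-embedding matrix, randomly initialized; during training
# an optimizer would update E from the task loss.
E = rng.standard_normal((N, D))
A_adapt = adaptive_adjacency(E)  # each row sums to 1
```

Because `E` is updated by gradient descent, `A_adapt` is re-derived every step, letting the learned graph structure approach the optimum from its random initialization, as the abstract describes.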
Keywords/Search Tags:Speech Emotion Recognition, Deep Learning, Feature Extraction, Graph Convolutional Neural Networks, Adjacency Matrix