
Research On Speech Emotion Recognition Based On The Hierarchical Fusion Of Long Short-Term Memory Networks

Posted on: 2022-12-05    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Zhang    Full Text: PDF
GTID: 2480306761464404    Subject: Telecom Technology
Abstract/Summary:
Speech Emotion Recognition (SER) is an emotion recognition method based on natural human speech and a key way to identify individual emotions in everyday talk. SER uses the acoustic features of a speech fragment rather than the lexical features that carry its semantic content, so it identifies a subject's emotion from the "way" they speak rather than from what they say. In intelligent human-computer interaction and related services, predicting the target speaker's emotional state can be an important factor in decision making; it is the key to computers understanding human emotions and a prerequisite for natural human-computer interaction.

In the field of speech emotion recognition, many features can express speech emotion. If the distinct advantages of different speech emotion models are combined and their features fused, recognition performance can be improved effectively. In practice, however, traditional SER simply combines two speech emotion feature sets in series or in parallel. This directly increases the dimensionality of the fused feature, places an excessive computational burden on the whole recognition process, and thus invisibly raises the space and time complexity of the recognition system.

Later, deep learning methods proved able to learn nonlinear representations of speech signals at different input levels and have been widely applied in voiceprint recognition, speech recognition, emotion recognition, and other fields. Deep neural networks, convolutional neural networks, and recurrent neural networks are commonly used in speech emotion recognition. However, deep-learning-based methods cannot fully mine local feature information and ignore the contextual coherence of global features. Therefore, to highlight the signal characteristics of different tasks and address the above problems in SER, this paper proposes a speech emotion recognition method based on the hierarchical fusion of
long short-term memory networks. The specific innovations are as follows:

1) Four dual-channel ConvLSTM blocks are designed to extract local emotion features with hierarchical correlation. The ConvLSTM layer handles input-to-state and state-to-state transitions, and its convolution operations extract spatial cues. ConvLSTM focuses on the key elements of speech fragments that make sequential speech signals easy to identify, ensuring the predictive performance of the speech emotion recognition framework. Residual learning strategies are used to extract temporal and spatial cues from the hierarchical speech signals.

2) A novel sequence learning strategy is used to extract global information, and the gated recurrent unit (GRU) is improved to adaptively adjust the relevant global feature weights according to the correlation of the input features. The output of a three-layer bidirectional GRU model is fed into an attention mechanism to obtain salient features, and a fully connected layer then produces the judgment score for each emotion.

3) Finally, a center loss function is used together with the softmax loss to produce the probabilistic classification. The improved center loss strengthens the final classification result, ensures prediction accuracy, and plays a significant role in the whole speech emotion recognition scheme.

The proposed method is tested on two standard interactive emotional speech and song audiovisual databases and on the Common Voice Chinese speech dataset, and the results show that the proposed method is effective.
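The dimensionality growth of the simple serial fusion criticized above can be illustrated with a toy example. The feature names and sizes here are hypothetical (the thesis does not specify them); the point is only that plain concatenation makes the fused vector's dimension the sum of its parts.

```python
import numpy as np

# Hypothetical per-utterance feature vectors for illustration only:
mfcc = np.random.randn(39)     # e.g. 13 MFCCs plus delta and delta-delta stats
prosody = np.random.randn(32)  # e.g. pitch/energy statistics

# Simple serial (concatenation) fusion: every added feature set
# grows the fused dimension, and with it the downstream cost.
fused = np.concatenate([mfcc, prosody])
print(fused.shape)  # (71,)
```

This is why the thesis argues for a learned hierarchical fusion instead of raw concatenation.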
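The attention pooling in innovation 2) can be sketched as follows. This is a minimal stand-in, not the thesis's implementation: the hidden states would come from the three-layer bidirectional GRU, and the attention vector `w` would be learned, whereas both are random toy values here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, H = 5, 8                        # toy sequence length and hidden size
hidden = rng.standard_normal((T, H))  # per-frame hidden states (stand-in for GRU output)
w = rng.standard_normal(H)            # attention parameter (hypothetical, would be learned)

scores = hidden @ w                # one relevance score per frame
alpha = softmax(scores)            # attention weights, sum to 1
utterance_vec = alpha @ hidden     # weighted sum -> fixed-size salient feature
print(utterance_vec.shape)         # (8,)
```

The fixed-size `utterance_vec` is what a fully connected layer would then score per emotion class.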
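The center loss in innovation 3) follows the standard form L_c = 1/2 · mean‖x_i − c_{y_i}‖², which penalizes embeddings far from their class center. The sketch below uses fixed toy centers and embeddings; in training, the centers are learned jointly with the network, and this term is added to the softmax loss.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Standard center loss: 0.5 * mean squared distance to each sample's class center."""
    diffs = features - centers[labels]           # per-sample offset from its class center
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

# Toy batch: 4 embeddings of dimension 3, 2 emotion classes (values hypothetical).
feats = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.8, 0.2]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])            # fixed here; learned in practice

print(center_loss(feats, labels, centers))       # 0.0125
```

Minimizing this term pulls same-emotion embeddings together, which is what tightens the final classification.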
Keywords/Search Tags: Speech emotion recognition, Convolutional long short-term memory network, Hierarchical correlation, Gated recurrent unit