Endowing computers with the ability to perceive and express human emotions is an important part of human-computer interaction, and speech is one of the main channels of emotional expression. Emotion recognition based on speech signals has therefore attracted wide attention. As research on speech emotion recognition has deepened, new questions have emerged; among them, the contextual information between utterances carries a large amount of emotion-related information, and exploiting the emotional correlation between contexts can clearly improve recognition performance. Based on this, this paper proposes two different methods of using contextual information and combines them for speech emotion recognition.

First, a speech emotion recognition method based on context features is proposed. The method extracts emotion features, namely context features, from historical utterances through an IFCN_LSTM cascade network, refines the precision of the spatial features while preserving their temporal order, and then concatenates the context features with the current target utterance; the emotional state is then recognized from the utterance enriched with these contextual features. Within the cascade, the IFCN network extracts spatial structure features, removes redundant features, refines spatial precision, and keeps the same temporal sequence as its input; the LSTM network models the time-dependent relationships among the temporal features, correlates emotion features across time steps, and captures the sequence of emotional changes. By extracting context features from historical utterances, the historical emotional state is transferred to the target utterance, and associating the emotionally salient context features of the historical utterances with the target utterance improves recognition accuracy.

Second, a context feature fusion method based on the attention mechanism is proposed. The attention mechanism computes the weight of the context utterance at each time step of the current target utterance, describing the degree of correlation between the historical emotional state and the current emotional state, so that the emotion-related information in the context utterance can be learned and integrated into the current target utterance and the network can focus more on the emotionally salient parts of the current utterance. The context features contain finer-grained emotional information. Applying this attention-based fusion mechanism embeds the context features into the current target utterance, describes the emotional correlation between the current and historical emotional states more accurately, and improves the overall effect of emotion recognition.
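To make the described pipeline concrete, the following is a minimal sketch in PyTorch of the two ideas above: a cascade that extracts context features from a historical utterance and an attention module that fuses them into the target utterance. The exact internal design of the IFCN block is not specified in this abstract, so a 1-D convolutional front end stands in for it here; the class names, feature dimensions, and number of emotion classes are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ContextFeatureExtractor(nn.Module):
    """Hypothetical IFCN_LSTM-style cascade: a convolutional front end
    (a stand-in for the IFCN block, whose structure is not given here)
    followed by an LSTM that models temporal dependencies across frames."""
    def __init__(self, feat_dim=40, hidden_dim=128):
        super().__init__()
        # Convolutional stage: refines per-frame (spatial) features while
        # padding preserves the time dimension, matching the input sequence.
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # LSTM stage: correlates emotion features across time steps.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, x):            # x: (batch, time, feat_dim)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.lstm(h)          # (batch, time, hidden_dim)
        return h

class AttentionContextFusion(nn.Module):
    """Hypothetical attention-based fusion: each frame of the target
    utterance attends over the frames of the historical (context) utterance,
    and the attended context is concatenated back onto the target frames."""
    def __init__(self, hidden_dim=128):
        super().__init__()
        self.scale = hidden_dim ** 0.5

    def forward(self, target, context):          # both: (batch, time, hidden_dim)
        scores = torch.matmul(target, context.transpose(1, 2)) / self.scale
        weights = torch.softmax(scores, dim=-1)  # correlation of each target frame
                                                 # with every context frame
        attended = torch.matmul(weights, context)
        return torch.cat([target, attended], dim=-1)

# Usage sketch: extract context features from a historical utterance, fuse them
# into the current target utterance, then classify the fused representation.
extractor = ContextFeatureExtractor()
fusion = AttentionContextFusion()
classifier = nn.Linear(2 * 128, 4)               # 4 emotion classes (assumed)

history = torch.randn(2, 120, 40)                # (batch, frames, acoustic features)
target = torch.randn(2, 100, 40)

ctx_feat = extractor(history)
tgt_feat = extractor(target)
fused = fusion(tgt_feat, ctx_feat)
logits = classifier(fused.mean(dim=1))           # utterance-level prediction
```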