Emotion recognition is a cross-cutting research field spanning multiple disciplines, with the goal of enabling machines to understand human emotional states and act on them. Thanks to continual advances in technology and the iteration of data acquisition devices, human-machine interaction based on emotion recognition has been widely applied in fields such as virtual reality, assisted driving, game development, and medical diagnosis. The signal modalities used for emotion recognition fall into two categories: (1) non-physiological signals, i.e., external signals closely related to or resulting from emotion, such as facial expressions (including micro-expressions), eye movements, speech, and posture; (2) physiological signals, such as EEG and ECG, together with internal signals not under conscious control, such as blood volume pulse and heart rate variability. External signals such as facial expressions, speech, and posture can be captured with low-cost devices, yet studies have found that individuals can deliberately manipulate these signals to hide their true emotions. Eye movement signals, which include fixation, saccade, blink, and pupil diameter information, are more expensive to acquire, but they observe user behavior in a natural way, provide a non-invasive and accurate source of data for emotion research, and can be embedded in wearable devices such as virtual reality headsets. Among the physiological signals, EEG is widely used in emotion recognition research because it reflects genuine human emotion, cannot easily be deliberately suppressed, and benefits from the rapid development of EEG acquisition technology. In this thesis, we use two modalities, EEG and eye movement signals, to investigate the intrinsic connection between EEG and emotion based on machine learning and deep learning theory, and to verify theoretically that EEG signals contain
nonlinear features that can be used for emotion recognition and classification. To further improve the accuracy of EEG-based emotion recognition, a novel hand-crafted feature is proposed. Based on a convolutional neural network and a capsule network, the C-CapsNet emotion recognition architecture is proposed to extract deep features from EEG and eye movement signals and perform emotion recognition on these two heterogeneous modalities, addressing the problem that hand-crafted features achieve high accuracy but generalize poorly. The details of the research are as follows.

(1) Exploiting the temporal nonlinearity of EEG signals and the fact that the Hurst exponent reflects the long-term memory of nonlinear systems, an adaptive Hurst-exponent variational mode decomposition method is proposed, building on the strong robustness of variational mode decomposition, to denoise EEG signals and extract their nonlinear features. Phase space reconstruction theory is then used to recover the dynamical structure underlying the connection between EEG signals and emotion, and this intrinsic connection between EEG nonlinear features and emotion is validated theoretically on the DEAP dataset.

(2) To address the difficulty of further improving emotion recognition accuracy with a single EEG feature, and the fact that multimodal fusion ignores the correlation between features and emotions, deep convolutional permutation entropy, built from multi-scale feature extraction, a deep convolution factor, and nonlinear permutation entropy, is proposed as a new nonlinear hand-crafted EEG feature. At the same time, because the nonlinear feature vectors used to construct the deep convolutional permutation entropy of EEG signals are very high-dimensional, the local neighborhood construction based on emotional-state distance is redefined and the linear local tangent space alignment (LLTSA) dimensionality reduction algorithm is
improved. Finally, a nonlinear hand-crafted EEG feature extraction method based on deep convolutional permutation entropy and the improved LLTSA dimensionality reduction algorithm is proposed, further improving hand-crafted-feature emotion recognition accuracy on the DEAP dataset.

(3) To address the problems that hand-crafted features achieve high accuracy but generalize poorly, and that emotion recognition from a single modality is unreliable, the dual-modality emotion recognition model C-CapsNet is proposed. Built on a convolutional neural network and a capsule network, it extracts deep features from the heterogeneous EEG and eye movement modalities, making full use of the spatial and temporal information of the EEG signal. First, the EEG and eye movement signals are acquired and pre-processed: noise is removed from both modalities while their spatio-temporal features are extracted. Convolutional neural networks are the most commonly used classifiers because convolutional layers handle data patterns at different temporal locations very effectively, but in the process they discard the routing of information along with position and pose information. Therefore, the low-level information of the EEG and eye movement signals is first extracted with convolutional neural networks, and the features are fused and re-partitioned into primary capsules containing EEG and eye movement features; all primary capsules then jointly determine the emotion capsules, yielding the emotion classification result. The emotion recognition performance is verified on the SEED-IV multimodal dataset using 5-fold and leave-one-out (LOO) cross-validation, respectively.
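The phase space reconstruction invoked in contribution (1) to recover the dynamical structure of EEG signals is, in its standard form, a time-delay (Takens) embedding. The following is a minimal sketch of that standard technique, not the thesis's implementation; the synthetic signal, embedding dimension, and delay are arbitrary illustrative choices.

```python
import numpy as np

def time_delay_embedding(x, dim, tau):
    """Reconstruct the phase space of a 1-D signal x.

    Each row is a delay vector [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]
    (Takens' embedding with embedding dimension `dim` and delay `tau`).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau          # number of delay vectors
    if n <= 0:
        raise ValueError("signal too short for the chosen dim and tau")
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

# Example: embed a short synthetic oscillatory signal.
t = np.linspace(0, 4 * np.pi, 200)
signal = np.sin(t) + 0.1 * np.sin(7 * t)
embedded = time_delay_embedding(signal, dim=3, tau=5)
print(embedded.shape)  # (190, 3)
```

In practice the embedding dimension and delay are chosen from the data (e.g. via false nearest neighbors and the first minimum of mutual information) rather than fixed by hand as here.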
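Contribution (2) builds its deep convolutional feature on permutation entropy. Below is a minimal sketch of the standard single-scale permutation entropy it starts from, not the proposed deep convolutional variant; the order, delay, and test signals are illustrative.

```python
import math
import numpy as np

def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D signal.

    Counts the relative frequency of ordinal patterns of length `order`
    and returns their Shannon entropy, normalized to [0, 1] by log(order!).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    counts = {}
    for i in range(n):
        window = x[i : i + order * delay : delay]
        pattern = tuple(np.argsort(window, kind="stable"))
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = np.array(list(counts.values()), dtype=float) / n
    h = -np.sum(probs * np.log(probs))
    if normalize:
        h /= math.log(math.factorial(order))
    return h

# A monotone ramp has a single ordinal pattern, so its entropy is 0;
# white noise visits all patterns roughly equally and approaches 1.
pe_ramp = permutation_entropy(np.arange(100.0))
pe_noise = permutation_entropy(np.random.default_rng(0).standard_normal(5000))
print(pe_ramp == 0.0, pe_noise > 0.9)  # True True
```

The normalized value makes features comparable across different pattern orders, which matters when entropies at several scales are stacked into one feature vector.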
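The step in contribution (3) where all primary capsules jointly determine the emotion capsules follows the routing-by-agreement scheme of capsule networks. The sketch below is a plain-NumPy illustration of that general scheme, not the C-CapsNet implementation: the capsule counts, number of emotion classes, and vector dimensions are made-up examples, and a real model learns the prediction vectors `u_hat` rather than sampling them randomly.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule non-linearity: scales the vector norm into [0, 1), keeps direction."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement between primary capsules and emotion capsules.

    u_hat: (num_lower, num_upper, dim) prediction vectors from the primary
    (fused EEG + eye-movement) capsules for each emotion capsule.
    Returns (num_upper, dim) emotion-capsule outputs; the norm of each
    output vector is read as the probability of that emotion class.
    """
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))                      # routing logits
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        s = np.einsum("ij,ijd->jd", c, u_hat)                 # weighted sum per emotion capsule
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)             # agreement update
    return v

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((32, 4, 8))  # 32 primary capsules, 4 emotion classes, 8-D capsules
v = dynamic_routing(u_hat)
print(v.shape)  # (4, 8)
```

Because routing replaces pooling, the agreement term preserves the position and pose information that, as noted above, a plain convolutional classifier discards.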