| Emotional speech recognition has become an important research topic in human-computer interaction. By perceiving emotional states, a computer can better understand human behavior, which broadens the possibilities for human-computer interaction. The computer processes and analyzes the speaker's voice signal to determine the speaker's emotional state. Given the importance of emotional communication to social relationships, emotional speech recognition has been widely applied in human-computer interaction, affective computing, the study of psychological and mental disorders, and other fields. However, because of the multi-modal and spontaneous nature of speech and the lag in emotion theory, no mature emotion recognition system has yet been established. In light of the current state of emotional speech recognition and of practical needs, the three parts of an emotional speech recognition system were introduced: the emotional speech database, feature extraction, and the recognition network. The feature extraction part in particular was studied and discussed in detail, and new nonlinear features based on the chaotic characteristics of speech were proposed. The experiments show that the nonlinear features extracted in this dissertation can compensate for the shortcomings of the previous features. The main contents are as follows. (1) First, the basic components of an emotional speech recognition system were introduced, including the emotional speech database, the emotional features, and the recognition network. Then nonlinear dynamics theory was used to analyze emotional speech, and the chaotic characteristics that reflect the nonlinear process of speech production were discussed. 
Three methods were used to verify the chaotic characteristics of emotional speech: power spectrum analysis, principal component analysis, and phase space reconstruction. (2) Based on the chaotic characteristics of the emotional speech signal, time series analysis was used to reconstruct the state space of emotional speech, in preparation for extracting nonlinear features based on those chaotic characteristics. To reconstruct the phase space of the one-dimensional emotional speech signal, the adjacent error method was used to calculate the embedding dimension and the average mutual information method was used to calculate the time delay. In addition, the C-C method was introduced to compute both the embedding dimension and the time delay. Phase space reconstruction of the one-dimensional emotional speech signal made nonlinear analysis of continuous emotional speech possible. (3) A method that applies a nonlinear dynamic model to emotional speech signal processing was proposed to extract new nonlinear features: the minimum delay time, the correlation dimension, the Kolmogorov entropy, the largest Lyapunov exponent, and the Hurst exponent. In addition, a qualitative analysis of the relevance of these nonlinear features and of their ability to distinguish emotions indicated that nonlinear features based on the chaotic characteristics of emotional speech can serve as new features for distinguishing emotions. (4) Several emotional speech recognition experiments were designed and conclusions were drawn from them. The experiments were based on the public Berlin speech database and the discrete emotional corpus TYUT2.0. First, emotional speech features, including prosodic features and MFCCs, were extracted for different emotional states (happy, sad, angry, neutral, and surprise) and qualitatively analyzed. 
Second, the recognition rates of the nonlinear features, MFCCs, and prosodic features were compared, which verified that the nonlinear features can distinguish different emotions effectively. Finally, the recognition network was used to evaluate the performance of different fused feature sets built from the above three types of features. The results verified that fusing in the nonlinear features can improve the performance of the recognition network. |
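
The phase space reconstruction step described in part (2) can be illustrated with a minimal sketch. This is not the dissertation's implementation. It assumes NumPy, estimates average mutual information with a simple two-dimensional histogram, picks the time delay at the first local minimum of that estimate, and takes the embedding dimension as given rather than computing it. The function names `delay_embed`, `average_mutual_information`, and `first_minimum_delay`, and the parameters `bins` and `max_lag`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Build delay-coordinate vectors from a 1-D series.

    Row i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]].
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    # Broadcast a column of start indices against a row of delay offsets.
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(dim)[None, :]
    return x[idx]


def average_mutual_information(x, lag, bins=16):
    """Histogram estimate of the mutual information between x(t) and x(t + lag)."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-lag], x[lag:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x(t)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of x(t + lag)
    mask = pxy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def first_minimum_delay(x, max_lag=50, bins=16):
    """Return the lag at the first local minimum of the AMI curve.

    Falls back to the global minimum if no local minimum is found.
    """
    ami = [average_mutual_information(x, lag, bins)
           for lag in range(1, max_lag + 1)]
    for k in range(1, len(ami) - 1):
        if ami[k] < ami[k - 1] and ami[k] <= ami[k + 1]:
            return k + 1  # lags are 1-based
    return int(np.argmin(ami)) + 1
```

For a speech frame `x`, one might call `tau = first_minimum_delay(x)` and then `delay_embed(x, dim, tau)` with an embedding dimension obtained separately. The histogram estimator is a deliberately simple choice, and its result depends on the number of bins.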