| Speech is one of the most important means of everyday human communication, and it carries a great deal of emotional information. With the development of artificial intelligence and deepening technological research in recent years, human-computer interaction has become a hot research topic. Enabling machines to recognize and express emotions as humans do has become a goal of researchers, so speech emotion recognition is becoming increasingly important. Speech emotion recognition is one of the challenging subjects in the field of speech processing, and it has a wide range of applications. This research therefore has great theoretical significance and good application prospects. At present, most researchers choose prosodic features, voice quality features, or spectrum-based features for speech feature extraction. However, relatively little research combines the time domain and the frequency domain. The spectrogram combines time-domain information with frequency-domain information and itself contains a large amount of speech-related information. We therefore select the spectrogram for extracting speech emotion features. The main contents of this thesis are as follows: 1) The research background and significance of speech emotion recognition are discussed, the research history and current state of the field are summarized, and emotion classification models and commonly used speech emotion databases are studied. 2) Pre-processing speech emotion data can improve the accuracy of the analysis. In this thesis, speech pre-processing includes pre-emphasis, framing and windowing, and endpoint detection. After pre-processing, the pitch frequency, short-term energy, short-term zero-crossing rate, formants, and Mel cepstrum coefficients of the speech signal are extracted to form the emotion feature vector. 3) Following a brief study of the development of artificial neural networks, their basic model and 
classification, a typical multi-layer perceptron (the BP network) is used to perform speech emotion recognition experiments. The BP network is then optimized by adding a momentum term, and the experimental results show that the recognition rate of the improved BP neural network is higher than that of the ordinary BP network. 4) We study a typical deep learning network structure, the convolutional neural network (CNN), and compare it with the traditional artificial neural network, focusing on the basic principles and advantages of the CNN. In this thesis, we propose speech emotion recognition based on the combination of the speech spectrogram and a CNN, and determine the best network model structure through experiments. We then carry out comparative experiments under different environments and different signal-to-noise ratios (SNRs), and with two different classifiers, softmax and SVM. To further verify the effectiveness of the proposed algorithm, experiments are carried out on different speech emotion databases. The experimental results show that the recognition rate is greatly improved by the combination of spectrogram and CNN; moreover, SVM performs better than softmax as the classifier. |
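
The pre-processing chain named in the abstract (pre-emphasis, framing and windowing) and the spectrogram fed to the CNN can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function names, the pre-emphasis coefficient of 0.97, the 16 kHz sampling rate, the 25 ms frame length, the 10 ms hop, and the Hamming window are all assumptions chosen for illustration, and endpoint detection is omitted.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames and apply a Hamming window to each."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


def log_spectrogram(frames, eps=1e-10):
    """Log-power spectrum per frame: rows are time frames, columns are frequency bins."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + eps)


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)

emphasized = pre_emphasis(x)
frames = frame_signal(emphasized, frame_len=400, hop_len=160)  # 25 ms frames, 10 ms hop
spec = log_spectrogram(frames)  # 2-D time-frequency array, usable as an image-like CNN input
```

The resulting array has one row per frame and one column per frequency bin, which is the combined time-domain and frequency-domain view that motivates choosing the spectrogram as the feature source.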