
The Speech Emotion Recognition Research Based On Speech Spectrogram And Convolutional Neural Network

Posted on: 2021-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: C Deng
GTID: 2518306113462004
Subject: Computer software and theory
Abstract/Summary:
Research on human emotion recognition has attracted growing attention in recent years. Common emotion recognition methods are based on facial expressions, text content, physiological signals, and speech. As one of the most important means of communication in daily life, the voice carries a great deal of the speaker's emotional information. Human-computer interaction has recently flourished in fields such as finance, tourism, distance education, and criminal investigation, and there is an increasing need for artificial intelligence that can understand and distinguish human emotions. Acquiring and using voice for emotion detection can therefore promote communication not only between humans but also between humans and computers, which gives speech emotion recognition great significance and broad application.

Earlier research on speech emotion recognition focused on acoustic research and statistical analysis, and many studies tried to find new acoustic features or to combine existing ones. However, acoustic features generally have to be extracted manually, which requires some background in acoustics and a large number of experimental attempts. Moreover, it is difficult to uncover deep features that people are not consciously aware of. The speech spectrogram, on the other hand, contains a large amount of information in the form of time, frequency, and energy, so spectrogram-based research has recently become a new focus of speech emotion recognition. The Convolutional Neural Network (CNN), as a self-learning method, has performed well in image recognition by mining deep feature information from images for classification. This thesis therefore uses a CNN model to study emotion from the speech spectrogram. The main results are as follows:

(1) We extracted acoustic features, taken from the INTERSPEECH 2009 Emotion Recognition Challenge, from the public part of the CASIA Chinese Speech Emotion Database. We then applied KNN and SVM separately for speech emotion recognition and obtained good recognition performance. These results serve as the reference group for the subsequent research.

(2) After preprocessing, we generated speech spectrograms from the same database and used a CNN for emotion recognition. This method achieved lower accuracy than the traditional KNN and SVM methods based on acoustic features, but it still verified the feasibility of the spectrogram-based CNN method.

(3) To address the small number of samples in the original data set, offline image augmentation is used to simulate additional speech samples and expand the original speech set, which clearly improves the recognition results of the spectrogram-based CNN method. On this basis, several combinations of data augmentation are used to further explore and optimize model performance, effectively avoiding the data sparseness problem common in speech emotion recognition.

(4) Because the Dense Block learns image features repeatedly, the Dense Block structure is used to improve the CNN model with the data set unchanged, and experiments show that the accuracy of the modified model exceeds that of the original model to some degree. Data augmentation and model optimization are then tentatively combined to explore in depth how to improve the performance of speech emotion recognition.

(5) Considering both the image features of the spectrogram and the acoustic features, multi-model fusion of these two complementary feature types is studied through middle-layer feature concatenation and decision-layer voting, which effectively improves the speech emotion recognition rate.
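As an illustration only, two of the steps named in the abstract can be sketched in a few lines of NumPy: turning a waveform into a log-magnitude spectrogram (the image a CNN would consume) and fusing the label predictions of several models by decision-layer voting. The abstract does not give the thesis's actual settings, so everything here is an assumption: the function names `log_spectrogram` and `majority_vote` are hypothetical, the frame length, hop size, and Hann window are arbitrary choices, and plain majority voting stands in for whatever voting rule the thesis uses.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are in samples; the defaults are assumed values
    (25 ms / 10 ms at a 16 kHz sampling rate), not the thesis settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT magnitude per frame: (frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels and put frequency on the first axis.
    return 20.0 * np.log10(magnitude + eps).T


def majority_vote(predictions):
    """Fuse integer class labels of shape (n_models, n_samples) by majority vote.

    Ties are resolved in favour of the lowest class index.
    """
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    # counts[c, s] = number of models that predicted class c for sample s.
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


# Usage with synthetic data: a one-second 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
spec = log_spectrogram(np.sin(2 * np.pi * 440.0 * t))

# Three hypothetical models voting on three samples.
fused = majority_vote([[0, 1, 2], [0, 1, 1], [1, 1, 2]])
```

With these assumed settings each frequency bin spans 40 Hz, so the energy of the test tone concentrates around bin 11. In practice the spectrogram would be saved or resized as an image before augmentation and CNN training, and the votes would come from the acoustic-feature and spectrogram-based classifiers.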
Keywords/Search Tags: speech emotion recognition, speech spectrogram, CNN, offline image augmentation, Dense Block, multi-model fusion