
Research On Music Emotion Classification Based On CNN-LSTM

Posted on: 2021-02-25  Degree: Master  Type: Thesis
Country: China  Candidate: C F Chen  Full Text: PDF
GTID: 2415330605482446  Subject: Computer Science and Technology
Abstract/Summary:
Music is rich in information about human emotions, and research on music emotion classification (MEC) helps organize and retrieve massive music collections. Because of the duration and complexity of music audio, its characteristic parameters tend to be high-dimensional, voluminous, and difficult to analyze. Existing MEC research concentrates on single-modal analysis of either audio or lyrics, ignores the correlation between modalities, and therefore suffers a degree of information loss. To address these problems, this thesis builds a multi-modal music emotion classification system that fuses audio and lyric information, which effectively improves classification performance.

For audio classification, the real music recordings are finely segmented, and pure background-sound fragments are obtained through vocal separation; these fragments yield better classification performance than the original audio. Spectrogram and low-level descriptor (LLD) features are extracted from the fragments, and using both audio features improves the classification result and compensates for the limitations of any single feature.

For lyric classification, three vector space models (TF-IDF, the chi-square test, and an improved chi-square test) and word embeddings extracted with Word2vec are used as feature representations of the lyric text and are evaluated with an SVM classifier. The improved chi-square method tunes its parameters to account for the particular semantics of lyrics and achieves outstanding emotion classification results.

To overcome the limitations of a single feature and the shortcomings of a single-network classifier, this thesis builds audio and lyric classifiers that combine the feature extraction ability of convolutional neural networks with the ability of recurrent neural networks to process sequential data, and adds an attention mechanism, yielding a single-modal classification model based on CNN-LSTM that produces the emotion classification output. This combined architecture is adapted to music emotion classification and improved so that it accepts two types of feature input, which raises classification accuracy. Compared with traditional SVM, CNN, and LSTM classifiers, the model greatly improves both audio and lyric emotion classification: audio accuracy reaches 68% and lyric accuracy reaches 74%.

To fuse emotion information across modalities, this thesis builds three kinds of multi-modal music emotion classification systems. In the feature-level fusion, the CNN-LSTM single-modal model is used to unify the modal vectors, which avoids the classification defects caused by direct dimensionality reduction. In the decision-level fusion, an improved Thayer decision fusion method is proposed that makes good use of the correlation between modalities. In view of the heterogeneity of the different modal features, a multi-modal ensemble learning method based on Stacking is proposed to obtain the best performance; its average classification accuracy reaches 78%, a large improvement over the single-modal systems, with good scalability.
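As a rough illustration of the two audio feature types mentioned above, the following sketch extracts a log-mel spectrogram and a few frame-level low-level descriptors (LLDs) with librosa. The file path, sampling rate, and the specific descriptor set are illustrative assumptions, not the thesis's exact configuration.

```python
# Minimal sketch: log-mel spectrogram plus a small set of frame-level LLDs.
# librosa is assumed as the extraction library; "segment.wav" is a
# placeholder path to a vocal-removed background-sound fragment.
import librosa
import numpy as np

y, sr = librosa.load("segment.wav", sr=22050)

# Spectrogram feature (2-D time-frequency image, fed to the CNN branch)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)

# Frame-level LLDs (one vector per frame, the second feature stream)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
zcr = librosa.feature.zero_crossing_rate(y)
rms = librosa.feature.rms(y=y)
llds = np.vstack([mfcc, zcr, rms])   # (15, n_frames) stacked descriptors

print(log_mel.shape, llds.shape)
```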
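For the lyric-side baseline, a minimal sketch of the TF-IDF plus chi-square feature selection pipeline evaluated with an SVM is shown below. The toy lyrics and labels are placeholders, and the thesis's improved chi-square weighting is not reproduced here; scikit-learn's standard chi2 selector stands in for it.

```python
# Minimal sketch of the lyric baseline: TF-IDF features, chi-square
# feature selection, and a linear SVM classifier (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Placeholder lyric fragments and emotion labels
lyrics = ["sunny days and laughter", "tears fall in the silent night",
          "dancing all night long", "alone with my broken heart"]
labels = ["happy", "sad", "happy", "sad"]

pipeline = make_pipeline(
    TfidfVectorizer(),            # bag-of-words lyric features
    SelectKBest(chi2, k=10),      # keep the terms most correlated with emotion
    LinearSVC(),                  # linear SVM baseline classifier
)
pipeline.fit(lyrics, labels)
print(pipeline.predict(["crying through the night"]))
```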
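The single-modal CNN-LSTM classifier with attention could look roughly like the sketch below (Keras). The layer sizes, the four-class emotion taxonomy, and the spectrogram input shape are illustrative assumptions rather than the architecture reported in the thesis.

```python
# Minimal sketch of a CNN-LSTM classifier with additive attention for one
# modality (e.g. log-mel spectrogram segments). All hyperparameters are
# assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4                  # assumed emotion categories
TIME_STEPS, N_MELS = 128, 64     # assumed spectrogram shape: frames x mel bands

inputs = layers.Input(shape=(TIME_STEPS, N_MELS, 1))

# CNN front end: local time-frequency feature extraction
x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)

# Collapse the frequency axis so the LSTM sees a sequence over time
x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)

# LSTM models temporal dependencies across frames
x = layers.LSTM(128, return_sequences=True)(x)

# Simple additive attention: weight each time step before pooling
scores = layers.Dense(1, activation="tanh")(x)            # (batch, T, 1)
weights = layers.Softmax(axis=1)(scores)                   # attention over time
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])

outputs = layers.Dense(NUM_CLASSES, activation="softmax")(context)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```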
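Finally, the Stacking-based fusion can be sketched as a meta-learner trained on the concatenated class-probability outputs of the audio and lyric models. The random arrays below stand in for real model outputs, and the logistic-regression meta-learner and four-class setup are assumptions for illustration.

```python
# Minimal sketch of Stacking-style fusion: a meta-learner over the
# concatenated per-modality probability outputs (placeholder data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_classes = 200, 4

# Placeholder per-modality probability outputs on a held-out fusion set
audio_probs = rng.dirichlet(np.ones(n_classes), size=n_train)
lyric_probs = rng.dirichlet(np.ones(n_classes), size=n_train)
y_train = rng.integers(0, n_classes, size=n_train)

# Meta-features: concatenate the two modalities' predictions
meta_features = np.hstack([audio_probs, lyric_probs])

meta_learner = LogisticRegression(max_iter=1000)
meta_learner.fit(meta_features, y_train)

# At prediction time, fuse new audio/lyric probabilities the same way
new_sample = np.hstack([audio_probs[:1], lyric_probs[:1]])
print(meta_learner.predict(new_sample))
```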
Keywords/Search Tags: Music Emotion Classification, Multi-modal, CNN-LSTM, Stacking, Spectrogram, Word Embedding