
Research On Music Emotion Classification Based On CNN-LSTM

Posted on: 2021-02-25  Degree: Master  Type: Thesis
Country: China  Candidate: C F Chen  Full Text: PDF
GTID: 2415330605482446  Subject: Computer Science and Technology
Abstract/Summary:
Music is rich in information about human emotions, and research on music emotion classification (MEC) helps organize and retrieve massive music collections. Because of the duration and complexity of music audio, its characteristic parameters tend to be high-dimensional, voluminous, and difficult to analyze. Existing MEC research concentrates on single-modal analysis of either audio or lyrics, ignores the correlation between modalities, and therefore suffers a degree of information loss. To address these problems, this thesis builds a multi-modal music emotion classification system that fuses audio and lyric information, which effectively improves classification performance.

For audio classification, the real music recordings are finely segmented, and pure background-sound fragments are obtained through vocal separation; these fragments yield better classification performance than the original audio. Spectrogram and low-level descriptor (LLD) features are extracted from the fragments, and using both audio features improves the classification result and compensates for the limitations of any single feature.

For lyric classification, three vector space models (TF-IDF, the chi-square test, and an improved chi-square test) and word embeddings extracted with Word2vec are used as feature representations of the lyric text and are evaluated with an SVM classifier. The improved chi-square method tunes its parameters to account for the particular semantics of lyrics and achieves outstanding emotion classification results.

To overcome the limitations of a single feature and the shortcomings of a single-network classifier, this thesis builds audio and lyric classifiers that combine the feature extraction ability of convolutional neural networks with the ability of recurrent neural networks to process sequential data, and adds an attention mechanism, yielding a single-modal classification model based on CNN-LSTM that produces the emotion classification output. This combined architecture is adapted to music emotion classification and improved so that it accepts two types of feature input, which raises classification accuracy. Compared with traditional SVM, CNN, and LSTM classifiers, the model greatly improves both audio and lyric emotion classification: audio accuracy reaches 68% and lyric accuracy reaches 74%.

To fuse emotion information across modalities, this thesis builds three kinds of multi-modal music emotion classification systems. In the feature-level fusion, the CNN-LSTM single-modal model is used to unify the modal vectors, which avoids the classification defects caused by direct dimensionality reduction. In the decision-level fusion, an improved Thayer decision fusion method is proposed that makes good use of the correlation between modalities. In view of the heterogeneity of the different modal features, a multi-modal ensemble learning method based on Stacking is proposed to obtain the best performance; its average classification accuracy reaches 78%, a large improvement over the single-modal systems, with good scalability.
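As a rough illustration of the two audio feature types mentioned above, the following sketch extracts a log-mel spectrogram and a few frame-level low-level descriptors (LLDs) with librosa. The file path, sampling rate, and the specific descriptor set are illustrative assumptions, not the thesis's exact configuration.

```python
# Minimal sketch: log-mel spectrogram plus a small set of frame-level LLDs.
# librosa is assumed as the extraction library; "segment.wav" is a
# placeholder path to a vocal-removed background-sound fragment.
import librosa
import numpy as np

y, sr = librosa.load("segment.wav", sr=22050)

# Spectrogram feature (2-D time-frequency image, fed to the CNN branch)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)

# Frame-level LLDs (one vector per frame, the second feature stream)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
zcr = librosa.feature.zero_crossing_rate(y)
rms = librosa.feature.rms(y=y)
llds = np.vstack([mfcc, zcr, rms])   # (15, n_frames) stacked descriptors

print(log_mel.shape, llds.shape)
```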
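For the lyric-side baseline, a minimal sketch of the TF-IDF plus chi-square feature selection pipeline evaluated with an SVM is shown below. The toy lyrics and labels are placeholders, and the thesis's improved chi-square weighting is not reproduced here; scikit-learn's standard chi2 selector stands in for it.

```python
# Minimal sketch of the lyric baseline: TF-IDF features, chi-square
# feature selection, and a linear SVM classifier (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Placeholder lyric fragments and emotion labels
lyrics = ["sunny days and laughter", "tears fall in the silent night",
          "dancing all night long", "alone with my broken heart"]
labels = ["happy", "sad", "happy", "sad"]

pipeline = make_pipeline(
    TfidfVectorizer(),            # bag-of-words lyric features
    SelectKBest(chi2, k=10),      # keep the terms most correlated with emotion
    LinearSVC(),                  # linear SVM baseline classifier
)
pipeline.fit(lyrics, labels)
print(pipeline.predict(["crying through the night"]))
```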
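The single-modal CNN-LSTM classifier with attention could look roughly like the sketch below (Keras). The layer sizes, the four-class emotion taxonomy, and the spectrogram input shape are illustrative assumptions rather than the architecture reported in the thesis.

```python
# Minimal sketch of a CNN-LSTM classifier with additive attention for one
# modality (e.g. log-mel spectrogram segments). All hyperparameters are
# assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4                  # assumed emotion categories
TIME_STEPS, N_MELS = 128, 64     # assumed spectrogram shape: frames x mel bands

inputs = layers.Input(shape=(TIME_STEPS, N_MELS, 1))

# CNN front end: local time-frequency feature extraction
x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)

# Collapse the frequency axis so the LSTM sees a sequence over time
x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)

# LSTM models temporal dependencies across frames
x = layers.LSTM(128, return_sequences=True)(x)

# Simple additive attention: weight each time step before pooling
scores = layers.Dense(1, activation="tanh")(x)            # (batch, T, 1)
weights = layers.Softmax(axis=1)(scores)                   # attention over time
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])

outputs = layers.Dense(NUM_CLASSES, activation="softmax")(context)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```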
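Finally, the Stacking-based fusion can be sketched as a meta-learner trained on the concatenated class-probability outputs of the audio and lyric models. The random arrays below stand in for real model outputs, and the logistic-regression meta-learner and four-class setup are assumptions for illustration.

```python
# Minimal sketch of Stacking-style fusion: a meta-learner over the
# concatenated per-modality probability outputs (placeholder data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_classes = 200, 4

# Placeholder per-modality probability outputs on a held-out fusion set
audio_probs = rng.dirichlet(np.ones(n_classes), size=n_train)
lyric_probs = rng.dirichlet(np.ones(n_classes), size=n_train)
y_train = rng.integers(0, n_classes, size=n_train)

# Meta-features: concatenate the two modalities' predictions
meta_features = np.hstack([audio_probs, lyric_probs])

meta_learner = LogisticRegression(max_iter=1000)
meta_learner.fit(meta_features, y_train)

# At prediction time, fuse new audio/lyric probabilities the same way
new_sample = np.hstack([audio_probs[:1], lyric_probs[:1]])
print(meta_learner.predict(new_sample))
```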
Keywords/Search Tags: Music Emotion Classification, Multi-modal, CNN-LSTM, Stacking, Spectrogram, Word Embedding