
Research On Content-based Automatic Music Annotation Methods

Posted on: 2020-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Wang
Full Text: PDF
GTID: 2428330575958027
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technologies and various multimedia applications, digital music corpora have reached a massive scale and are constantly being enriched with new content. As an effective means of utilizing massive music data, automatic music annotation technology assigns a music clip a set of relevant descriptive tags based on the music's content, which is of significant value to music-related research and applications such as music retrieval, classification, and recommendation. The tags generated by a music annotation algorithm usually capture various attributes of the music, such as emotion, genre, and instrumentation, and provide high-level semantic descriptions of the music. However, due to the large semantic gap between low-level acoustic features and high-level semantic tags, designing and implementing an effective automatic music annotation method faces a number of difficulties, such as finding appropriate feature representations and effectively modeling the feature-to-tag mapping and tag-to-tag correlations. Accordingly, music annotation has attracted considerable research attention over the past decades, and a number of effective approaches and promising results have been reported. This thesis conducts in-depth research on effective models and algorithms for automatic music annotation and proposes two effective music annotation methods.

Inspired by the observation that many of the tags assigned to a music piece correspond to local regions of the music rather than to the whole piece, this thesis proposes a music annotation method based on a conditional random field (CRF) model defined on music segments. The method integrates feature-to-tag correspondence, tag smoothness, and local-to-global annotation consistency into the CRF model, together with label-specific feature learning. For a music piece to be annotated, the relevant tags of each local segment are first inferred using the CRF models corresponding to the respective tags; these local annotations are then aggregated to obtain the holistic annotation of the piece.
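As a concrete illustration, the following is a minimal sketch of the segment-level inference just described: a per-tag binary linear-chain CRF decoded with Viterbi over the segment sequence, followed by a simple local-to-global aggregation. The feature dimension, the random weights, and the mean-over-segments aggregation rule are illustrative assumptions, not the thesis's exact formulation.

# A minimal sketch of per-tag linear-chain CRF inference over music
# segments. Weights and the aggregation threshold are illustrative
# assumptions, not the thesis's learned model.
import numpy as np

def viterbi_binary_crf(unary, pairwise):
    """Most likely 0/1 label sequence for one tag over T segments.

    unary:    (T, 2) scores for tag-absent / tag-present per segment
              (the feature-to-tag correspondence term).
    pairwise: (2, 2) transition scores between neighboring segments
              (the tag-smoothness term).
    """
    T = unary.shape[0]
    score = unary[0].copy()
    back = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise          # indexed [prev_state, cur_state]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

def annotate(segment_features, tag_models, threshold=0.5):
    """Infer segment labels per tag, then aggregate to a holistic annotation."""
    tags = []
    for tag, (w, pairwise) in tag_models.items():
        unary = segment_features @ w              # (T, 2) label-specific scores
        labels = viterbi_binary_crf(unary, pairwise)
        # Local-to-global consistency: keep the tag if enough segments carry it.
        if np.mean(labels) >= threshold:
            tags.append(tag)
    return tags

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))                  # 8 segments, 16-dim features
models = {t: (rng.normal(size=(16, 2)), np.array([[1.0, -1.0], [-1.0, 1.0]]))
          for t in ("rock", "guitar", "upbeat")}
print(annotate(feats, models))

The positive diagonal of the pairwise matrix rewards neighboring segments that agree on a tag, which is one simple way to realize the tag-smoothness term described above.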
Building on deep neural networks, this thesis further proposes a music annotation model that hierarchically combines convolutional neural networks (CNNs) and recurrent neural networks (RNNs) and exploits multiple music representations as input. The model first integrates multiple gated linear units (GLUs) into attentive convolutional networks to learn effective representations from both the 1-D raw waveform and the 2-D Mel-spectrogram of the music, which captures informative characteristics for the annotation task better than any single representation channel. A bidirectional Long Short-Term Memory (LSTM) model is then employed to depict the time-varying hierarchical structures embedded in the description sequences of the two representation channels, and a dual-state LSTM structure is further introduced to encode temporal correlations between the two channels, which effectively enriches the music descriptions. Finally, the descriptions generated at every time step are aggregated by a self-attentive multi-weighting mechanism for music tag prediction (a simplified sketch of this architecture is given after the abstract).

To evaluate the effectiveness of the proposed methods, this thesis conducts extensive experiments on the public music datasets CAL500 and MagnaTagATune. The experimental results show that, compared to existing approaches, the proposed methods achieve better annotation performance, demonstrating their effectiveness in automatic music annotation.
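To make the architecture concrete, here is a minimal PyTorch sketch of the hierarchical CNN-RNN tagger outlined above: GLU convolutions over a raw-waveform channel and a Mel-spectrogram channel, a bidirectional LSTM over the fused frame sequence, and self-attentive pooling for multi-label tag prediction. The class names, layer sizes, input shapes, and the channel fusion (plain concatenation here, in place of the thesis's attentive convolutions and dual-state LSTM) are simplifying assumptions.

# A minimal PyTorch sketch, not the thesis's exact model: layer sizes and
# the concatenation-based channel fusion are assumptions.
import torch
import torch.nn as nn

class GLUConv1d(nn.Module):
    """1-D convolution with a gated linear unit: the conv output is split
    into content and gate halves, and the gate passes through a sigmoid."""
    def __init__(self, in_ch, out_ch, k, stride):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, 2 * out_ch, k, stride, padding=k // 2)
    def forward(self, x):
        a, b = self.conv(x).chunk(2, dim=1)
        return a * torch.sigmoid(b)

class MusicTagger(nn.Module):
    def __init__(self, n_mels=96, n_tags=50, hidden=128):
        super().__init__()
        # Waveform front end: strided GLU convs downsample raw audio to frames.
        self.wave = nn.Sequential(
            GLUConv1d(1, 64, k=251, stride=32),
            GLUConv1d(64, hidden, k=5, stride=4),
        )
        # Spectrogram front end: treat the Mel bins as input channels.
        self.spec = GLUConv1d(n_mels, hidden, k=3, stride=1)
        self.rnn = nn.LSTM(2 * hidden, hidden, batch_first=True,
                           bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)      # self-attentive weighting
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, wave, mel):
        w = self.wave(wave)                            # (B, H, Tw)
        s = self.spec(mel)                             # (B, H, Ts)
        # Align the two channels' frame rates, then fuse by concatenation.
        T = min(w.shape[-1], s.shape[-1])
        w = nn.functional.adaptive_avg_pool1d(w, T)
        s = nn.functional.adaptive_avg_pool1d(s, T)
        x = torch.cat([w, s], dim=1).transpose(1, 2)   # (B, T, 2H)
        h, _ = self.rnn(x)                             # (B, T, 2H)
        alpha = torch.softmax(self.attn(h), dim=1)     # attention over time
        pooled = (alpha * h).sum(dim=1)                # (B, 2H)
        return torch.sigmoid(self.out(pooled))         # per-tag probabilities

model = MusicTagger()
wave = torch.randn(2, 1, 16000)     # 2 clips, 1 s of 16 kHz audio (assumed)
mel = torch.randn(2, 96, 512)       # matching Mel-spectrograms
print(model(wave, mel).shape)       # torch.Size([2, 50])

The sigmoid output treats annotation as multi-label classification, one independent probability per tag, matching the task framing in the keywords below.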
Keywords/Search Tags: Music Annotation, Multi-Label Classification, Conditional Random Field, Deep Neural Network, Self-Attention