
A Feature Fusion Method Based On Multi-level Local Feature Coding And Its Application To Music Genre Recognition

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: W J Zeng
GTID: 2415330611466948
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of digital music and online music services, music information retrieval (MIR) has become an important field of research. Music genre recognition (MGR) is an important branch of MIR because it plays a fundamental role in music indexing and retrieval for website music engines. Increasing the accuracy of automatic music genre recognition is a cornerstone of deploying robust music information retrieval systems. Most existing machine learning methods for MGR consist of a feature extraction phase and a machine learning phase. The feature extraction phase suffers from loss of information or insufficient features, while the machine learning phase depends heavily on the feature extraction phase and cannot make full use of the information. Handcrafted features designed by domain experts often lack versatility and cannot be transferred to other tasks. With the extensive use of deep learning in other fields, many MGR solutions have shifted towards deep learning.

Existing methods for MGR have some limitations. First, MGR differs from image classification because genre has complex intrinsic patterns that are highly diverse and lie at different levels of abstraction. Most deep learning methods for MGR are limited to learning global features and do not learn local features at different levels of abstraction or their dependencies. Second, some methods use only a single feature and ignore the complementarity between different features, and so fail to provide enough discriminative information. Finally, some ensemble learning methods for MGR fuse features only at the decision level when combining the advantages of various features, which may ignore the mutual relationships among features at an early stage.

This thesis proposes an effective feature fusion method based on multi-level local feature coding for MGR. In the proposed method, we design a feature encoding network, inspired by both NetVLAD and the self-attention mechanism, to capture local information at different levels of abstraction in the music stream and to learn their dependencies. This is because music genres are usually distributed over different levels or time scales of music streams. The complementary nature of the scatter transform feature and the transfer feature relative to typical features is also considered, which enriches the diversity of features so that the proposed method learns more useful representations. The transfer feature is trained on a source task for a target task, aiming to transfer knowledge between two domains. The scatter transform feature defines a translation-invariant representation that is stable to time-warping deformation. Moreover, in the final ensemble, we use a meta-CNN to learn the mutual relationships among different features at an early stage instead of combining hard decisions from independent classifiers.

Several experiments have been carried out on the GTZAN, ISMIR2004 and Extended Ballroom datasets. Testing accuracies on GTZAN, ISMIR2004 and Extended Ballroom reach 96.50%, 92.46% and 95.50%, respectively, which are higher than those of other state-of-the-art methods. This demonstrates the validity and advancement of the proposed method.
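
To make the NetVLAD idea mentioned in the abstract more concrete, here is a minimal sketch of NetVLAD-style pooling over local frame descriptors. It is not the thesis's network. It assumes fixed cluster centers and a distance-based soft assignment with a hypothetical sharpness parameter `alpha`, whereas a trained NetVLAD layer learns its assignment weights and centers. The names `netvlad_pool`, `frames` and `centers` are illustrative only, and the self-attention part of the proposed encoder is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def netvlad_pool(frames, centers, alpha=1.0):
    """Aggregate T local descriptors (dim D) into one vector of length K*D.

    Assumption: soft assignment uses -alpha * squared distance to fixed
    centers; in a trained layer these would be learned parameters.
    """
    k, d = len(centers), len(centers[0])
    vlad = [[0.0] * d for _ in range(k)]
    for x in frames:
        # Soft-assign this descriptor to every cluster (closer = larger weight).
        scores = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                  for c in centers]
        weights = softmax(scores)
        # Accumulate weighted residuals (descriptor minus center) per cluster.
        for j, (w, c) in enumerate(zip(weights, centers)):
            for i in range(d):
                vlad[j][i] += w * (x[i] - c[i])
    # Intra-normalize each cluster's residual, flatten, then L2-normalize.
    vlad = [l2_normalize(row) for row in vlad]
    flat = [v for row in vlad for v in row]
    return l2_normalize(flat)


# Usage: three 2-D frame descriptors pooled against two hypothetical centers.
frames = [[0.0, 0.1], [1.0, 0.9], [0.1, 0.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
encoding = netvlad_pool(frames, centers)
```

The output length depends only on the number of centers and the descriptor dimension, not on the number of frames, so clips of different durations map to encodings of the same size.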
Keywords/Search Tags:Music Genre Recognition, NetVLAD, Self-attention, Convolutional Neural Network