
Research On Audio Sample Retrieval Technology Based On Contrastive Learning

Posted on: 2023-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Ma
Full Text: PDF
GTID: 2558306842955289
Subject: Electronic Information (Computer Technology) (Professional Degree)
Abstract/Summary:
Audio sample retrieval is a content-based audio retrieval method. Instead of retrieving by the text labels attached to the audio, it uses features extracted from the audio samples that reflect their content. It is therefore widely used in fields such as music recognition, duplicate detection, and digital copyright protection. In recent years, with the rapid development of Internet technology, the data scale of audio sample retrieval systems has been expanding continuously. The diversity of application scenarios also means that query audio varies in quality because of differences in acquisition channels and recording environments, and may contain substantial noise. Audio sample retrieval therefore faces new challenges in retrieval speed and robustness. Focusing on audio sample retrieval methods, the main research work of this dissertation is as follows:

(1) To address robustness in audio sample retrieval, this dissertation proposes GHM-InfoNCE, a contrastive loss function based on gradient harmonizing, which is used to train a model that extracts audio fingerprints. With the ability to discriminate between features as the optimization goal, a contrastive learning approach is used to learn highly discriminative feature representations from large-scale audio data to serve as audio fingerprints, and common noise distortions are introduced during model training to improve the robustness of the fingerprints. To address the insufficient learning of difficult examples during training, GHM-InfoNCE uses the gradient norm to measure how difficult each sample is for the model, and dynamically adjusts each sample's contribution to the model update according to the distribution of the numbers of easy and difficult samples. Experimental results show that it effectively improves audio fingerprint performance.

(2) For fast retrieval over large-scale audio databases, this dissertation proposes a multi-level filtering and verification audio retrieval method. Indexes are constructed using random-projection locality-sensitive hashing and a self-similarity matrix, from two perspectives: sub-fingerprint content and the inter-segment relationships between sub-fingerprints. Most samples are then quickly filtered out, and the exact results are finally verified using the sub-fingerprints. Experimental results show that the method effectively improves retrieval speed while maintaining retrieval accuracy.
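The abstract does not give the exact form of GHM-InfoNCE, so the following is only a minimal sketch of the general idea under stated assumptions: a standard InfoNCE loss per anchor, with each anchor reweighted by the inverse density of its gradient norm, in the style of the gradient harmonizing mechanism. The function names, the use of `1 - p_positive` as the gradient-norm proxy, the bin count, and the temperature are all illustrative assumptions, not the thesis's definitions.

```python
import math

def info_nce_per_anchor(sim, temperature=0.1):
    """Standard InfoNCE per anchor.

    sim[i][j] is the similarity between anchor i and candidate j;
    candidate i is assumed to be the positive for anchor i.
    Returns (losses, pos_probs), where pos_probs[i] is the softmax
    probability assigned to the positive of anchor i.
    """
    losses, pos_probs = [], []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        log_p = logits[i] - log_z
        losses.append(-log_p)
        pos_probs.append(math.exp(log_p))
    return losses, pos_probs

def ghm_weights(pos_probs, num_bins=10):
    """Gradient-harmonizing weights (assumed form).

    The gradient-norm proxy g = 1 - p_positive lies in [0, 1]; it is
    proportional to the gradient of InfoNCE w.r.t. the positive logit.
    Samples are binned by g, and each sample is weighted by
    N / gradient_density, with density = bin_count / bin_width.
    """
    n = len(pos_probs)
    g = [min(1.0, max(0.0, 1.0 - p)) for p in pos_probs]
    bins = [min(int(x * num_bins), num_bins - 1) for x in g]
    counts = [0] * num_bins
    for b in bins:
        counts[b] += 1
    # bin_width = 1 / num_bins, so density = count * num_bins
    return [n / (counts[b] * num_bins) for b in bins]

def ghm_info_nce(sim, temperature=0.1, num_bins=10):
    """Mean of per-anchor InfoNCE losses, reweighted by GHM weights."""
    losses, pos_probs = info_nce_per_anchor(sim, temperature)
    weights = ghm_weights(pos_probs, num_bins)
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

In this sketch, samples whose gradient norms fall in crowded bins (typically the many easy samples) are down-weighted, while samples in sparse bins contribute relatively more to the update. A real training setup would compute this on batched tensors in a deep learning framework and would treat the weights as constants during backpropagation.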
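The filter-then-verify idea behind the retrieval method can be illustrated with a small random-projection locality-sensitive hashing example. This is a sketch under assumptions, not the thesis's implementation: it covers only a single hash table over whole fingerprint vectors, omits the self-similarity-matrix index, and uses cosine similarity with a hypothetical `threshold` as the verification step. All function and parameter names are illustrative.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_key(vec, hyperplanes):
    """Hash a vector to a bit tuple: one sign bit per hyperplane."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )

def build_index(fingerprints, hyperplanes):
    """Group fingerprint ids into buckets by their LSH key."""
    index = defaultdict(list)
    for item_id, vec in fingerprints.items():
        index[lsh_key(vec, hyperplanes)].append(item_id)
    return index

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def query(vec, fingerprints, index, hyperplanes, threshold=0.9):
    """Filter by bucket, then verify candidates by exact cosine similarity."""
    candidates = index.get(lsh_key(vec, hyperplanes), [])
    scored = [(cosine(vec, fingerprints[i]), i) for i in candidates]
    return sorted([(s, i) for s, i in scored if s >= threshold], reverse=True)
```

Only the fingerprints that share the query's bucket are compared exactly, which is where the speedup comes from. In practice several hash tables or multi-probe lookups are commonly used to reduce the chance that a true match lands in a different bucket.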
Keywords/Search Tags: Audio retrieval, Contrastive learning, Audio fingerprint, Multi-level filtering