Vocal melody extraction from polyphonic music has a wide range of applications in music content analysis, copyright protection, query-by-humming recognition, and so on, and it is an important research topic in the field of music information retrieval. Research institutions and industry researchers have studied this topic in depth and made substantial progress; however, several problems still need to be addressed. Based on the harmonic structure and temporal continuity of musical sound signals, and drawing on theories and technologies such as digital signal processing, graph convolutional neural networks, recurrent neural networks, and statistics, this thesis studies algorithms for extracting the vocal melody from polyphonic music. The main work of this thesis is as follows:

1) Graph modeling for vocal melody extraction from polyphonic music is proposed. To address the excessive redundancy and poor interpretability of general deep learning models, the proposed method models the music signal spectrum as an undirected graph built from the structural characteristics of the audio signal: the nodes of the graph represent frequency bins of the spectrum, while the edges represent the correlations between frequency bins; this formulation is proposed here for the first time. Then, exploiting the property of the logarithmic-frequency spectrum that the fundamental and its harmonics keep the same relative positions at every pitch, a graph structure with translation invariance is constructed. Finally, based on the defined graph structure, a graph convolutional neural network estimates the melody pitch frame by frame. The adjacency matrix defined in this thesis reflects the latent connection between the fundamental and its harmonics, which makes the algorithm interpretable. At the same time, the proposed algorithm achieves good performance with very few parameters, confirming its rationality and superiority. (A schematic sketch of such a harmonic adjacency matrix is given below.)

2) A melody extraction algorithm that combines graph modeling with accompaniment suppression is proposed. In polyphonic music, the instrumental accompaniment and the vocals overlap in both the time and frequency domains, so the accompaniment interferes severely with the extracted vocal melody. A melody extraction algorithm that combines graph modeling with accompaniment suppression is therefore proposed, with the aim of improving vocal melody extraction by suppressing the accompaniment. Specifically, vocal melody extraction is divided into two steps: first, the accompaniment suppression stage removes accompaniment components from the polyphonic music, strengthening the vocal components in the spectrum of the mixed audio signal; then, graph modeling extracts the melody from the preprocessed music signal. Performing accompaniment suppression before melody extraction improves the melody-to-accompaniment ratio of the mixed audio signal, and the subsequent graph modeling stage can then achieve better vocal melody extraction performance with a low parameter count. Experimental results show that combining accompaniment suppression with melody extraction improves the accuracy of melody extraction. (A sketch of this two-stage cascade is given below.)

3) A temporal harmonic graph convolutional network for vocal melody extraction is proposed. Melodies extracted by graph modeling alone often contain abrupt values and quantization errors that make the vocal melody look like a staircase. Starting from the harmonic correlation and temporal continuity of music signal spectra, this thesis proposes a melody extraction algorithm based on a temporal harmonic graph convolutional network to enhance the extraction of harmonic and temporal information from the spectrum. The method uses graph modeling to map input spectra to melodic pitch sequences and introduces a GRU to model temporal continuity, so that the temporal harmonic graph convolutional network models harmonic correlation and temporal continuity jointly. In addition, the method refines the melody pitch sequence by constructing a fine-grained saliency function, obtaining a smooth melody trajectory while maintaining high accuracy. Experimental results show that this method significantly outperforms the reference algorithms. (A sketch combining the per-frame graph convolution with a GRU is given below.)
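
As a concrete illustration of the graph construction in contribution 1, the following is a minimal sketch assuming a CQT-like log-frequency spectrogram; the function names, bin resolution, and number of harmonics are illustrative assumptions, not the exact configuration used in the thesis.

```python
# Hypothetical sketch: a harmonic adjacency matrix over log-frequency spectral bins.
import numpy as np

def harmonic_adjacency(n_bins: int, bins_per_octave: int = 60, n_harmonics: int = 5) -> np.ndarray:
    """Undirected adjacency: each frequency bin (node) is linked to the bins at its
    harmonic positions. On a log-frequency axis the k-th harmonic sits a fixed offset
    of bins_per_octave * log2(k) bins above the fundamental, so the same edge pattern
    repeats for every pitch (translation invariance)."""
    A = np.eye(n_bins)                        # self-loops
    for k in range(2, n_harmonics + 1):
        offset = int(round(bins_per_octave * np.log2(k)))
        for i in range(n_bins - offset):
            A[i, i + offset] = 1.0            # fundamental -> harmonic
            A[i + offset, i] = 1.0            # symmetric, since the graph is undirected
    return A

def normalize(A: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} A D^{-1/2} used by standard GCN layers."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

# One GCN propagation step on a single spectral frame x (shape: n_bins x n_features)
# would then be h = ReLU(normalize(A) @ x @ W), with W a learnable weight matrix.
```

Because the harmonic offsets are constant on the log-frequency axis, the adjacency pattern is the same for every candidate pitch, which is what gives the graph structure its translation invariance and keeps the parameter count small.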
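
The two-stage cascade of contribution 2 can be sketched as follows; suppress_accompaniment is a trivial stand-in for the accompaniment-suppression component described above, and the melody extractor is passed in as a callable, so the snippet only illustrates the order of operations, not the actual models.

```python
# Hypothetical sketch of the accompaniment-suppression-then-melody-extraction cascade.
import numpy as np

def suppress_accompaniment(mix_spec: np.ndarray) -> np.ndarray:
    """Stage 1 placeholder: stands in for the accompaniment-suppression component
    described in the abstract; this toy stand-in passes the magnitude spectrogram
    (frames x bins) through unchanged, whereas the real stage is expected to
    attenuate instrumental energy and strengthen the vocal components."""
    return mix_spec

def extract_vocal_melody(mix_spec: np.ndarray, melody_model) -> np.ndarray:
    """Two-stage cascade: suppression first raises the melody-to-accompaniment
    ratio, then the graph-modeling network estimates one pitch value (or an
    'unvoiced' label) per frame of the enhanced spectrogram."""
    enhanced = suppress_accompaniment(mix_spec)   # stage 1: accompaniment suppression
    return melody_model(enhanced)                 # stage 2: graph-based melody extraction
```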
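
The "graph convolution per frame plus recurrence across frames" idea of contribution 3 can be sketched as a small PyTorch module, assuming a precomputed normalized harmonic adjacency; the layer sizes and pitch-class count are assumptions, and the fine-grained-saliency refinement step is not shown.

```python
# Hypothetical sketch: harmonic mixing within each frame via a graph convolution,
# followed by a GRU that models temporal continuity across frames.
import torch
import torch.nn as nn

class TemporalHarmonicGCN(nn.Module):
    def __init__(self, A_hat: torch.Tensor, n_bins: int, lift: int = 8,
                 hidden: int = 128, n_pitches: int = 361):
        super().__init__()
        self.register_buffer("A_hat", A_hat)          # normalized harmonic adjacency (n_bins x n_bins)
        self.lift = nn.Linear(1, lift)                # per-bin feature lift after graph mixing
        self.gru = nn.GRU(n_bins * lift, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_pitches)  # frame-wise pitch classes (incl. "unvoiced")

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, frames, n_bins) log-frequency magnitude spectrogram
        b, t, f = spec.shape
        x = spec.reshape(b * t, f, 1)                 # each frame is a signal on the harmonic graph
        x = torch.relu(self.lift(self.A_hat @ x))     # graph convolution: mix harmonically related bins
        x = x.reshape(b, t, f * self.lift.out_features)
        x, _ = self.gru(x)                            # GRU enforces temporal continuity across frames
        return self.head(x)                           # (batch, frames, n_pitches) logits

# Example usage with the adjacency sketched earlier (hypothetical sizes):
# A = torch.tensor(normalize(harmonic_adjacency(360)), dtype=torch.float32)
# model = TemporalHarmonicGCN(A, n_bins=360)
# logits = model(torch.randn(2, 100, 360))    # 2 clips, 100 frames, 360 bins
```

Framing the recurrence over frames, rather than over bins, is what lets the network smooth out the abrupt, staircase-like pitch jumps while the graph convolution continues to exploit harmonic correlation within each frame.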