
Research On Polyphonic Multi-Instrument Recognition Method Based On Dilated Convolutional Recurrent Neural Network

Posted on: 2024-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: W B Pei
Full Text: PDF
GTID: 2555307142966229
Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of digital music, demand for music information retrieval (MIR) technology is increasing. Musical instrument recognition, one of the key means of obtaining high-level music information, has become an important research branch and a research hotspot in MIR. At present, most work on instrument recognition targets single instruments or the predominant instrument in polyphonic music; because of musical variability and the complexity of instrument signals, comparatively little work addresses multi-instrument recognition in polyphonic music. The application of the convolutional recurrent neural network (CRNN) has advanced the identification of multiple instruments in polyphonic music, and the emergence of the dilated convolutional neural network (DCNN) offers new ideas for instrument recognition. Taking typical Mongolian musical instruments as an example, this thesis constructs a dilated convolutional recurrent neural network model and proposes a polyphonic multi-instrument recognition method based on it. The main work includes:

(1) Construction of a dataset of typical Mongolian musical instruments. Ten typical Mongolian musical instruments were recorded in a professional recording studio: horse head qin, tenor horse head qin, big horse head qin, Tuva sanxian, Tobu Xiuer, Sanxian, Sihu, Humai, bamboo flute and Yatoga. These recordings provide the data basis for studying how well the dilated convolutional recurrent neural network model recognizes typical Mongolian musical instruments.

(2) Improvement of the convolutional neural network (CNN) model. First, the Mel spectrogram, which better matches the nonlinear characteristics of human hearing, is extracted as the input feature and fed into the CNN structure to obtain features that accurately express the timbre of the instruments. However, a CNN generally relies on stacked convolution and pooling operations to enlarge the receptive field, and these operations lose important features of the Mel spectrogram. Compared with a CNN, the biggest advantage of the dilated convolutional neural network is that it expands the receptive field while keeping the size and resolution of the Mel spectrogram unchanged, so that important features in the spectrogram are not lost and the spectrogram can be learned from a global perspective to extract more spatial features. The information obtained by dilated convolution, however, is only weakly correlated. To obtain better timbre features, the dilated convolutional neural network is therefore fused with the CNN, which compensates both for the CNN's loss of important features and for the weak correlation of the information obtained by the dilated convolutional neural network.

(3) Fusion of the CNN + dilated convolutional neural network with a Bi-directional Long Short-Term Memory (Bi-LSTM) model. A CNN ignores the high-level temporal features of audio data when extracting features, whereas a Bi-LSTM can effectively capture temporal structure from a global perspective. To address this, the outputs of the multi-layer convolution and pooling that follow the fusion of the CNN and the dilated convolutional neural network are concatenated and fed into a Bi-LSTM recurrent layer, forming the dilated convolutional recurrent neural network. This network can fully extract the time-series information so that the model learns from the time-varying spectrum.

Finally, the harmonic energy spectra of eight typical Mongolian musical instruments were extracted, the harmonic energy distribution of each instrument was analyzed and compared, and the reasons for the recognition results on some polyphonic multi-instrument categories were then analyzed from the perspective of harmonic distribution.
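
The trade-off described in item (2), where pooling enlarges the receptive field but shrinks the spectrogram while dilation enlarges it at constant resolution, can be checked with a little arithmetic. The sketch below is not the model from the thesis: the 3-wide kernels, the dilation rates 1, 2, 4, the layer counts and the 128 Mel bins are all illustrative assumptions, and the `Layer`, `receptive_field` and `output_length` names are hypothetical helpers that only apply the standard size formulas along one axis.

```python
from typing import NamedTuple, Sequence


class Layer(NamedTuple):
    """One convolution or pooling layer along a single axis (hypothetical helper)."""
    kernel: int
    dilation: int = 1
    stride: int = 1
    padding: int = 0


def receptive_field(layers: Sequence[Layer]) -> int:
    """Number of input positions that influence one output position."""
    rf, jump = 1, 1  # jump = distance in input samples between adjacent outputs
    for layer in layers:
        rf += (layer.kernel - 1) * layer.dilation * jump
        jump *= layer.stride
    return rf


def output_length(n: int, layers: Sequence[Layer]) -> int:
    """Length of the axis after passing through all layers."""
    for layer in layers:
        span = layer.dilation * (layer.kernel - 1) + 1  # effective kernel extent
        n = (n + 2 * layer.padding - span) // layer.stride + 1
    return n


N_MELS = 128  # assumed number of Mel bins, not taken from the thesis

conv = Layer(kernel=3, padding=1)   # ordinary convolution, size-preserving
pool = Layer(kernel=2, stride=2)    # 2x pooling, halves the axis

# Conventional CNN: the receptive field grows mainly through pooling.
POOLED_STACK = [conv, pool, conv, pool, conv]

# Dilated stack: padding equal to the dilation keeps the axis length fixed.
DILATED_STACK = [Layer(kernel=3, dilation=d, padding=d) for d in (1, 2, 4)]

if __name__ == "__main__":
    for name, stack in (("pooled", POOLED_STACK), ("dilated", DILATED_STACK)):
        print(name, receptive_field(stack), output_length(N_MELS, stack))
```

Under these assumptions the pooled stack reaches a receptive field of 18 bins but reduces the axis from 128 to 32 positions, whereas the dilated stack reaches 15 bins with all 128 positions preserved, which is the property the abstract attributes to dilated convolution.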
Keywords/Search Tags: convolutional neural network, dilated convolutional neural network, bi-directional long short-term memory network, Chinese folk music, instrument recognition