
Music Detection And Generation Using Neural Networks

Posted on: 2022-06-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B J Jia
Full Text: PDF
GTID: 1485306551969949
Subject: Computer Science and Technology
Abstract/Summary:
Computer science has been integrated with musicology, promoting the automatic and intelligent development of musical research and giving birth to two major research directions: music information retrieval and music generation. Music detection is a key task in music information retrieval; it focuses on detecting and localizing music events in audio. Music generation is an exploration of artificial intelligence in artistic creation; it studies algorithms that allow computers to compose music automatically. As the amount of digital music grows rapidly, neural network methods have become the mainstream approach to both tasks in recent years, because they excel at learning from massive data and have strong modeling ability.

Recent neural network methods for music detection usually convert the problem into one resembling image classification or sequence labeling and then solve it with a Convolutional Neural Network or a Recurrent Neural Network. However, existing methods ignore the subtask of music relative loudness estimation and the temporal characteristics of music data. Current neural network methods for music generation usually first represent music in an image-like or language-like format and then model it with architectures such as the Variational Auto-Encoder or the Transformer. Nevertheless, existing methods have not elegantly solved the discreteness problem introduced by piano-roll music representations, nor the limited control domains and instrument types in controllable music generation.

This dissertation focuses on the above problems. Its innovative research achievements and main contributions include:

(1) A hierarchical regulated iterative neural network method is proposed for the joint task of music detection and music relative loudness estimation. It remedies the lack of temporality and hierarchy modeling in the joint task and significantly improves performance in both segment-level and event-level evaluations. Existing music detection work often ignores the subtask of music relative loudness estimation, so this dissertation studies the two tasks jointly. We reformulate the joint task as a hierarchical event detection and localization problem and propose Hierarchical Regulated Iterative Networks (HRIN) to solve it. Extensive experiments on the public dataset OpenBMAT show that the proposed HRIN performs better in segment-level and event-level evaluations.
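To make the detection-as-sequence-labeling formulation concrete, the sketch below tags every frame of a spectrogram with a convolutional recurrent network. It is a minimal illustration under stated assumptions, not the HRIN architecture: the layer sizes and the three-class scheme (no music / background music / foreground music, mirroring relative loudness levels) are choices made for the example.

    # Minimal sketch (PyTorch assumed): frame-wise music detection as
    # sequence labeling. Layer sizes and the 3-class scheme are
    # illustrative assumptions, not the dissertation's exact model.
    import torch
    import torch.nn as nn

    class CRNNDetector(nn.Module):
        def __init__(self, n_mels=64, n_classes=3):
            super().__init__()
            # CNN front end: local spectro-temporal features.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d((2, 1)),      # pool frequency only, keep time
                nn.Conv2d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d((2, 1)),
            )
            # RNN back end: temporal dependencies across frames.
            self.gru = nn.GRU(64 * (n_mels // 4), 128,
                              batch_first=True, bidirectional=True)
            self.head = nn.Linear(256, n_classes)   # one label per frame

        def forward(self, spec):                    # (batch, 1, n_mels, time)
            x = self.conv(spec)                     # (batch, 64, n_mels//4, time)
            x = x.permute(0, 3, 1, 2).flatten(2)    # (batch, time, features)
            x, _ = self.gru(x)
            return self.head(x)                     # (batch, time, n_classes)

    logits = CRNNDetector()(torch.randn(2, 1, 64, 500))  # toy input
    print(logits.shape)  # torch.Size([2, 500, 3])

Pooling only along the frequency axis preserves the time resolution, so each output step still aligns with one input frame; hierarchical methods such as HRIN go further by coupling the detection and relative-loudness label layers, which this flat classifier does not model.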
(2) A coupled latent variable model with a binary regularizer is proposed. It addresses the problem of modeling the distributions within and between the tracks of pop music, resolves the discrete, non-differentiable problem caused by piano-roll representations of symbolic music, and achieves better performance in both quantitative and human evaluations. Pop music is the most popular genre with the widest audience, so research on generating it has clear application value. However, pop music usually contains multiple tracks, which makes its generation challenging. In addition, before processing music with a neural network, a common practice is to represent it as image-like data. Unlike continuous image data, this image-like representation (i.e., the piano-roll) is discrete and binary, which leads to a non-differentiable problem (a generic workaround is sketched after this contribution list). To address these challenges, a new neural network method, a coupled latent variable model with a binary regularizer, is proposed. Experiments on the impromptu accompaniment generation task for pop music are carried out on the Lakh Pianoroll Dataset. Compared with existing models, the proposed model shows good performance in both quantitative and human evaluations.

(3) A new Transformer-based neural network method is proposed for controllable multi-instrument music generation. It expands the types of the control domain and removes the restriction that existing work cannot generate an arbitrary number of instrument types, showing better performance in quantitative, visual, and human evaluations. Existing music generation work can only steer the generated samples toward a single fixed feature or type, such as a chord or a style, and the instrument types such methods can model are relatively limited. To solve these two problems, this dissertation proposes a controllable neural network model for multi-instrument polyphonic music generation. A series of experiments on the MIDICN dataset verifies the validity of the proposed model: it performs well in log-likelihood, perplexity, musical quality measurements, domain similarity evaluation, and human evaluation.
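As referenced in contribution (2), the sketch below shows one generic workaround for the discrete, binary piano-roll problem: a straight-through estimator that binarizes activations in the forward pass while letting gradients flow as if the operation were the identity. This is a common technique shown for illustration only; the dissertation's binary regularizer is a different mechanism, and the tensor shape is an arbitrary example.

    # Minimal sketch: straight-through binarization of piano-roll cells.
    # Generic technique for illustration, not the dissertation's method.
    import torch

    class BinarizeSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, probs):
            # Hard 0/1 piano-roll cells in the forward pass.
            return (probs > 0.5).float()

        @staticmethod
        def backward(ctx, grad_output):
            # Straight-through: pass the gradient unchanged.
            return grad_output

    probs = torch.rand(4, 84, 16, requires_grad=True)  # (tracks, pitches, steps)
    pianoroll = BinarizeSTE.apply(probs)               # discrete output
    pianoroll.sum().backward()                         # gradients still flow
    print(probs.grad.abs().sum() > 0)                  # tensor(True)

The point of the trick is that downstream losses see genuinely binary data, matching the piano-roll format, while the generator upstream still receives a usable (if biased) gradient signal.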
Keywords/Search Tags: music detection, music relative loudness estimation, music generation, neural networks, variational auto-encoder