
Music Detection And Generation Using Neural Networks

Posted on: 2022-06-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B J Jia
Full Text: PDF
GTID: 1485306551969949
Subject: Computer Science and Technology
Abstract/Summary:
Computer science has been integrated with musicology, promoting the automatic and intelligent development of musical research and giving birth to two major research directions: music information retrieval and music generation. Music detection is a key task in music information retrieval; it focuses on detecting and localizing music events in audio. Music generation is an exploration of artificial intelligence in artistic creation; it studies algorithms that allow computers to compose music automatically. As the amount of digital music grows rapidly, neural network methods have become the mainstream approach to both tasks in recent years, because they excel at learning from massive data and have strong modeling ability.

Recent neural network methods for music detection usually convert the problem into one resembling image classification or sequence labeling and then solve it with a Convolutional Neural Network or a Recurrent Neural Network. However, existing methods ignore the subtask of music relative loudness estimation and the temporal characteristics of music data. Current neural network methods for music generation usually first represent music in an image-like or language-like format and then model it with architectures such as the Variational Auto-Encoder or the Transformer. Nevertheless, existing methods have not elegantly solved the discreteness problem introduced by piano-roll music representations, nor the limited control domains and instrument types in controllable music generation.

This dissertation focuses on the above problems. Its innovative research achievements and main contributions include:

(1) A hierarchical regulated iterative neural network method is proposed for the joint task of music detection and music relative loudness estimation. It remedies the lack of temporality and hierarchy modeling in the joint task and significantly improves performance in both segment-level and event-level evaluations. Existing music detection work often ignores the subtask of music relative loudness estimation, so this dissertation studies the two tasks jointly. We reformulate the joint task as a hierarchical event detection and localization problem and propose Hierarchical Regulated Iterative Networks (HRIN) to solve it. Extensive experiments on the public dataset OpenBMAT show that the proposed HRIN performs better in segment-level and event-level evaluations.
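To make the detection-as-sequence-labeling formulation concrete, the sketch below tags every frame of a spectrogram with a convolutional recurrent network. It is a minimal illustration under stated assumptions, not the HRIN architecture: the layer sizes and the three-class scheme (no music / background music / foreground music, mirroring relative loudness levels) are choices made for the example.

    # Minimal sketch (PyTorch assumed): frame-wise music detection as
    # sequence labeling. Layer sizes and the 3-class scheme are
    # illustrative assumptions, not the dissertation's exact model.
    import torch
    import torch.nn as nn

    class CRNNDetector(nn.Module):
        def __init__(self, n_mels=64, n_classes=3):
            super().__init__()
            # CNN front end: local spectro-temporal features.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d((2, 1)),      # pool frequency only, keep time
                nn.Conv2d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d((2, 1)),
            )
            # RNN back end: temporal dependencies across frames.
            self.gru = nn.GRU(64 * (n_mels // 4), 128,
                              batch_first=True, bidirectional=True)
            self.head = nn.Linear(256, n_classes)   # one label per frame

        def forward(self, spec):                    # (batch, 1, n_mels, time)
            x = self.conv(spec)                     # (batch, 64, n_mels//4, time)
            x = x.permute(0, 3, 1, 2).flatten(2)    # (batch, time, features)
            x, _ = self.gru(x)
            return self.head(x)                     # (batch, time, n_classes)

    logits = CRNNDetector()(torch.randn(2, 1, 64, 500))  # toy input
    print(logits.shape)  # torch.Size([2, 500, 3])

Pooling only along the frequency axis preserves the time resolution, so each output step still aligns with one input frame; hierarchical methods such as HRIN go further by coupling the detection and relative-loudness label layers, which this flat classifier does not model.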
(2) A coupled latent variable model with a binary regularizer is proposed. It addresses the problem of modeling the distributions within and between the tracks of pop music, resolves the discrete, non-differentiable problem caused by piano-roll representations of symbolic music, and achieves better performance in both quantitative and human evaluations. Pop music is the most popular genre with the widest audience, so research on generating it has clear application value. However, pop music usually contains multiple tracks, which makes its generation challenging. In addition, before processing music with a neural network, a common practice is to represent it as image-like data. Unlike continuous image data, this image-like representation (i.e., the piano-roll) is discrete and binary, which leads to a non-differentiable problem (a generic workaround is sketched after this contribution list). To address these challenges, a new neural network method, a coupled latent variable model with a binary regularizer, is proposed. Experiments on the impromptu accompaniment generation task for pop music are carried out on the Lakh Pianoroll Dataset. Compared with existing models, the proposed model shows good performance in both quantitative and human evaluations.

(3) A new Transformer-based neural network method is proposed for controllable multi-instrument music generation. It expands the types of the control domain and removes the restriction that existing work cannot generate an arbitrary number of instrument types, showing better performance in quantitative, visual, and human evaluations. Existing music generation work can only steer the generated samples toward a single fixed feature or type, such as a chord or a style, and the instrument types such methods can model are relatively limited. To solve these two problems, this dissertation proposes a controllable neural network model for multi-instrument polyphonic music generation. A series of experiments on the MIDICN dataset verifies the validity of the proposed model: it performs well in log-likelihood, perplexity, musical quality measurements, domain similarity evaluation, and human evaluation.
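As referenced in contribution (2), the sketch below shows one generic workaround for the discrete, binary piano-roll problem: a straight-through estimator that binarizes activations in the forward pass while letting gradients flow as if the operation were the identity. This is a common technique shown for illustration only; the dissertation's binary regularizer is a different mechanism, and the tensor shape is an arbitrary example.

    # Minimal sketch: straight-through binarization of piano-roll cells.
    # Generic technique for illustration, not the dissertation's method.
    import torch

    class BinarizeSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, probs):
            # Hard 0/1 piano-roll cells in the forward pass.
            return (probs > 0.5).float()

        @staticmethod
        def backward(ctx, grad_output):
            # Straight-through: pass the gradient unchanged.
            return grad_output

    probs = torch.rand(4, 84, 16, requires_grad=True)  # (tracks, pitches, steps)
    pianoroll = BinarizeSTE.apply(probs)               # discrete output
    pianoroll.sum().backward()                         # gradients still flow
    print(probs.grad.abs().sum() > 0)                  # tensor(True)

The point of the trick is that downstream losses see genuinely binary data, matching the piano-roll format, while the generator upstream still receives a usable (if biased) gradient signal.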
Keywords/Search Tags: music detection, music relative loudness estimation, music generation, neural networks, variational auto-encoder