
Research On Music To Dance Generation Model Based On Convolutional Augmentation Transformer

Posted on: 2024-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: M G Zhang
Full Text: PDF
GTID: 2568307112476684
Subject: Electronic information
Abstract/Summary:
In recent years, with the development of the digital era, music-to-dance generation has received wide attention from industry and academia and has become one of the fundamental tasks in the cross-modal field. This research can be applied in many areas, such as entertainment, education, and virtualization, and has good application prospects. Researchers have proposed many deep-learning-based methods and achieved some results. However, music-to-dance generation requires not only generating long, continuous movements of high complexity, but also capturing the non-linear relationships between music and movements so that the generated dance matches the music; it is therefore a very challenging task. The main problems are: (1) it is a long-sequence generation task; (2) the joint data of the dance movements in the dataset contain noise; (3) music sequences and dance movement sequences exhibit both local and global dependencies; (4) dances of the same style share similar primitive movements, while dances of different styles have dissimilar primitive movements. To address these problems, this paper makes the following contributions.

(1) A new autoregressive generative adversarial network model is proposed. The model uses the Convolution-Augmented Transformer (Conformer) to construct a music encoder, an action encoder, and a cross-modal generator. The Conformer captures the local spatial features and the global features of the music and dance movement sequences in temporal order, enabling the model to capture both local and global dependencies between the sequences and to reduce the influence of noisy data. The model is compared experimentally with benchmark methods on a publicly available dataset; it exceeds the benchmarks on the evaluation metrics, generating dance movements that are long and coherent and that match the music.

(2) A music-to-dance generation model combining the Convolution-Augmented Transformer (Conformer) with contrastive learning is proposed. Contrastive learning captures the high-level semantic features of dances so that dances of the same style are encoded similarly while the encodings of different styles are as dissimilar as possible. As a result, the model generates more similar dance movements for dances of the same style, and dance movements that are as dissimilar as possible across different styles. This model is also compared experimentally with the benchmark methods on a publicly available dataset, and it generates more coherent and natural dance movements.
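The Conformer blocks mentioned above pair self-attention (global dependencies across the whole sequence) with a convolution module (local temporal features). The following is a minimal NumPy sketch of that combination, not the thesis's actual architecture: the single-head attention without learned projections, the fixed smoothing kernel, and the omission of layer normalization and the feed-forward modules are all simplifications for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Global dependencies: every frame attends to every other frame.
    # (Single head; learned Q/K/V projections omitted for brevity.)
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def depthwise_conv(x, kernel):
    # Local dependencies: each feature channel is combined over a short
    # temporal window, as in the Conformer convolution module.
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        out[t] = (kernel[:, None] * xp[t:t + k]).sum(axis=0)
    return out

def conformer_block(x, kernel):
    # Simplified ordering: attention (global) then convolution (local),
    # each with a residual connection.
    x = x + self_attention(x)
    x = x + depthwise_conv(x, kernel)
    return x

# Toy "music feature" sequence: 8 frames, 4 features per frame.
rng = np.random.default_rng(0)
seq = rng.standard_normal((8, 4))
out = conformer_block(seq, kernel=np.array([0.25, 0.5, 0.25]))
print(out.shape)  # (8, 4)
```

Because the convolution only mixes adjacent frames while the attention mixes all frames, stacking such blocks lets an encoder represent both short motion primitives and long-range structure in one module, which is the property the model relies on for long, coherent sequences.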
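The contrastive objective described above can be sketched as an InfoNCE-style loss: each dance encoding is pulled toward a positive encoding of the same style and pushed away from encodings of other styles. This is an illustrative sketch under assumed shapes, not the loss used in the thesis; `contrastive_loss` and the toy encodings are hypothetical.

```python
import numpy as np

def normalize(z):
    # Project encodings onto the unit sphere so similarity is cosine.
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def contrastive_loss(anchors, positives, temperature=0.1):
    # InfoNCE-style objective: each anchor should be most similar to
    # its own positive (same dance style) among all candidates.
    a = normalize(anchors)
    p = normalize(positives)
    logits = a @ p.T / temperature  # pairwise cosine similarities
    # Row-wise log-softmax; the diagonal holds the matching pairs.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy check: positives that stay close to their anchors (same style)
# should yield a lower loss than unrelated encodings.
rng = np.random.default_rng(0)
style_a = rng.standard_normal((4, 8))
loss_close = contrastive_loss(style_a, style_a + 0.05 * rng.standard_normal((4, 8)))
loss_random = contrastive_loss(style_a, rng.standard_normal((4, 8)))
print(loss_close < loss_random)  # True
```

Minimizing such a loss makes same-style encodings cluster and different-style encodings spread apart, which is the behavior the model exploits so that generated movements stay consistent within a dance style.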
Keywords/Search Tags:Cross-Modal Generation, Dance Generation, Conformer, Contrastive Learning, Dance Style