Research On Multimodal Audio-Visual Separation Model Based On Attention Mechanism

Posted on: 2024-05-28 | Degree: Master | Type: Thesis | Country: China | Candidate: Y T Zhang | Full Text: PDF | GTID: 2568307160955549 | Subject: Computer Science and Technology

Abstract/Summary:

Audio source separation is the task of recovering the individual sound signals from a mixed audio source containing multiple sounds. It supports practical applications such as speech separation and automated music processing, and is therefore of real research significance. Existing audio source separation models can be divided into traditional models and deep learning-based models, but both kinds have shortcomings. Some models use only audio information and ignore visual information, which wastes available data. Some have relatively simple networks that cannot extract sufficient feature information. Some are weakly robust to noise and cannot focus on the more important features. Others ignore the differences between types of features and fuse them directly, which raises a semantic gap problem. To address these shortcomings, this thesis constructs multimodal audio-visual separation models based on attention mechanisms to better solve the audio source separation problem.

To address the problems of overly simple networks and weak noise robustness, this thesis proposes a multimodal audio-visual separation model based on a single-channel attention mechanism. The model takes videos with accompanying audio as its dataset and designs two different networks to extract feature information from the visual and audio modalities, avoiding the waste of data. In the visual analysis module, channel attention and spatial attention are introduced and connected in series to build a hybrid domain attention mechanism (sketched below), which improves the model's robustness to noise and lets it focus on the more important features in the data while ignoring other distracting information. A full-scale skip connection structure is designed in the audio information module to better connect shallow features with deep features, so that the model obtains sufficient feature information.

To address the problem of directly fusing different types of feature information, this thesis proposes a multimodal audio-visual separation model based on a dual-channel attention mechanism. The model again uses videos with audio as the dataset, obtaining visual and audio features from a multimodal perspective, and connects channel attention and spatial attention in series in the visual channel to build a hybrid domain attention mechanism that strengthens the model's robustness to noise. In the audio channel's attention module, an attention gating mechanism (see the gating sketch below) is designed to dynamically fuse high-level and low-level features through learned weights, avoiding the semantic gap caused by direct fusion and reducing the noise generated during training. The output spectrograms and quantitative experiments on the MUSIC-21 and AVE datasets show that this model separates audio better than previous audio separation models. An audio-visual separation system is also built to apply the model in practical engineering.
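The abstract names but does not specify the hybrid domain attention mechanism. Below is a minimal PyTorch sketch, assuming a CBAM-style design in which channel attention is followed serially by spatial attention; the class names, the reduction ratio, and the 7x7 spatial kernel are illustrative choices, not the thesis's own.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze global context per channel, then reweight channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Reweight spatial positions using channel-pooled statistics."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(stats))

class HybridDomainAttention(nn.Module):
    """Channel attention followed serially by spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(x))
```

The serial order (channel first, then spatial) matches the abstract's description; attending over channels before positions lets the spatial map operate on already-reweighted features.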
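The full-scale skip connection structure in the audio module is likewise only named. Here is a sketch under the assumption of a UNet 3+-style aggregation, where every encoder scale is resized to one decoder stage's resolution, projected to a shared width, and concatenated; FullScaleFusion, width, and target_hw are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleFusion(nn.Module):
    """One decoder stage that receives skips from every encoder scale:
    each feature map is resized to the target resolution, projected to
    a shared channel width, and concatenated, so shallow and deep
    features meet at full scale."""
    def __init__(self, encoder_channels: list[int], width: int = 64):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(c, width, kernel_size=3, padding=1)
            for c in encoder_channels
        )

    def forward(self, feats: list[torch.Tensor], target_hw: tuple[int, int]) -> torch.Tensor:
        resized = [
            proj(F.interpolate(f, size=target_hw, mode="bilinear",
                               align_corners=False))
            for proj, f in zip(self.proj, feats)
        ]
        return torch.cat(resized, dim=1)  # (batch, width * num_scales, H, W)
```

With encoder outputs of, say, 64, 128, and 256 channels, FullScaleFusion([64, 128, 256]) yields a 192-channel map at the requested resolution, mixing shallow detail with deep semantics in a single stage.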
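The attention gating mechanism in the dual-channel model's audio branch is also described only at a high level. One plausible realization, assumed here, is an Attention-U-Net-style gate: a weight map derived jointly from the high-level and low-level features scales the low-level skip before fusion, so fusion is weighted rather than direct. AttentionGate and its channel arguments are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Weight low-level skip features with a map computed from the
    high-level features, so fusion is gated rather than a direct
    concatenation (one way to narrow the semantic gap)."""
    def __init__(self, low_ch: int, high_ch: int, inter_ch: int):
        super().__init__()
        self.w_low = nn.Conv2d(low_ch, inter_ch, kernel_size=1)
        self.w_high = nn.Conv2d(high_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Bring the coarser high-level map up to the skip's resolution.
        high = F.interpolate(high, size=low.shape[2:], mode="bilinear",
                             align_corners=False)
        attn = torch.sigmoid(self.psi(F.relu(self.w_low(low) + self.w_high(high))))
        return low * attn  # low-level detail kept where context agrees
```

The gated skip would then be concatenated with the upsampled deep features, e.g. torch.cat([gate(low, high), upsampled_high], dim=1), so high-level context weights the low-level detail instead of overwriting it.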
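Finally, the abstract reports output spectrograms but does not state the separation head. A common design, assumed here rather than taken from the thesis, predicts a time-frequency mask over the mixture spectrogram; the sketch applies such a mask and reconstructs a waveform by reusing the mixture's phase. The function name and the STFT settings (n_fft=1024, hop=256) are placeholders.

```python
import torch

def separate_with_mask(mixture: torch.Tensor, mask: torch.Tensor,
                       n_fft: int = 1024, hop: int = 256) -> torch.Tensor:
    """Apply a predicted time-frequency mask to a mixture waveform.

    mixture: (batch, samples) waveform of the mixed audio
    mask:    (batch, n_fft//2 + 1, frames) values in [0, 1] from the network
    """
    window = torch.hann_window(n_fft, device=mixture.device)
    spec = torch.stft(mixture, n_fft, hop_length=hop, window=window,
                      return_complex=True)        # (batch, freq, frames)
    masked = spec * mask                          # keep the mixture's phase
    return torch.istft(masked, n_fft, hop_length=hop, window=window,
                       length=mixture.shape[-1])
```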
Keywords/Search Tags: multimodality, sound source separation, attention mechanism, audio-visual separation, feature information, spectrogram