Audio-visual Data Recognition Based On Adversarial-metric Learning And Attribute Guidance Learning

Posted on:2022-06-20

Degree:Master

Type:Thesis

Country:China

Candidate:M L Hu

GTID:2518306542463614

Subject:Computer technology

Abstract/Summary:

Audio-visual data recognition aims to determine whether an audio clip and a facial image belong to the same identity. Given a facial image, the goal is to find the matching audio clip, or vice versa. This technology could be of great help for information retrieval and criminal investigation. At present, the main challenges of the task are noisy audio clips, low-resolution images, and the natural gap between the two modalities. In recent years, researchers have proposed various methods that address these challenges, mainly by learning discriminative feature representations. However, the results of audio-visual data recognition still fall far short of the requirements of practical applications. To address these challenges, this thesis focuses on closing the cross-modal gap between audio clips and facial images. The contributions are as follows:

(1) Considering the natural heterogeneous gap between audio clips and facial images, we propose a novel adversarial learning framework. Adversarial learning is used to generate a modality-independent feature representation for each person in each modality. In addition, because feature representations of the same identity should be compact, we use metric learning to learn a robust similarity metric for audio-visual data recognition. By integrating the modality-independent representation and robust metric learning into an end-to-end trainable network, our method overcomes the heterogeneity between the audio and image modalities and achieves considerable performance.

(2) Considering the heterogeneous gap between audio clips and facial images, we propose to use high-level semantic attribute information to shrink the cross-modality gap. By constraining the facial image and the audio clip to be consistent in their public attributes, we first pull data from the two modalities closer together in the public attribute space, which alleviates the gap between the cross-modal data. In addition, considering the similarity among samples of the same identity, we leverage the private attributes of each identity to increase intra-class consistency. By incorporating private attributes into the public attribute learning framework, the proposed method narrows the gap between the same attributes in different modalities while maintaining intra-class consistency.

Comprehensive experimental results demonstrate the improvement that the proposed methods bring to audio-visual data recognition.
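The two ingredients of contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state the loss functions, so the triplet formulation, the cross-entropy confusion objective, the margin value, and all function names below are assumptions chosen for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cross_modal_triplet_loss(audio_anchor, face_positive, face_negative, margin=0.5):
    """Metric-learning term (assumed triplet form).

    Pulls the audio embedding towards the face of the same identity and
    pushes it away from a face of a different identity by at least `margin`.
    """
    d_pos = euclidean(audio_anchor, face_positive)
    d_neg = euclidean(audio_anchor, face_negative)
    return max(0.0, d_pos - d_neg + margin)

def _clamp(p, eps=1e-12):
    """Keep a probability away from 0 and 1 so the logarithm stays finite."""
    return min(max(p, eps), 1.0 - eps)

def discriminator_loss(p_audio, is_audio):
    """Binary cross-entropy for a hypothetical modality discriminator.

    `p_audio` is the discriminator's predicted probability that an
    embedding came from the audio branch.
    """
    p = _clamp(p_audio)
    return -math.log(p) if is_audio else -math.log(1.0 - p)

def modality_confusion_loss(p_audio):
    """Adversarial term for the encoders (assumed form).

    Cross-entropy against a uniform target: it is smallest when the
    discriminator outputs 0.5, i.e. cannot tell the modalities apart.
    """
    p = _clamp(p_audio)
    return -0.5 * (math.log(p) + math.log(1.0 - p))

# Toy embeddings: the matching face is close to the audio anchor.
audio = [0.0, 0.0]
same_face = [0.0, 1.0]
other_face = [3.0, 4.0]
well_separated = cross_modal_triplet_loss(audio, same_face, other_face)
violated = cross_modal_triplet_loss(audio, other_face, same_face)
```

In an end-to-end setup of this kind, the encoders would typically minimize a weighted sum of the metric term and the confusion term while the discriminator minimizes its own cross-entropy. The weighting and training schedule are not given in the abstract.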

Keywords/Search Tags:

Audio-visual data recognition, Adversarial learning, Metric learning, Attributes

Related items

1	Research And Implementation Of Few-shot Recognition Algorithm Based On Metric Learning And Data Augmentation
2	Research On Visual Object Recognition In The Framework Of Metric Learning
3	Learning Visual Attributes For Image's Label Analysis
4	Research On Adversarial Attack And Defense Technology In Machine Learning
5	Research On The Technology Of Deep Learning Based Face Image Recognition
6	Study On Naïve Similarity Discriminator-based Deep Adversarial Metric Learning
7	Multimodal Cognitive Learning For Audio-visual Data
8	Research On Metric Learning Algorithm Based On Sample Pairs And Triples
9	Research On Facial Expression Recognition Technology Based On Deep Metric Learning
10	Study On Domain Invariant Feature Learning For Heterogeneous Face Recognition