Cross-modal Face-Voice Biometric Recognition Via Coupled Deep Networks

Posted on:2022-06-13

Degree:Master

Type:Thesis

Country:China

Candidate:R Wang

Full Text:PDF

GTID:2568306728456584

Subject:Engineering

Abstract/Summary:

Many studies in neurology and cognitive science show that there is a latent association between a person's face and voice. By mining this association, cross-modal face-voice biometric recognition can be realized. The technology has practical value in everyday applications and can promote the development of human perception research and intelligent human-machine dialogue. To address the open problems in cross-modal face-voice biometric recognition, this thesis proposes three solutions:

(1) To address the mismatch problem caused by intra-modal and inter-modal variations, a cross-modal face-voice recognition model based on a triplet loss with dual constraints is proposed. The loss imposes discriminative inter-modal constraints and intra-modal constraints at the same time, which effectively reduces the gap between modalities. The inter-modal constraints ensure the separability of features and the stability of training, while the intra-modal constraints reduce the within-class distance.

(2) To address the weak association learning caused by the lack of interaction between modalities, and the weak generalization performance caused by the single form of training samples, a cross-modal face-voice recognition model based on a bi-directional hard quintuple loss is proposed. In this model, a novel weighted residual network block is added on top of the face and voice subnetworks. The block introduces a weighted residual structure and a nonlinear activation unit, and all of its parameters are shared by the two modalities to ensure full interaction between them. In addition, from the perspective of hard sample mining, the model designs a bi-directional hard quintuple loss and a corresponding bi-directional hard quintuple construction strategy, in which each bi-directional hard quintuple is composed of different forms of hard triplets. This gives the model desirable generalization performance.

(3) To address the weak supervision caused by the single label in existing cross-modal face-voice datasets, a cross-modal face-voice recognition model combined with self-supervised learning is proposed. In this model, supervised and self-supervised learning are integrated into one framework. The supervised part retains the identity loss. The self-supervised part proposes a cross-modal deep clustering framework: the clustering result of one modality's features is used as the pseudo label for the corresponding features of the other modality, and self-supervised learning is guided by optimizing the loss between the predicted labels and the pseudo labels.

Experimental results on VoxCeleb1 show that, compared with existing models, the three models proposed in this thesis achieve improvements on four cross-modal face-voice matching tasks.
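The abstract does not give the exact form of the dual-constraint triplet loss, so the following is only an illustrative sketch of the general idea in (1): an inter-modal hinge term that separates a voice anchor from a face of a different identity, plus an intra-modal term that pulls together two voice embeddings of the same identity. The function names, the Euclidean distance, the `margin` value, and the `intra_weight` parameter are all assumptions made for illustration and are not taken from the thesis.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dual_constraint_triplet_loss(voice_anchor, face_pos, face_neg, voice_pos,
                                 margin=0.5, intra_weight=1.0):
    """Hypothetical sketch of a triplet loss with two constraints.

    voice_anchor: voice embedding of identity A
    face_pos:     face embedding of identity A (cross-modal positive)
    face_neg:     face embedding of identity B (cross-modal negative)
    voice_pos:    another voice embedding of identity A (same modality)
    """
    # Inter-modal constraint (assumed hinge form): the matching face should be
    # closer to the voice anchor than the non-matching face by at least `margin`.
    inter = max(0.0,
                l2_distance(voice_anchor, face_pos)
                - l2_distance(voice_anchor, face_neg)
                + margin)
    # Intra-modal constraint (assumed form): shrink the within-class distance
    # between two same-identity embeddings of the same modality.
    intra = l2_distance(voice_anchor, voice_pos)
    return inter + intra_weight * intra


# Toy 2-D embeddings, chosen by hand purely for illustration.
anchor = [1.0, 0.0]
easy_loss = dual_constraint_triplet_loss(anchor, [1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
hard_loss = dual_constraint_triplet_loss(anchor, [0.0, 1.0], [1.0, 0.0], [1.0, 0.0])
print(easy_loss, hard_loss)
```

In this toy case the first triplet already satisfies the margin and has zero within-class distance, so its loss vanishes, while swapping the positive and negative faces produces a positive penalty. A real system would compute the same quantities on batches of network outputs with an automatic-differentiation framework.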

Keywords/Search Tags:

face-voice matching, dual-constraints, triplet, bi-directional hard quintuple, self-supervised

Related items

1	Research On Face Recognition System Based On Convolutional Neural Network
2	Study On The Task-Oriented Path Planning Methods For An Omni-Directional Mobile Dual-Arm Robot
3	Deep Learning Methods For Face Detection And 3D Reconstruction
4	Research On Face Recognition Based On Machine Learning Method
5	Research On High-fidelity 3D Face Reconstruction Task Based On Self-Supervision
6	Research On Quintuple Implication Principle Based On Fuzzy Reasoning
7	Vehicle routing problems with semi-hard resource constraints
8	Software Defined Network Measurement and Inference Under Hard Resource Constraints
9	Research On Single Stage Multi-level Face Detection Algorithm Based On Convolutional Neural Network
10	Face Identifying Based On Shape Matching