Cross-modal Face-Voice Biometric Recognition Via Coupled Deep Networks

Posted on: 2022-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: R Wang
Full Text: PDF
GTID: 2568306728456584
Subject: Engineering
Abstract/Summary:
Many studies in neurology and cognitive science indicate that there is a latent association between a person's face and voice. By mining this association, cross-modal face-voice biometric recognition can be realized. The technology has practical value in real life: it can advance machine perception of humans and intelligent human-machine conversation, and it has broad applications in daily life. To address open problems in cross-modal face-voice biometric recognition, this thesis proposes three solutions.

(1) To address the mismatch problem caused by intra-modal and inter-modal variations, a cross-modal face-voice recognition model based on a triplet loss with dual constraints is proposed. The loss imposes a discriminative inter-modal constraint and an intra-modal constraint at the same time, which effectively reduces the gap between modalities. The former ensures the separability of features and the stability of training, while the latter reduces the intra-class distance.

(2) To address weak association learning caused by the lack of interaction between modalities, and weak generalization caused by the single form of training samples, a cross-modal face-voice recognition model based on a bi-directional hard quintuple loss is proposed. In this model, a novel weighted residual network block is added on top of the face and voice subnetworks. The block introduces a weighted residual structure and a nonlinear activation unit, and all of its parameters are shared by the two modalities to ensure full interaction between them. In addition, from the perspective of hard sample mining, the model uses a bi-directional hard quintuple loss together with a corresponding quintuple construction strategy. Each bi-directional hard quintuple is composed of different forms of hard triplets, which gives the model good generalization performance.

(3) To address the weak supervision caused by the single label in existing cross-modal face-voice datasets, a cross-modal face-voice recognition model combined with self-supervised learning is proposed. The model integrates supervised and self-supervised learning into one framework. The supervised part retains the identity loss. The self-supervised part proposes a cross-modal deep clustering framework: the clustering result of one modality's features is used as the pseudo-label for the corresponding features of the other modality, and self-supervised learning is guided by optimizing the loss between the predicted labels and the pseudo-labels.

Experiments on VoxCeleb1 show that, compared with existing models, the three models proposed in this thesis achieve effective improvements on four cross-modal face-voice matching tasks.
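The abstract does not give the loss formulas, so the following NumPy sketch only illustrates two of the ideas it describes, under stated assumptions. The first function combines an inter-modal triplet hinge (voice anchor, face positive, face negative) with an intra-modal term that pulls two same-identity face embeddings together. The second pair of functions uses cluster assignments from one modality as pseudo-labels for the paired samples of the other modality. All function names, the margin, the weighting factor, the squared Euclidean distance, and the nearest-centroid assignment are hypothetical choices made for illustration, not the thesis's actual method.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def dual_constraint_triplet_loss(voice_anchor, face_pos, face_neg,
                                 face_pos_other, margin=0.5, intra_weight=1.0):
    """Illustrative triplet loss with an inter-modal and an intra-modal term.

    Assumed inputs, each of shape (batch, dim):
      voice_anchor   - voice embeddings
      face_pos       - face embeddings of the same identities
      face_neg       - face embeddings of different identities
      face_pos_other - second face embeddings of the same identities
    """
    v, fp, fn, fp2 = map(l2_normalize,
                         (voice_anchor, face_pos, face_neg, face_pos_other))
    d_pos = np.sum((v - fp) ** 2, axis=-1)   # voice to matching face
    d_neg = np.sum((v - fn) ** 2, axis=-1)   # voice to non-matching face
    inter = np.maximum(0.0, d_pos - d_neg + margin)  # inter-modal hinge
    intra = np.sum((fp - fp2) ** 2, axis=-1)         # intra-class distance
    return float(np.mean(inter + intra_weight * intra))


def nearest_centroid_labels(features, centroids):
    """Assign each feature to its nearest centroid (assumed precomputed,
    for example by k-means on one modality)."""
    diffs = features[:, None, :] - centroids[None, :, :]
    return np.sum(diffs ** 2, axis=-1).argmin(axis=1)


def cross_modal_pseudo_label_loss(logits_other, pseudo_labels):
    """Softmax cross-entropy between the other modality's predictions and
    pseudo-labels obtained by clustering the paired modality."""
    z = logits_other - logits_other.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    rows = np.arange(len(pseudo_labels))
    return float(-log_probs[rows, pseudo_labels].mean())
```

In a training loop along these lines, face features would be clustered to label the paired voice samples and vice versa, and the resulting pseudo-label loss would be added to the retained identity loss; the relative weighting of the terms is not stated in the abstract.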
Keywords/Search Tags:face-voice matching, dual-constraints, triplet, bi-directional hard quintuple, self-supervised