
Research On Speaker Recognition Method Based On Multi-Task Learning

Posted on: 2024-04-30 | Degree: Master | Type: Thesis
Country: China | Candidate: H Wang | Full Text: PDF
GTID: 2568306944955769 | Subject: Computer Science and Technology
Abstract/Summary:
Voiceprint recognition, also known as speaker recognition, is a biometric technology widely used in fields such as public security and justice, military defense, security, and document anti-counterfeiting. With the continuing spread and development of deep learning, many deep learning models have achieved good experimental results in speaker recognition. However, because of the unique nature of the human vocal organs, voiceprint features carry very rich information, and single-task speaker recognition models cannot adequately capture and exploit this information to improve recognition accuracy. Combining the current development of speaker recognition and multi-task learning, this thesis studies a speaker recognition method based on multi-task learning: the rich attributes contained in the voiceprint are exploited through multi-task learning, and the correlations learned across tasks are used to improve the final speaker recognition performance.

To address the problem that single-task speaker recognition models fail to fully utilize the speaker-related attribute information in voiceprint features, this thesis adopts the self-attention mechanism as the core algorithm of the model and constructs the Multi-Task Self-Attention Network (MT-SANet). To allow the feature learning vector to acquire richer knowledge from the voiceprint features, a multi-task feature vector is initialized by random sampling from a uniform distribution and prepended to the voiceprint features to form a fused feature matrix, which is fed into the MT-SANet model; the trained feature learning vector is then used by the downstream task classifiers to achieve more accurate speaker recognition. Comparative experiments on the Leap Corpus and Fair Voice datasets show that the proposed MT-SANet model improves both the accuracy and the convergence speed of speaker recognition.

To address the negative transfer caused by feature learning vectors that are insufficiently trained in the shallow layers of the network, this thesis introduces a masking mechanism into the attention computation and constructs the Multi-Task Masked Self-Attention Network (MT-MSANet), which adjusts the receptive field of the attention computation to maintain the balance between multi-task sharing and decoupling. Experimental results on the Leap Corpus and Fair Voice datasets demonstrate the effectiveness of the MT-MSANet model.

In addition, this thesis investigates how to construct the auxiliary task set and, from a new research perspective, proposes adding the language learning stage of language learners as speaker attribute information to the auxiliary task set for multi-task learning, further improving the speaker recognition accuracy of the MT-MSANet model. Ablation experiments on the Leap Corpus language-learner dataset verify the effectiveness of introducing the language learning stage as an auxiliary attribute.
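The core mechanism summarized above (a randomly initialized multi-task feature vector prepended to the voiceprint features, processed by self-attention whose receptive field can be narrowed by a mask) can be illustrated with a minimal sketch. This is not the thesis authors' code; all names and dimensions (d_model, n_frames, the particular mask pattern, the uniform initialization range) are illustrative assumptions, and the sketch uses a single plain-numpy attention head rather than the full MT-SANet/MT-MSANet architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(x, w_q, w_k, w_v, mask=None):
    """Single-head self-attention; `mask` (same shape as the score matrix)
    is added to the scores before the softmax, so -inf entries are ignored."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if mask is not None:
        scores = scores + mask
    return softmax(scores) @ v

d_model, n_frames = 64, 100

# Voiceprint feature matrix for one utterance (placeholder for real acoustic features).
frames = rng.standard_normal((n_frames, d_model))

# Multi-task feature (learning) vector, initialized from a uniform distribution
# and prepended to the voiceprint features to form the fused feature matrix.
task_token = rng.uniform(-0.1, 0.1, size=(1, d_model))
fused = np.concatenate([task_token, frames], axis=0)   # (1 + n_frames, d_model)

# Example mask: the task token may only attend to itself and the first 20 frames,
# narrowing the receptive field of the attention computation.
mask = np.zeros((fused.shape[0], fused.shape[0]))
mask[0, 21:] = -np.inf

w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))
out = masked_self_attention(fused, w_q, w_k, w_v, mask=mask)

# The transformed task-token row is the representation that downstream
# classifiers (speaker identity plus auxiliary attributes such as the
# language learning stage) would consume in a multi-task setup.
print(out[0].shape)   # (64,)
```

In a trainable version, the task token and the projection matrices would be learned parameters, and one classification head per task would be attached to the output, with the task losses summed during training.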
Keywords/Search Tags:speaker recognition, multi-task learning, self-attention, masking mechanism, language learning stage