Co-channel Speaker Recognition Based On Deep Learning

Posted on:2024-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:K X Feng

GTID:2568307100480004

Subject:Information and Communication Engineering

Abstract/Summary:

Co-channel speech is the signal that a single microphone picks up when multiple speakers talk in the same acoustic environment; it is a mixture of the individual speakers’ speech signals. Co-channel speaker recognition uses this mixed speech signal to identify each of the speakers. Because the speech signals of different speakers interfere with one another, it is difficult to identify the speakers directly from the mixture. Usually, the speech signals of the different speakers are first separated from the mixture, and speaker identification is then performed on the separated signals. This thesis studies co-channel speaker recognition methods based on deep learning. The main research work is as follows:

1. By studying the complementary relationship between the speaker separation network and the speaker recognition network, this thesis proposes a speaker recognition method based on a multi-channel convolutional neural network and combines it with a speaker separation network to construct an end-to-end co-channel speaker recognition system in the time domain. In the speaker separation network, stacked dual-path recurrent neural networks (DPRNNs) learn a time-domain mask, which yields the estimated speech of each speaker. In the speaker recognition network, the estimated speech of each speaker is segmented, and a feature vector is extracted from each segment with multiple parameterized bandpass filters. The feature vectors of all segments are then aggregated, and their mean and standard deviation are computed to form the utterance-level feature vector. Finally, a deep neural network identifies the speaker from the utterance-level feature vector. In addition, the method uses a multi-task learning algorithm to jointly optimize the whole system. Experimental results show that the co-channel speaker recognition method based on the multi-channel convolutional neural network is effective.

2. By studying the influence of the speaker separation network on the performance of the overall co-channel speaker recognition system, this thesis proposes a speaker separation method based on multi-scale feature fusion and combines it with the speaker recognition network to construct a new time-domain end-to-end co-channel speaker recognition system. In the speaker separation network, the method extracts the feature matrix after each DPRNN and uses a squeeze-and-excitation module to estimate the importance of the features at each scale. It then weights the features on each channel according to this importance, fuses them by learning their spatial features through a convolution layer, and outputs the final time-domain mask. In addition, the method uses a multi-task learning algorithm to jointly optimize the entire system. Experimental results show that the co-channel speaker recognition method based on multi-scale feature fusion is effective.
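
Two of the operations named in the abstract can be illustrated in a few lines: aggregating segment-level feature vectors into an utterance-level vector by mean and standard deviation, and weighting multi-scale features with a squeeze-and-excitation gate. The following is a minimal sketch in plain Python, not the thesis implementation. The function names, the use of the population standard deviation, the one-scalar-per-scale squeeze, and the weight matrices `w1` and `w2` are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from statistics import fmean, pstdev


def statistics_pooling(segment_features):
    """Aggregate per-segment vectors into one utterance-level vector.

    segment_features: list of equal-length lists, one per segment.
    Returns the per-dimension means followed by the per-dimension
    standard deviations (population form, an assumption here).
    """
    if not segment_features:
        raise ValueError("need at least one segment")
    columns = list(zip(*segment_features))  # one tuple per feature dimension
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means + stds


def se_reweight(scale_features, w1, w2):
    """Squeeze-and-excitation-style weighting of multi-scale features.

    scale_features: list of S feature matrices (each a list of rows),
        e.g. one matrix taken after each DPRNN block.
    w1: hidden x S weights, w2: S x hidden weights. Both are
        hypothetical stand-ins for learned parameters.
    Returns (gates, weighted_features).
    """
    # Squeeze: global average of each scale's matrix -> S scalars.
    squeezed = [fmean(v for row in mat for v in row) for mat in scale_features]
    # Excitation: linear -> ReLU -> linear -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale every element of each matrix by that scale's gate value.
    weighted = [
        [[g * v for v in row] for row in mat]
        for g, mat in zip(gates, scale_features)
    ]
    return gates, weighted


if __name__ == "__main__":
    # Two segments with 2-dimensional features -> 4-dimensional utterance vector.
    print(statistics_pooling([[1.0, 2.0], [3.0, 6.0]]))
    # Two scales, one hidden unit; the weights are arbitrary illustrative values.
    feats = [[[1.0, 1.0]], [[2.0, 4.0]]]
    gates, _ = se_reweight(feats, w1=[[0.5, 0.5]], w2=[[1.0], [-1.0]])
    print(gates)
```

In the system described above, the fusion step after the weighting is a learned convolution layer; this sketch stops at the weighted features and leaves the fusion out.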

Keywords/Search Tags:

Co-channel speaker recognition, Speaker separation, Multi-channel convolutional neural network, Time-domain, Multi-task learning

Related items

1	The Study Of Speaker Indexing In Multi-Channel Environment
2	Research On Multi-dimensional Speaker Recognition Based On Neural Network
3	Research On Speaker Adaptation Of Neural Network Acoustic Models For Speech Recognition
4	Speaker Recognition Based On Aggregate Convolutional Neural Network Of Feature Guiding And Multi-Task Learning
5	Research On Monaural Speech Separation Of Specific Speaker Based On Deep Learning
6	Study On Speaker Recognition Based On Deep Learning
7	Research On Speaker Recognition Method Based On Multi-Task Learning
8	Research On Speaker Recognition Method Based On Deep Learning
9	Text Independent Speaker Recognition Based On Deep Learning Framework
10	Speaker-Independent Single-Channel Speech Separation Based On Deep Learning