
Co-channel Speaker Recognition Based On Deep Learning

Posted on: 2024-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: K X Feng
Full Text: PDF
GTID: 2568307100480004
Subject: Information and Communication Engineering
Abstract/Summary:
Co-channel speech is the speech signal captured by a microphone from multiple speakers in the same acoustic environment; it is a mixture of the individual speakers' speech signals. Co-channel speaker recognition uses this mixed speech signal to recognize the identities of the multiple speakers. Because the speech signals of different speakers interfere with one another, it is difficult to recognize the speakers directly from the mixture. Usually, the speech signals of the different speakers are first separated from the mixture, and speaker recognition is then performed on the separated signals. This thesis studies co-channel speaker recognition methods based on deep learning. The main research work is as follows:

1. By studying the complementary relationship between the speaker separation network and the speaker recognition network, this thesis proposes a speaker recognition method based on a multi-channel convolutional neural network and combines it with a speaker separation network to construct an end-to-end co-channel speaker recognition system in the time domain. In the speaker separation network, the method uses stacked dual-path recurrent neural networks (DPRNNs) to learn a time-domain mask and obtain the estimated speech of each speaker. In the speaker recognition network, the estimated speech of each speaker is segmented, and a feature vector is extracted from each segment with multiple parameterized bandpass filters. The feature vectors of all segments are then aggregated, and their mean and standard deviation are computed to form the utterance-level feature vector. Finally, a deep neural network identifies the speaker from the utterance-level feature vector. In addition, the method uses a multi-task learning algorithm to jointly optimize the whole system. Experimental results show that the co-channel speaker recognition method based on the multi-channel convolutional neural network is effective.

2. By studying the influence of the speaker separation network on the performance of the overall co-channel speaker recognition system, this thesis proposes a speaker separation method based on multi-scale feature fusion and combines it with the speaker recognition network to construct a new time-domain end-to-end co-channel speaker recognition system. In the speaker separation network, the method extracts the feature matrix after each DPRNN, uses a squeeze-and-excitation module to estimate the importance of the features at different scales, and weights the features on each channel according to this importance. A convolution layer then learns their spatial features to fuse them, and the final time-domain mask is output. In addition, the method uses a multi-task learning algorithm to jointly optimize the entire system. Experimental results show that the co-channel speaker recognition method based on multi-scale feature fusion is effective.
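
The aggregation step in the speaker recognition network (segment the estimated speech, extract one feature vector per segment, then concatenate the mean and standard deviation across segments) can be illustrated with a minimal sketch. It is not the thesis implementation. The function names are hypothetical, `toy_segment_features` is only a stand-in for the learned parameterized bandpass filters, and the use of the population standard deviation is an assumption, since the abstract does not say which estimator is used.

```python
import math
from typing import List, Sequence


def split_into_segments(signal: Sequence[float], segment_len: int, hop: int) -> List[List[float]]:
    """Cut a 1-D signal into fixed-length segments; a trailing partial segment is dropped."""
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    last_start = len(signal) - segment_len
    return [list(signal[i:i + segment_len]) for i in range(0, last_start + 1, hop)]


def toy_segment_features(segment: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the learned front end: [mean absolute value, mean energy]."""
    n = len(segment)
    return [sum(abs(x) for x in segment) / n, sum(x * x for x in segment) / n]


def statistics_pooling(segment_features: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-dimension mean and (population) standard deviation over segments."""
    n = len(segment_features)
    if n == 0:
        raise ValueError("need at least one segment")
    dim = len(segment_features[0])
    means = [sum(f[d] for f in segment_features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in segment_features) / n)
        for d in range(dim)
    ]
    return means + stds  # utterance-level vector of length 2 * dim


def utterance_embedding(signal: Sequence[float], segment_len: int, hop: int) -> List[float]:
    """Segment the signal, extract toy features per segment, and pool them."""
    segments = split_into_segments(signal, segment_len, hop)
    return statistics_pooling([toy_segment_features(s) for s in segments])
```

Because the pooled vector has a fixed length of twice the per-segment feature dimension, utterances of different durations map to inputs of the same size for the classifier that follows.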
Keywords/Search Tags: Co-channel speaker recognition, Speaker separation, Multi-channel convolutional neural network, Time-domain, Multi-task learning