| Speech signals collected in smart homes, phone communications, and online conferences are usually mixtures in which multiple speakers talk at the same time. To obtain clean single-speaker signals for subsequent processing such as speech recognition, speech separation is necessary. The purpose of speech separation is to recover high-quality, highly intelligible speech signals from mixed speech. In practice, the speaker gender combination of the mixed speech is often unknown. However, there is no robust and effective speech separation method with gender combination detection. Separation is generally performed directly with a universal model, which yields poor separation performance because the model is not specific to mixed speech of different gender combinations. In recent years, deep learning has developed rapidly and achieved significantly better performance than traditional algorithms. Based on deep learning, this thesis focuses on single-channel speech separation when the gender combination of the mixed speech is unknown. The main research work of this thesis covers the following two aspects: (1) To better separate mixed speech with an unknown gender combination, this thesis proposes blind speech separation based on a dual-branch convolutional neural network (CNN) fusion model. The method determines whether the gender combination of the mixed speech is male-male, male-female, or female-female, and then selects the separation model corresponding to that combination for high-quality speech separation. Compared with traditional speaker recognition, recognizing the gender combination of mixed speech is much more difficult, and using a traditional single feature to detect the three gender combinations can lead to incorrect recognition results. To compensate for the limited gender combination information carried by a traditional single feature, a strategy of mining deep fusion features is proposed, so that 
the classification features contain more information about the gender combination categories. The proposed strategy uses a dual-branch convolutional neural network fusion model to extract deep fusion features from Mel-frequency cepstral coefficients (MFCCs) and filter bank features, thereby mining classification features that discriminate well between the gender combination categories. A support vector machine (SVM) is then used to recognize the gender combination of the mixed speech. Finally, the deep neural network (DNN) or CNN model corresponding to the recognized gender combination is selected for speech separation. The experimental results show that, compared with a traditional single feature, the proposed deep fusion feature effectively improves the recognition rate of the gender combination of mixed speech. Across the performance evaluations, blind speech separation based on the dual-branch convolutional neural network fusion model outperforms the universal speech separation model. (2) In the blind speech separation stage based on the dual-branch convolutional neural network fusion model, phase information is not estimated when the time-frequency-mask-based deep neural network is used for speech separation. The phase spectrum of the mixed signal is used as the phase spectrum of the target signal, which leads to phase distortion in the predicted signal. To address this problem, a fully convolutional neural network is constructed to achieve end-to-end speech separation; it takes time-domain mixed speech as the network input and clean speech as the target, and uses a convolutional encoder and a deconvolutional decoder to complete the separation. This separation method does not need to recover the phase spectrum and omits the reconstruction of the signal from the frequency domain to the time domain. On this basis, this thesis further proposes time-domain blind speech separation based on a fully convolutional neural network and multi-task learning, which combines the tasks of speech separation and gender combination detection in a shared network and separates speech 
under the constraint of the mixed-speech gender combination detection task. The time-domain blind speech separation method extracts effective auxiliary information from the gender combination detection task and combines it with the speech separation model to achieve better separation. The experimental results show that the proposed method performs better than the single-task blind speech separation method. |
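
The phase problem described in aspect (2) can be illustrated with a small numerical sketch. It is not the thesis's system: it assumes a single FFT frame instead of a full short-time transform, two synthetic noisy sinusoids as hypothetical stand-ins for the two speakers, and an oracle magnitude (that is, a perfect mask). All names and settings here are illustrative assumptions. Even with a perfect magnitude estimate, reusing the phase of the mixture leaves a reconstruction error, whereas the true phase recovers the source almost exactly.

```python
import numpy as np

def reconstruct(magnitude, phase, n):
    """Invert a one-sided spectrum from its magnitude and phase."""
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n)

rng = np.random.default_rng(0)
n = 512                       # frame length in samples (assumed)
t = np.arange(n) / 8000.0     # 8 kHz sampling rate (assumed)

# Two synthetic "speakers": hypothetical stand-ins for real speech.
s1 = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(n)
s2 = np.sin(2 * np.pi * 330 * t + 1.0) + 0.1 * rng.standard_normal(n)
mix = s1 + s2

S1 = np.fft.rfft(s1)
MIX = np.fft.rfft(mix)

# Oracle magnitude of speaker 1, combined with two different phases.
est_mix_phase = reconstruct(np.abs(S1), np.angle(MIX), n)   # mask-based practice
est_true_phase = reconstruct(np.abs(S1), np.angle(S1), n)   # ideal reference

err_mix_phase = np.mean((s1 - est_mix_phase) ** 2)
err_true_phase = np.mean((s1 - est_true_phase) ** 2)

print(f"MSE with mixture phase: {err_mix_phase:.6f}")
print(f"MSE with true phase:    {err_true_phase:.2e}")
```

The gap between the two errors is the distortion attributable to the borrowed phase alone, which is the motivation for separating directly in the time domain.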