| Single-channel speech separation refers to separating each speaker's speech signal from a single-channel audio recording that contains the speech of multiple speakers. In recent years, single-channel speech separation has made great progress. However, most current speech separation algorithms rely on mask operations. The number of speech streams that a mask-based algorithm can separate depends on the number of masks output by the model, so the number of speakers in the input speech must be preset at the training stage. In real-world scenes, the number of speakers in audio recorded by a microphone is often not fixed, which greatly limits the application of speech separation technology. In addition, speech has rich structure on multiple time scales, and current speech separation algorithms do not use this structure effectively. Current algorithms also suffer from the output stream of separated speech switching between different speakers, and the problem occurs more frequently as the number of speakers grows. To address these problems, this paper makes the following main contributions. First, an encoder-decoder framework based on multi-scale feature fusion is proposed for speech separation. In speech processing, the quality of feature extraction is a key factor in determining the result, especially for speech separation. The proposed framework fuses context information at different scales to improve separation quality. Second, this paper improves the existing iterative speech separation algorithm. An iterative speech separation network separates mixed speech with an unknown number of speakers over multiple iterations, and a threshold model serves as the iteration stop condition, which significantly improves the speed of the model. In addition, a speaker classification loss and long-term dependence added to the output stream solve the problem of frequent switching of the output stream between different speakers. Third, this paper designs and implements a speech separation application system that combines the research of this paper with speech recognition to realize multi-speaker speech recognition. The system also provides a graphical interface through which users can perform speech separation and speech recognition with simple interaction. |