Research And Implementation Of Single Channel Speech Separation With Unknown Number Of Speakers

Posted on:2023-03-27

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Yi

GTID:2558306914957089

Subject:Computer Science and Technology

Abstract/Summary:

Single channel speech separation refers to recovering each speaker's speech signal from a single-channel recording that contains the speech of multiple speakers. In recent years, single channel speech separation has made great progress. However, most current speech separation algorithms rely on mask operations. The number of speech streams that a mask-based algorithm can separate depends on the number of masks output by the model, so the number of speakers in the input speech must be preset at the training stage. In real scenes, the number of speakers in the audio recorded by a microphone is often not fixed, which greatly limits the application of speech separation technology. In addition, speech has rich structure on multiple time scales, and current speech separation algorithms do not use this structure effectively. Current algorithms also suffer from the output stream of separated speech switching between different speakers, and the problem occurs more frequently as the number of speakers grows.

In view of the above problems, the main contributions of this thesis are as follows.

First, an encoder-decoder framework for speech separation based on multi-scale feature fusion is proposed. In speech processing, the quality of feature extraction is a key factor in the final result, especially for the task of speech separation. The proposed framework integrates context information at different scales to improve the quality of speech separation.

Second, the thesis improves the existing iterative speech separation algorithm. An iterative speech separation network separates mixed speech with an unknown number of speakers over multiple iterations, and a threshold model is used as the iteration stop condition, which significantly improves the speed of the model. At the same time, a speaker classification loss and long-term dependence added to the output stream solve the problem of the output stream frequently switching between different speakers.

Third, the thesis designs and implements a speech separation application system that combines this research with speech recognition to provide multi-speaker speech recognition. The system also provides a graphical interface, through which users can perform speech separation and speech recognition with simple interactions.
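The iterative separation with a stop condition mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the thesis's method: `separate_iteratively`, `toy_extract_strongest_tone`, and `stop_ratio` are hypothetical names, a single-FFT-bin extractor stands in for the separation network, and a relative residual-energy threshold stands in for the threshold model, whose actual form the abstract does not describe.

```python
import numpy as np

def separate_iteratively(mixture, extract_one, stop_ratio=1e-3, max_speakers=10):
    """Peel off one source per iteration until the residual is nearly silent.

    extract_one is a hypothetical callable: signal -> (estimate, residual).
    """
    mixture = np.asarray(mixture, dtype=float)
    total_energy = float(np.sum(mixture ** 2))
    sources = []
    residual = mixture
    for _ in range(max_speakers):
        # Assumed stop rule: residual energy relative to the input mixture
        # falls below stop_ratio (a stand-in for the thesis's threshold model).
        if total_energy == 0.0 or np.sum(residual ** 2) / total_energy < stop_ratio:
            break
        estimate, residual = extract_one(residual)
        sources.append(estimate)
    return sources, residual

def toy_extract_strongest_tone(signal):
    """Toy extractor (not a neural network): keep only the strongest FFT bin."""
    spectrum = np.fft.rfft(signal)
    peak = int(np.argmax(np.abs(spectrum)))
    kept = np.zeros_like(spectrum)
    kept[peak] = spectrum[peak]
    estimate = np.fft.irfft(kept, n=len(signal))
    return estimate, signal - estimate

# Three pure tones play the role of three "speakers"; the count is not
# passed to the separation loop.
t = np.arange(800) / 8000.0
tones = [
    np.sin(2 * np.pi * 200 * t),
    0.6 * np.sin(2 * np.pi * 500 * t),
    0.3 * np.sin(2 * np.pi * 900 * t),
]
mixture = np.sum(tones, axis=0)

sources, residual = separate_iteratively(mixture, toy_extract_strongest_tone)
print("estimated number of sources:", len(sources))
```

In this toy setting the loop stops on its own once the residual carries almost no energy, so the number of returned sources is determined by the stop rule and not fixed in advance.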

Keywords/Search Tags:

speech separation, deep learning, source separation

Related items

1	Underdetermined Speech Separation Based On Sparse Representation And Deep Learning
2	Machine Learning For Underdetermined Speech Separation
3	Multi-speaker Speech Separation Based On Deep Learning
4	Research On Key Technologies For Multi-source Separation With Deep Neural Networks
5	Research On Speech Separation Algorithm Based On Fuzzy Clustering And Deep Learning
6	Research Of Speech Separation Based On Binaural Spatial Information
7	Research Of Mixed Speech Separation Based On Blind Source Separation Algorithm
8	Speech Separation Based On Deep Learning
9	Study On The Speech Enhancement Method Of The Multiple Speech Signals Separation
10	Underdetermined Source Separation And Its Application To Speech Processing