Research Of Speech Separation Technology Of Multiple Speakers Based On Deep Learning

Posted on: 2020-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Li
GTID: 2428330578450931
Subject: Computer application technology
Abstract/Summary:
Voice communication is an indispensable part of human life. Speech conveys information more directly and richly than text, and it also carries the speaker's emotions. Noisy environments are common in everyday life, yet the human brain can easily pick out the sound it is attending to from the mixture of sounds that reaches the ear. This is the well-known "cocktail party" effect: we can readily follow the sounds that interest us even when several sounds are mixed together. As society moves toward greater convenience and intelligence, clear speech allows smart appliances to carry out their users' instructions more efficiently. Solving the "cocktail party" problem has therefore become the holy grail of speech separation, and correctly recovering each speaker's speech from a mixture, reducing the noise in the separated speech, and improving the accuracy and quality of separation have become hot topics in recent research.

Mixed speech signals are complex and variable, and it is difficult to find regularities of speech in them. This thesis builds on a deep learning framework, which can adapt to different speech scenes, to provide a solution to the speech separation problem. Single-channel speech separation is a common approach, but it cannot exploit the information in multi-channel speech signals, so the available resources are underused. For the case where the interference is another speaker, this thesis carries out the following two studies:

(1) The traditional k-means clustering algorithm selects its initial center points at random, and the partition into clusters depends on those initial points. Poorly chosen centers easily lead to incorrect cluster partitions, and the clustering results are unstable from run to run. The improved k-means algorithm proposed in this thesis is called the HK-means clustering algorithm. It uses a histogram as a density estimate to guide the choice of initial center points, so that they are placed in dense regions wherever possible. In addition, to reduce the interference of global data on local data, a local data substitution method is used to estimate the initial center points.

(2) The speech separation model uses a Bi-directional Long Short-Term Memory (BLSTM) network to separate the speech signals, with the number of hidden layers selected from several candidate configurations. Single-channel mixed speech signals are used to train the separation model, which is then applied to separate dual-channel mixed speech signals. A similarity judgment and the adaptive beamforming algorithm MVDR (minimum variance distortionless response) are then used to enhance the separated speech signals, further improving the quality of the separated speech.
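The histogram-based initialization described in item (1) can be illustrated with a minimal sketch. The abstract does not give the details of HK-means, so everything below is an assumption made for illustration: the restriction to 1-D data, the equal-width bins, the rule "take the k densest bins", the reading of "local data" as the mean of the points inside each chosen bin, and the names `histogram_init`, `kmeans_1d` and `bins`. It should not be read as the author's algorithm.

```python
from collections import defaultdict


def histogram_init(points, k, bins=10):
    """Pick k initial centers from the k densest histogram bins (1-D data).

    Illustrative assumption only: the thesis abstract does not specify
    how the histogram or the local estimate is computed.
    """
    lo, hi = min(points), max(points)
    if lo == hi:
        raise ValueError("all points are identical; cannot build a histogram")
    width = (hi - lo) / bins
    buckets = defaultdict(list)
    for x in points:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        buckets[idx].append(x)
    if len(buckets) < k:
        raise ValueError("fewer non-empty bins than clusters; increase bins")
    # Densest bins first; ties are broken by bin index for determinism.
    densest = sorted(buckets, key=lambda i: (-len(buckets[i]), i))[:k]
    # Local estimate: each center is the mean of the points in its own bin.
    return sorted(sum(buckets[i]) / len(buckets[i]) for i in densest)


def kmeans_1d(points, centers, iterations=20):
    """Standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


# Hypothetical toy data: two dense groups plus one isolated point (9.5).
data = [1.0, 1.1, 1.2, 0.9, 5.0, 5.1, 4.9, 5.2, 9.5]
initial = histogram_init(data, k=2, bins=5)
final = kmeans_1d(data, initial)
print(initial, final)
```

On this toy data the isolated point is never chosen as a starting center, because its bin holds a single sample, whereas a random initialization could start there. One known weakness of this simplified rule is that two adjacent dense bins belonging to the same cluster can both be selected; a fuller method would need some way to keep the chosen centers apart.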
Keywords/Search Tags: Speech separation, deep learning, k-means algorithm, MVDR