As the use of video conferencing tools has exploded in recent years, providing high-quality voice signals and accurate captioning has become a necessity for conducting daily business and for connecting with friends and family. However, communication quality deteriorates when users are in noisy environments such as crowded places, factories, and fast-moving cars. It is therefore important to develop technologies that extract speech clearly in such environments, reducing noise without distorting the speech signal. The performance of Speech Enhancement (SE) algorithms can be further improved with a microphone array, which makes SE usable in environments with strong noise, reverberation, and interfering speakers. With multiple microphones, spatial information can be extracted and combined with spectral information to obtain a better SE model. This paper therefore studies microphone-array-based SE. Robust real-time adaptation to varied acoustic environments is essential for communication in real noisy conditions.

To address the speech distortion introduced by denoising and to improve SE quality, a dual-microphone speech enhancement algorithm based on differential beamforming combined with an Adaptive Mask (AM) is proposed. First, a first-order differential time-domain beamformer processes the signals received by the dual-microphone array. On this basis, a new method combining the time domain and the spatial domain is proposed: the output of the differential beamformer is obtained, and the distorted speech signal is restored. Second, because residual noise remains in the speech after differential beamforming, the noise is estimated from the speech signal, the a priori SNR is estimated, and the AM is calculated, after which the enhanced speech is synthesized. The AM combines the ideal binary mask and the ideal ratio mask in an SNR-dependent ratio to avoid excessive noise suppression. Experiments indicate that this method improves SNR, that the speech quality of its output is better than that of the comparison methods, and that the method is simple and robust. The algorithm was implemented and tested on an existing experimental development board, and the subjective listening quality was good.

Minimum variance distortionless response (MVDR) beamforming performs well in speech systems, especially when the steering vector is known. However, traditional techniques estimate the speech and noise power spectral densities from the spatial positions of the sound sources, and their estimation error increases sharply as the number of noise sources grows. Steering a beamformer toward speech under unknown acoustic conditions therefore remains a challenging problem. In the proposed approach, spectral features are first extracted from a single channel, and an AM is then estimated with a long short-term memory (LSTM) network. These predicted masks are used to compute the spatial covariance matrices from which the MVDR beamformer weights are estimated. The results show that the target speech signal can be clearly extracted even at low signal-to-noise ratios, and experiments under noisy conditions indicate that the proposed method significantly improves speech quality compared with conventional microphone array processing.
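The first-order differential time-domain beamformer described above is commonly realized as a delay-and-subtract structure. The following minimal Python sketch assumes that structure, an endfire geometry with a front and a rear microphone, and an inter-microphone delay `tau` that is an integer number of samples; the function names and test parameters are hypothetical and not taken from this work. It illustrates that a source arriving from the rear is cancelled while a frontal source passes.

```python
import math
from typing import List, Sequence


def delay(signal: Sequence[float], samples: int) -> List[float]:
    """Delay by an integer number of samples, zero-filling the start and keeping the length."""
    if samples <= 0:
        return list(signal)
    return ([0.0] * samples + list(signal))[: len(signal)]


def first_order_differential(x_front: Sequence[float], x_rear: Sequence[float], tau: int) -> List[float]:
    """Delay-and-subtract beamformer: y[n] = x_front[n] - x_rear[n - tau].

    With tau equal to the acoustic travel time between the microphones (in samples),
    this places a null at the rear endfire direction (cardioid-like pattern).
    """
    if len(x_front) != len(x_rear):
        raise ValueError("both channels must have the same length")
    return [a - b for a, b in zip(x_front, delay(x_rear, tau))]


fs = 16000  # sampling rate in Hz (assumed)
tau = 1     # inter-microphone delay in samples (assumed to be an integer)
s = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(64)]  # 1 kHz test tone

# Rear source: reaches the rear microphone first, the front microphone tau samples later.
y_rear = first_order_differential(delay(s, tau), s, tau)
# Front source: reaches the front microphone first, the rear microphone tau samples later.
y_front = first_order_differential(s, delay(s, tau), tau)

print(max(abs(v) for v in y_rear))   # rear source is cancelled (prints 0.0)
print(max(abs(v) for v in y_front))  # front source passes, with a high-pass spectral tilt
```

In practice the delay d·f_s/c is generally not an integer number of samples, so a fractional-delay filter or a frequency-domain implementation would be needed.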
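The speech distortion that the differential stage must compensate can be read off the beamformer's response. Assuming the same delay-and-subtract structure, a microphone spacing d, sound speed c, τ = d/c, and a plane wave s(t) arriving from angle θ (θ = 0 at the front endfire), the front and rear signals, the output, and its response are:

```latex
\begin{aligned}
x_f(t) &= s(t), \qquad x_r(t) = s(t - \tau\cos\theta), \qquad \tau = d/c,\\
y(t) &= x_f(t) - x_r(t - \tau) = s(t) - s\bigl(t - \tau(1 + \cos\theta)\bigr),\\
H(\omega, \theta) &= 1 - e^{-j\omega\tau(1 + \cos\theta)},\\
\lvert H(\omega, \theta)\rvert &= 2\left\lvert \sin\frac{\omega\tau(1 + \cos\theta)}{2} \right\rvert
  \approx \omega\tau(1 + \cos\theta) \quad \text{for } \omega\tau \ll 1.
\end{aligned}
```

The factor 1 + cos θ is the cardioid pattern with its null at θ = π, while the factor ω means the output rises by about 6 dB per octave at low frequencies. An equalization filter that approximately inverts this slope is one way to restore the speech spectrum, which is one plausible reading of the recovery step described above.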
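The abstract does not specify how the residual noise and the a priori SNR are estimated. A minimal sketch follows, assuming the noise power spectrum is averaged over leading speech-free frames and the a priori SNR follows the decision-directed rule of Ephraim and Malah, a common choice swapped in here for illustration. The function names, the smoothing factor `alpha`, and the floor `xi_min` are assumptions.

```python
from typing import List, Sequence

Frames = Sequence[Sequence[float]]  # frames x frequency bins of noisy power |Y|^2


def estimate_noise_psd(power: Frames, n_init: int) -> List[float]:
    """Average the first n_init frames, which are assumed to contain noise only."""
    n_bins = len(power[0])
    return [sum(frame[k] for frame in power[:n_init]) / n_init for k in range(n_bins)]


def decision_directed_snr(power: Frames, noise_psd: Sequence[float],
                          alpha: float = 0.98, xi_min: float = 1e-3) -> List[List[float]]:
    """A priori SNR per frame and bin:
    xi(l) = alpha * |S_hat(l-1)|^2 / lambda + (1 - alpha) * max(gamma(l) - 1, 0),
    where gamma = |Y|^2 / lambda is the a posteriori SNR; xi is floored at xi_min.
    """
    prev_clean = [0.0] * len(noise_psd)  # |S_hat(l-1)|^2, zero before the first frame
    result = []
    for frame in power:
        gamma = [p / lam for p, lam in zip(frame, noise_psd)]
        xi = [max(alpha * c / lam + (1.0 - alpha) * max(g - 1.0, 0.0), xi_min)
              for c, lam, g in zip(prev_clean, noise_psd, gamma)]
        # Wiener gain, used here only to form the clean-speech estimate for the next frame.
        gain = [x / (1.0 + x) for x in xi]
        prev_clean = [(gk ** 2) * p for gk, p in zip(gain, frame)]
        result.append(xi)
    return result


# Toy input: five noise-only frames, then one frame with a strong component in bin 0.
power = [[1.0, 1.0]] * 5 + [[101.0, 1.0]]
noise_psd = estimate_noise_psd(power, n_init=5)
xi = decision_directed_snr(power, noise_psd)
print(noise_psd, xi[0], xi[-1])
```

The recursive term makes the estimate smooth across frames, which is what keeps musical noise low in decision-directed schemes; any other noise tracker could replace `estimate_noise_psd` without changing the SNR step.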
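The AM is described only as an SNR-dependent combination of the ideal binary mask (IBM) and the ideal ratio mask (IRM). The sketch below is hypothetical. It uses the estimated a priori SNR in place of the true SNR, a local criterion `lc_db` for the binary decision, the common IRM form (ξ/(1+ξ))^β, and an assumed sigmoid weight that favors the ratio mask at low SNR, so that low-SNR bins are attenuated softly rather than zeroed. None of these parameter values come from this work.

```python
import math


def _to_db(xi: float) -> float:
    """Linear SNR (xi >= 0) to dB, floored to avoid log10(0)."""
    return 10.0 * math.log10(max(xi, 1e-12))


def binary_mask(xi: float, lc_db: float = 0.0) -> float:
    """IBM-style decision: keep the bin if its SNR exceeds the local criterion lc_db."""
    return 1.0 if _to_db(xi) > lc_db else 0.0


def ratio_mask(xi: float, beta: float = 0.5) -> float:
    """IRM-style soft gain (xi / (1 + xi)) ** beta for xi >= 0."""
    return (xi / (1.0 + xi)) ** beta


def snr_weight(xi: float, center_db: float = 5.0, slope_db: float = 2.0) -> float:
    """Hypothetical sigmoid weight: near 0 at low SNR (favor the ratio mask),
    near 1 at high SNR (favor the binary mask)."""
    return 1.0 / (1.0 + math.exp(-(_to_db(xi) - center_db) / slope_db))


def adaptive_mask(xi: float, lc_db: float = 0.0, beta: float = 0.5,
                  center_db: float = 5.0, slope_db: float = 2.0) -> float:
    """Convex combination w * IBM + (1 - w) * IRM, so the result stays in [0, 1]."""
    w = snr_weight(xi, center_db, slope_db)
    return w * binary_mask(xi, lc_db) + (1.0 - w) * ratio_mask(xi, beta)


am_low = adaptive_mask(1e-3)    # about -30 dB: soft attenuation instead of a hard zero
am_high = adaptive_mask(100.0)  # 20 dB: essentially unity gain
print(am_low, am_high)
```

Applied to every time-frequency bin, the enhanced spectrum would be the mask times the noisy STFT, resynthesized by an inverse STFT with overlap-add.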
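The LSTM mask estimator is described only at a high level. The following hypothetical sketch shows the forward pass of a single-layer LSTM that maps per-frame, single-channel spectral features to one mask value per frequency bin through a sigmoid output layer. The layer sizes, features, and class name are assumptions, and the weights are random rather than trained, so only the data flow is meaningful; a practical system would be built and trained in a deep learning framework.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in w]


class TinyLSTMMasker:
    """Single-layer LSTM followed by a sigmoid output layer (one mask value per bin).

    Weights are random and untrained: this only illustrates the data flow.
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        # Input (i), forget (f), candidate (g), and output (o) gates, each acting on [x_t, h_{t-1}].
        self.gates = {name: (mat(n_hidden, n_in + n_hidden), [0.0] * n_hidden) for name in "ifgo"}
        self.w_out = mat(n_out, n_hidden)
        self.b_out = [0.0] * n_out

    def forward(self, frames: Sequence[Sequence[float]]) -> List[List[float]]:
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        masks = []
        for x in frames:
            z = list(x) + h
            pre = {name: [a + b for a, b in zip(matvec(w, z), bias)]
                   for name, (w, bias) in self.gates.items()}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            g = [math.tanh(v) for v in pre["g"]]
            o = [sigmoid(v) for v in pre["o"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            logits = [a + b for a, b in zip(matvec(self.w_out, h), self.b_out)]
            masks.append([sigmoid(v) for v in logits])  # mask values in (0, 1)
        return masks


# Hypothetical features: 5 frames of a 4-bin log-magnitude spectrum from one microphone.
features = [[0.1 * (t + k) for k in range(4)] for t in range(5)]
masks = TinyLSTMMasker(n_in=4, n_hidden=8, n_out=4).forward(features)
print(len(masks), len(masks[0]))
```

The sigmoid output keeps every predicted value in (0, 1), which matches the range of the AM that the network would be trained to approximate.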
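The mask-based MVDR step can be sketched for a single frequency bin. The abstract does not state which MVDR formulation is used; this sketch assumes the widely used Souden form, which needs only the speech and noise spatial covariance matrices and a reference microphone, and it assumes speech masks m_t and noise masks 1 - m_t serve as frame weights. The function names and toy data are hypothetical, and the small hand-written solver stands in for a linear-algebra library.

```python
import cmath
import math
from typing import List, Sequence

Matrix = List[List[complex]]


def masked_scm(frames: Sequence[Sequence[complex]], mask: Sequence[float]) -> Matrix:
    """Mask-weighted spatial covariance for one frequency bin:
    Phi = sum_t m_t x_t x_t^H / sum_t m_t, with x_t the M-channel STFT vector."""
    m = len(frames[0])
    phi = [[0j] * m for _ in range(m)]
    total = 0.0
    for x, w in zip(frames, mask):
        total += w
        for i in range(m):
            for j in range(m):
                phi[i][j] += w * x[i] * x[j].conjugate()
    if total <= 0.0:
        raise ValueError("mask weights sum to zero")
    return [[v / total for v in row] for row in phi]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Return A^{-1} B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[r]) + list(b[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("noise covariance is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mvdr_weights(phi_s: Matrix, phi_n: Matrix, ref: int = 0) -> List[complex]:
    """Souden-form MVDR: w = Phi_N^{-1} Phi_S u / tr(Phi_N^{-1} Phi_S), u = unit vector at ref."""
    g = solve(phi_n, phi_s)
    trace = sum(g[k][k] for k in range(len(g)))
    return [g[k][ref] / trace for k in range(len(g))]


def beamform(w: Sequence[complex], x: Sequence[complex]) -> complex:
    """Beamformer output y = w^H x for one time-frequency bin."""
    return sum(wk.conjugate() * xk for wk, xk in zip(w, x))


# Toy two-microphone bin: two speech frames along steering vector d, three noise-only frames.
d = [1.0 + 0j, cmath.exp(-1j * math.pi / 4)]
frames = [[1.0 * dk for dk in d], [2.0 * dk for dk in d],
          [1 + 0j, 0j], [0j, 1 + 0j], [1 + 0j, 1j]]
mask = [1.0, 1.0, 0.0, 0.0, 0.0]  # stand-in for predicted speech masks
phi_s = masked_scm(frames, mask)
phi_n = masked_scm(frames, [1.0 - m_ for m_ in mask])
w = mvdr_weights(phi_s, phi_n)
print(abs(beamform(w, d)))  # distortionless: unit response toward d
```

When the speech covariance is rank one, this form gives w^H d equal to the reference-channel element of d, so no explicit steering-vector estimate is required; a full system would repeat the computation for every frequency bin and apply y = w^H x to each frame of that bin.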