| Today’s speech conversations inevitably take place in complex and changing environments. Complex environmental noise interferes heavily with both the speaking (far-end) and listening (near-end) phases of a conversation. Speech enhancement, a far-end noise suppression technology, has matured over a long period of development and can suppress noise effectively, but noise suppression technology for the near end has developed relatively slowly. In this paper, we study Speech Intelligibility Enhancement (IENH), a perceptual enhancement technique that applies to noise suppression in the listening stage. In real-world scenarios, speakers adjust their vocal patterns according to the ambient noise so that their speech is less masked by background interference. This speaker-side noise suppression mechanism is called the "Lombard effect". Data-driven approaches learn and model the Speech Style Conversion (SSC) mapping between normal speech and Lombard speech. Existing speech intelligibility enhancement methods are effective at high Signal-to-Noise Ratio (SNR), but their performance degrades significantly under low-SNR conditions. Moreover, current frameworks assume that the source speech is normal speech, which is not flexible enough for the complex noise conditions at a specific far end and near end. To address these challenges, the main research and contributions of this paper are as follows: (1) A noise-level-adaptive intelligibility enhancement algorithm. Existing speech intelligibility conversion frameworks are insufficient for the complex communication needs of real life. To address this, we use an existing Lombard database to study how the Lombard effect is constrained by noise level, and propose a speech intelligibility enhancement method that adapts to the noise levels at both the far end and the near end, combining Generative Adversarial Networks (GANs) and self-attention mechanisms to optimize the mapping. A suitable and accurate mapping model is applied to predict acoustic features that match the near-end noise level and improve the intelligibility of near-end speech. The objective metric improves by about 11% over the baseline. (2) A noise-adaptive intelligibility enhancement algorithm based on metric optimization. The Lombard speech generated by existing speech intelligibility conversion frameworks is overly smooth and insufficient for actual complex communication requirements. To address this, we propose a near-end noise-adaptive speech intelligibility enhancement method based on metric optimization. We investigate how the adaptive energy distribution under the Lombard effect is constrained by the noise waveform, in order to predict acoustic features consistent with the real-time near-end noise waveform and, in combination with the acoustic characteristics of the near-end noise, to improve the intelligibility and quality of near-end speech. Compared with the research baseline, the objective metrics are significantly improved. |