| With the rapid development of artificial intelligence technology and the spread of intelligent terminals, speech has gradually become the most important human-computer interaction interface in daily life. Speech enhancement separates the clean speech component from noise-corrupted speech, improving speech quality and intelligibility and, in turn, the performance of speech interaction systems. Previous studies have shown that, in non-stationary noise environments, speech enhancement based on deep learning outperforms traditional methods such as spectral subtraction. However, speech enhancement based on deep learning also has shortcomings. Most frequency-domain algorithms use the magnitude spectrum or log-power spectrum of speech as the model input, ignoring the speech information carried by the phase spectrum, and reuse the phase spectrum of the noisy speech when reconstructing the waveform, which degrades speech quality and intelligibility to some extent and reduces the enhancement performance. To avoid the inability of frequency-domain speech enhancement to restore the phase, some researchers operate directly on the time-domain waveform and learn a mapping from the noisy speech waveform to the clean speech waveform. However, the time-domain waveform of noisy speech is less clearly structured than frequency-domain features, so training is prone to vanishing gradients and may fail to converge, and it relies on large, deep networks, which makes model training more difficult. To address these problems, the main work of this thesis is as follows: (1) To address the difficulty of convergence with time-domain waveform features, this thesis proposes a time-domain enhancement algorithm combined with phase spectrum compensation. First, an improved phase spectrum compensation algorithm is applied as a data preprocessing step that effectively removes the noise components of the noisy speech, which overcomes the sensitivity to noise and the convergence difficulty of time-domain speech enhancement. A fully convolutional neural network then performs a second enhancement stage on the phase-spectrum-compensated speech to further improve the enhancement performance. (2) To address the performance degradation of deep networks, this thesis proposes Dense-FCN-M, a speech enhancement algorithm based on DenseNet. Building on skip connections, the dense connections of DenseNet are used to improve the encoder layers of the fully convolutional neural network, strengthening feature reuse and reducing the loss of speech features during encoding and decoding. In addition, this thesis proposes a joint MFCC loss function: during backpropagation, the MFCC loss between the noisy speech and the clean speech is incorporated into the calculation of the loss function. Compared with the mean absolute error loss function, Dense-FCN-M trained with the joint MFCC loss function achieves better objective scores, and the MFCCs of the enhanced speech are also closer to those of the clean speech. |
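
The joint MFCC loss described in (2) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. It assumes the loss is a weighted sum of a time-domain mean absolute error term and a mean absolute error term on MFCCs, and that the MFCC term compares the network output with the clean target. The weight `alpha`, the frame settings, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising edge of the triangle
            bank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            bank[m - 1, k] = (right - k) / (right - centre)
    return bank


def mfcc(signal, sample_rate=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Return MFCCs with shape (n_frames, n_ceps); needs len(signal) >= n_fft."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sample_rate).T + 1e-10)
    # Unnormalised DCT-II over the mel axis, written out to avoid a SciPy dependency.
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * np.arange(n_mels) + 1)
                   / (2 * n_mels))
    return log_mel @ basis.T


def joint_mfcc_loss(enhanced, clean, alpha=0.1, **mfcc_kwargs):
    """Time-domain MAE plus alpha times the MAE between MFCCs (assumed form)."""
    time_loss = np.mean(np.abs(enhanced - clean))
    mfcc_loss = np.mean(np.abs(mfcc(enhanced, **mfcc_kwargs) - mfcc(clean, **mfcc_kwargs)))
    return time_loss + alpha * mfcc_loss


# Toy usage: a sine tone stands in for clean speech, the noisy copy for a network output.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
print(joint_mfcc_loss(noisy, clean))
```

For actual training, the same computation would have to be expressed with the differentiable tensor operations of the deep learning framework in use, so that the MFCC term contributes gradients during backpropagation.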