| Whispered speech is a special mode of human speech production in which the vocal cords do not vibrate. It is widely used for private communication in public places. In addition, aphonic individuals who have undergone laryngectomy, as well as those with limited vocal capability, adopt whispering as their primary form of oral communication. Because of its low energy, whispered speech is often converted to normal speech to improve its quality. This thesis studies whisper-to-normal speech conversion methods based on generative adversarial networks (GANs). The main contributions are as follows. First, a novel GAN-based whisper-to-normal speech conversion method is proposed. The generator adopts an "encoding-decoding" structure that takes whispered speech features as input and outputs the converted normal speech. Experimental results show that the proposed method yields better quality in the converted speech than traditional GMM- and BLSTM-based methods. Second, to exploit the correlation between the acoustic features of successive speech frames, a novel attention-guided GAN is proposed for whisper-to-normal speech conversion. Experimental results show that, compared with the preceding "encoding-decoding" GAN method, the attention-guided GAN improves whisper-to-normal conversion performance in terms of speech quality. |
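
The abstract does not say which attention mechanism is used, so the following is only a minimal sketch of how correlation between successive frames could be exploited, assuming plain scaled dot-product self-attention over frame-level feature vectors. The function name `frame_self_attention`, the toy two-dimensional features, and the absence of learned projections are all illustrative assumptions, not the thesis's actual architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_self_attention(frames):
    """Scaled dot-product self-attention over a sequence of frames.

    frames: list of T feature vectors, each a list of D floats.
    Returns (contexts, weights): contexts[t] is a weighted average of all
    frames, and weights[t][s] says how strongly frame t attends to frame s.
    Hypothetical helper: no learned query/key/value projections are used.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    contexts, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the utterance.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in frames]
        w = softmax(scores)
        # Context vector: attention-weighted sum of all frame features.
        context = [sum(w[t] * frames[t][d] for t in range(len(frames)))
                   for d in range(dim)]
        weights.append(w)
        contexts.append(context)
    return contexts, weights


# Toy "whispered" features: frames 0 and 1 are similar, frame 2 differs.
whisper_frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
contexts, weights = frame_self_attention(whisper_frames)
```

In this toy input, frame 0 attends more strongly to the acoustically similar frame 1 than to frame 2. This is the kind of inter-frame dependency that an attention-guided generator could use when predicting each converted frame.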