| Within the framework of speech processing technologies, voice conversion is defined as a technique that transforms the voice characteristics of one speaker (the source speaker) so that listeners perceive the speech as if it had been uttered by another specific speaker (the target speaker), without altering the linguistic message. Although the voice carries abundant information, including semantic, personality, language and emotional information, voice conversion mainly focuses on spectral characteristics and prosodic features. In many of its applications, such as entertainment and cross-lingual transformation, voice conversion technology is expected to provide high-quality converted speech and to perform non-parallel voice conversion. Current voice conversion systems face two main problems. On the one hand, the converted voice cannot achieve high similarity and good sound quality at the same time: existing spectral conversion methods show a trade-off between the similarity achieved by the conversion and the quality of the converted speech. On the other hand, training the conversion function depends on a parallel corpus, which limits the versatility of voice conversion systems. First, to achieve higher speech quality and conversion similarity, this thesis proposes a bilinear frequency warping plus amplitude scaling (BLFW+AS) algorithm based on adaptive Gaussian classification, which uses adaptive Gaussian classification to model the distribution of the acoustic features of speech more accurately and performs voice conversion on the basis of a more reasonable classification. The improved method is assessed with both objective and subjective evaluations. Compared with the BLFW+AS algorithm with fixed classification, the average mean opinion score (MOS) of the converted speech increases by 4.7% and the average mel-cepstral distortion (MCD) decreases by 2.7%, indicating that the proposed method improves the performance of the voice conversion system. Second, to remove the dependence of voice conversion on a parallel corpus, this thesis aligns the non-parallel corpus using unit selection and vocal tract length normalization, and then applies the BLFW+AS method based on adaptive Gaussian classification to non-parallel voice conversion. Subjective and objective evaluations show that, compared with the non-parallel INCA method, the average MOS of the converted speech increases by 4.0% and the average MCD decreases by 7.1%, indicating that the converted speech has higher quality and better similarity. Compared with the traditional Gaussian mixture model (GMM) voice conversion method, however, the average MCD is 5.1% higher and the average MOS is 3.9% lower, which indicates that a gap in conversion performance remains. Nevertheless, the proposed method operates on non-parallel corpora and therefore offers greater versatility. |
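The frame-level conversion and the MCD metric used in the evaluations above can be illustrated with a minimal NumPy sketch. It assumes one common BLFW+AS formulation: a diagonal-covariance GMM supplies class posteriors for each frame, the per-class warping factors and amplitude-scaling vectors are mixed by those posteriors, and the log-magnitude spectrum is resampled along the bilinear (first-order all-pass) warping curve before the scaling vector is added. The function names, the `(weights, means, variances)` GMM layout and the warping-direction convention are hypothetical choices for illustration only. The adaptive classification procedure proposed in the thesis is not reproduced; fixed per-class parameters stand in for whatever that procedure yields.

```python
import numpy as np


def bilinear_warp(omega, alpha):
    """Bilinear (first-order all-pass) warping of normalized frequencies
    omega in [0, pi]; maps 0 -> 0 and pi -> pi. Requires |alpha| < 1."""
    if abs(alpha) >= 1.0:
        raise ValueError("warping factor must satisfy |alpha| < 1")
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))


def class_posteriors(x, weights, means, variances):
    """Posterior probability of each Gaussian class for one feature vector x,
    assuming diagonal covariances (hypothetical parameter layout)."""
    log_p = (np.log(weights)
             - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
             - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    log_p -= log_p.max()  # subtract the max for numerical stability
    p = np.exp(log_p)
    return p / p.sum()


def blfw_as_frame(log_spec, feat, gmm, alphas, scalings):
    """Convert one frame (assumed formulation): mix per-class warping factors
    and amplitude-scaling vectors by class posteriors, warp, then add scaling."""
    p = class_posteriors(feat, *gmm)
    alpha = float(p @ alphas)  # frame-level warping factor
    s = p @ scalings           # frame-level amplitude-scaling vector (log domain)
    omega = np.linspace(0.0, np.pi, len(log_spec))
    # Convention assumed here: y(omega) = x(warp(omega)) + s(omega)
    warped = np.interp(bilinear_warp(omega, alpha), omega, log_spec)
    return warped + s


def mel_cepstral_distortion(c_ref, c_conv):
    """Frame-averaged MCD in dB between two (frames x order) mel-cepstral
    matrices, excluding the 0th (energy) coefficient."""
    diff = c_ref[:, 1:] - c_conv[:, 1:]
    per_frame = 10.0 / np.log(10.0) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

With a zero warping factor and zero scaling vectors the frame is returned unchanged, and an MCD of 0 dB means the two mel-cepstral sequences are identical. Under the convention assumed here, a positive warping factor moves spectral features toward lower frequencies; other implementations may use the opposite sign.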