
Mandarin's Synthesis By HTS And Research On Its Naturalness

Posted on: 2007-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 2178360212957545
Subject: Software engineering
Abstract/Summary:
With the development of the information society and computer science, demand for artificial intelligence (AI) technology, to which speech synthesis belongs, has become more pressing. How to make a robot comprehend human language and communicate with people has long been a focus of research.

First, this paper investigates a speech synthesis technology developed for Japanese, HMM-based speech synthesis (HTS). This technology is derived from speech recognition technology and has several advantages, such as a small storage footprint and flexible control of prosody (rhythm). It is popular with intelligent toy manufacturers and is widely applied.

Building on the study of Japanese speech synthesis with HTS, this paper analyzes the differences and relations between Chinese and Japanese and implements Mandarin synthesis with HTS. Because of the characteristics of HTS and the size of the Chinese speech database, HTS cannot synthesize speech in any tone other than the statement (declarative) tone. To address this problem, this paper concentrates on speech in question tone and discusses how to implement synthesis for more tones. It starts from the pitch of Chinese speech, derives the relationship between the pitch of question tone and that of statement tone, constructs a Chinese tone model (for question tone only), and obtains synthesized speech in question tone.

After implementing Chinese speech synthesis, this paper analyzes the naturalness of the synthesized speech. According to current phonetic research, when people read the same text at different times, the acoustic characteristics of their pronunciation differ somewhat; this phenomenon is called uncertainty in prosodic organization.
To represent this characteristic accurately in synthesized speech, this paper addresses uncertainty in prosodic (rhythm) organization from three aspects: the organization of prosodic structure, the pitch model, and the duration model. For prosodic structure, it applies a rule-learning method to predict the prosodic structure of the text; in the implementation, the C4.5 algorithm is modified so that the predicted prosodic structure can vary. For the duration and pitch models, it applies different methods, suited to the characteristics of each, to replace the single Gaussian distribution of each model state with a Gaussian mixture distribution, so that more parameter sequences are available to choose from and the synthesized speech sounds more natural.
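
The abstract does not say how the mixture distributions are estimated or how a parameter sequence is chosen, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes each state's duration is described by a one-dimensional Gaussian mixture given as hypothetical `(weights, means, stddevs)` tuples, and draws one duration per state. A single Gaussian is the special case with one component.

```python
import random


def sample_state_duration(weights, means, stddevs, rng):
    """Draw one duration (in frames) from a 1-D Gaussian mixture.

    The (weights, means, stddevs) layout is an assumption made for this
    sketch; it is not taken from the thesis or from the HTS toolkit.
    """
    # Pick a mixture component with probability proportional to its weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Draw from the chosen Gaussian and clamp to at least one frame.
    return max(1, round(rng.gauss(means[k], stddevs[k])))


def sample_durations(state_models, seed=None):
    """Draw one duration per state; different seeds can give different sequences."""
    rng = random.Random(seed)
    return [sample_state_duration(w, m, s, rng) for (w, m, s) in state_models]
```

With a single zero-variance component a state always receives the same duration, whereas a state with several components can receive a different duration on each run, which is the kind of variation the mixture model is meant to allow.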
Keywords/Search Tags: speech synthesis, pitch model, speech naturalness