Waveform Concatenation Speech Synthesis Based on Binary Semantic Annotation

Posted on: 2006-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Lin
Full Text: PDF
GTID: 2208360155966830
Subject: Computer software and theory
Abstract/Summary:
Text-to-speech (TTS) technology converts text, whether generated by the computer itself or entered by a person (for example, a text file or a Word document), into speech. In short, the goal is to have the computer read text fluently enough that people can understand the information by listening alone. With the rapid development of computer and communication technology, TTS has been applied in spoken dialogue systems, call center systems, voice web pages, voice e-mail systems, and similar applications, with significant effect. However, all TTS systems currently in use fall short in naturalness and intelligibility, and none can truly read text aloud the way a person does, so TTS remains usable only in limited fields.
The first difficulty is the tagging of prosodic information. In natural language, speech characteristics are highly variable and carry a great deal of knowledge that people can perceive but cannot describe. In automatically converting text into prosodic markup, the limited understanding of natural language is the bottleneck. At present, this conversion can rely only on basic information such as syntax (part of speech) to segment intonation phrases or place sentence stress; it cannot go deeper and draw on semantics. The second difficulty lies in acoustics: the acoustic parameters are not fully understood, they lack a precise description, and they are grasped largely through experience.
Together, these limitations hinder the development of information representation.
In this thesis, we build on our lab's natural language work, binary-relation syntactic analysis, and define a set of XML-based tags for marking up the text to be converted to speech, together with a set of rules for transforming the semantic description into a prosodic description. We also handle polyphonic words, numbers, symbols, and characters, and define a series of description schemes for these cases.
For prosodic speech synthesis, we collected 1248 single Chinese characters and more than 8000 frequently used Chinese phrases, including two-, three-, and four-character phrases and well-known names of people and places. After analysis and tagging, all of them were recorded by human speakers using our speech database maintenance program; after segmentation and pitch marking, they were saved into the speech database and the index database, which together provide all the base speech data of our TTS system.
The speech synthesis module contains a speech rate editing unit, a speech mode editing unit, a stress editing unit, a silence generator unit, and others. All units are modular and expose interfaces.
In this system, we first define prosodic tags based on a deeper understanding of natural language and transform the semantic markup into prosodic markup using binary-relation syntactic analysis. This kind of markup is therefore more advanced and comes closer to the prosodic intent of human speakers.
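As an illustration of how a unit inventory made up of single characters and two- to four-character phrases might be consulted before concatenation, the following is a minimal sketch. It assumes greedy forward longest-match segmentation, which the abstract does not state; the function name `segment_into_units`, the `toy_index` layout (unit text mapped to an offset and length in the speech database), and all values in it are hypothetical.

```python
def segment_into_units(text, inventory, max_len=4):
    """Split text into recorded units, preferring the longest match.

    Assumption: greedy forward longest-match; the thesis may use a
    different selection strategy.
    """
    units = []
    i = 0
    while i < len(text):
        # Try the longest candidate first (at most max_len characters).
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in inventory:
                units.append(candidate)
                i += size
                break
        else:
            # No unit, not even a single character, covers this position.
            raise KeyError(f"no recorded unit covers {text[i]!r}")
    return units


# Hypothetical index database: unit text -> (sample offset, sample count).
toy_index = {
    "北": (0, 4000),
    "京": (4000, 4200),
    "大": (8200, 3900),
    "学": (12100, 4100),
    "北京": (16200, 7800),
    "大学": (24000, 7600),
}

if __name__ == "__main__":
    for unit in segment_into_units("北京大学", toy_index):
        print(unit, toy_index[unit])
```

Preferring longer recorded phrases reduces the number of concatenation points, which is one common motivation for storing multi-character units alongside single characters.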
In the synthesis procedure, based on the PSOLA algorithm and an extensive speech database, we implement a simple form of prosodic control that makes the synthesized speech clear and natural, a considerable improvement in intelligibility and naturalness.
Future work includes: deeper research into semantic markup and prosody, so that more semantic information can be transformed into prosodic information; building a more extensive speech database, so that the language material contains not only sentences but also paragraphs of text; and creating more prosodic control units, so that prosody can be controlled not only within sentences but also between sentences and paragraphs.
Keywords/Search Tags: Speech synthesis, TTS, Prosodic tagging, PSOLA, Speech database