Chinese Prosodic Phrases Recognition Based On Deep Learning

Posted on: 2021-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Ren
Full Text: PDF
GTID: 2415330626955141
Subject: Computer technology
Abstract/Summary:
Advances in speech synthesis technology have made it possible to communicate with machines through spoken language. Synthesized speech is evaluated mainly on two criteria: intelligibility and naturalness. At present, intelligibility has reached the expected level, while naturalness still needs improvement. Many factors affect the naturalness of speech, and prosodic structure is one of the most important. To address the problem of prosodic structure analysis, this thesis works from both speech and text, combining text features, sentence similarity features, phrase structure features, and acoustic features, and uses deep learning to identify the locations of prosodic phrase boundaries. The main research contents are:

(1) Acquisition of text features at prosodic phrase boundaries. The text corpus is annotated with prosodic information, and a text feature set for prosodic boundaries is constructed from lexical and syntactic analysis of those boundaries. The text features include word vector features that express the relationships between words, similarity features based on syntactic structure analysis, and boundary position features of phrase structures.

(2) Acquisition of speech features at prosodic phrase boundaries. The speech corpus is annotated according to the actual prosody, the acoustic behavior of the speech at and around prosodic boundaries is analyzed, and acoustic features at the boundaries are extracted from the audio to construct a speech feature set. The main speech features include consonant duration, vowel duration, syllable duration, and silent-segment duration.

(3) Prediction of prosodic phrase boundaries by combining text and speech features. The text and speech features are fused to describe prosodic phrase boundaries at multiple levels and from multiple angles. Two deep learning methods, Bi-LSTM-CRF and Bi-GRU-CRF, are used to build the boundary prediction model and an automatic prosodic phrase boundary prediction system.
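The abstract does not say how the features are fused or how the CRF layer is decoded. The following Python sketch shows one common reading and is not the thesis's implementation. It assumes early fusion by concatenating each word's text and speech feature vectors, and a two-tag scheme (0 = no boundary after the word, 1 = prosodic phrase boundary after the word). The names `fuse_features` and `viterbi_decode` and all numeric scores are hypothetical. In a real Bi-LSTM-CRF or Bi-GRU-CRF model the emission scores would come from the recurrent network and the transition scores would be learned.

```python
from typing import List, Sequence


def fuse_features(text_feats: Sequence[Sequence[float]],
                  speech_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Early fusion (an assumption): concatenate per-word text and speech vectors."""
    if len(text_feats) != len(speech_feats):
        raise ValueError("text and speech features must cover the same words")
    return [list(t) + list(s) for t, s in zip(text_feats, speech_feats)]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at word t (would come from the Bi-LSTM/Bi-GRU)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])      # best score of any path ending in each tag
    backpointers = []               # best previous tag for each tag, per step
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical per-word features: two text dimensions, one duration dimension.
fused = fuse_features([[0.1, 0.2], [0.4, 0.5]], [[0.30], [0.05]])

# Hypothetical scores for a three-word sentence; tag 1 followed by tag 1 is penalized.
emissions = [[2.0, 0.0], [0.0, 1.0], [1.5, 0.0]]
transitions = [[0.0, 0.0], [0.0, -5.0]]
print(viterbi_decode(emissions, transitions))  # prints [0, 1, 0]
```

Here the decoded sequence places a single boundary after the second word. The transition penalty shows how a CRF layer can discourage two boundaries in a row, which independent per-word classification cannot express.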
Keywords/Search Tags: Text features, speech features, Bi-LSTM-CRF, Bi-GRU-CRF, prosodic boundary