Research On Speech Synthesis Technology In Human-Computer Interacting System

Posted on: 2022-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhu
GTID: 2518306338478494
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, as smart devices have become more and more integrated into daily life, speech synthesis has been widely used in various scenarios. With the development of artificial intelligence and computer technology, the quality of speech synthesis systems has gradually improved, but the speech produced by existing systems still falls considerably short of lively, expressive human speech in naturalness and intelligibility. In addition, existing speech synthesis algorithms have complex structures, which greatly limits their application scenarios. Research on speech synthesis technology is therefore an important topic in the field of human-computer interaction. The thesis focuses on end-to-end speech synthesis. To address the poor naturalness of speech generated by existing algorithms and the complicated structure of the vocoder, the research work centers on optimizing the speech synthesis vocoder. The main research contents of the thesis are as follows.

First, the thesis presents and analyzes the basic theory of Chinese speech synthesis, including the characteristics of Chinese speech, several mainstream speech synthesis algorithms and their principles, and several commonly used metrics and methods for evaluating the quality of synthesized speech.

Then, to address the poor prosody and low naturalness of Chinese speech synthesis algorithms, the BoTNet network is used to improve the WaveNet vocoder. BoTNet contains a self-attention mechanism, which realizes pairwise interactions between data elements through content-based addressing. This allows the model to learn complex feature correlations across long-span sequences and strengthens its ability to model long-span dependencies, thereby improving the performance of the Chinese speech synthesis system. The proposed B-WaveNet vocoder algorithm also reduces the number of model parameters and speeds up computation. Simulation experiments and listening tests verify the effectiveness of the vocoder.

Finally, to address the high complexity of the time-domain speech synthesizer in speech synthesis algorithms, a sub-band-based speech synthesizer model is proposed. The model first uses multi-level wavelet decomposition and merging to decompose the signal into sub-band signals in the time domain, or to reconstruct it from them, and then uses a speech dictionary to take linguistic features as conditioning features for a sub-band-based speech synthesizer. The small bandwidth of each sub-band signal is exploited to reduce the complexity of the time-domain speech synthesizer. Simulation experiments and listening tests verify the effectiveness of the proposed model.
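
The multi-level wavelet decomposition and merging mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the wavelet, the number of levels, or any implementation details, so the Haar filter pair, the three levels, and every function name below are assumptions chosen for illustration rather than the method used in the thesis.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_split(signal):
    """One analysis step: split an even-length signal into a low-pass
    and a high-pass sub-band, each at half the original length."""
    low = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return low, high


def haar_merge(low, high):
    """One synthesis step: exactly invert haar_split."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) / SQRT2)
        out.append((lo - hi) / SQRT2)
    return out


def decompose(signal, levels):
    """Multi-level decomposition (hypothetical helper).

    Returns [approx, detail_levels, ..., detail_1]: the coarsest
    approximation first, followed by detail bands from coarse to fine.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, high = haar_split(approx)
        details.append(high)
    return [approx] + details[::-1]


def reconstruct(bands):
    """Merge the sub-bands back into a full-rate time-domain signal."""
    approx = bands[0]
    for high in bands[1:]:
        approx = haar_merge(approx, high)
    return approx


if __name__ == "__main__":
    # A toy 16-sample signal standing in for a speech frame.
    signal = [math.sin(0.3 * n) for n in range(16)]
    bands = decompose(signal, levels=3)
    print([len(b) for b in bands])  # [2, 2, 4, 8]
    restored = reconstruct(bands)
    print(max(abs(a - b) for a, b in zip(signal, restored)) < 1e-9)  # True
```

Each sub-band is shorter than the full-rate signal, which is the property the abstract relies on when it says the small bandwidth of the sub-band signals reduces the complexity of the time-domain synthesizer. In this sketch the sub-bands are only split and merged; a per-band synthesizer conditioned on linguistic features would sit between the two steps.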
Keywords/Search Tags: human-computer interaction, Chinese speech synthesis, vocoder, WaveNet, BoTNet