| Speech synthesis is one of the key technologies of human-computer interaction systems. As an important research direction in the speech field, it has broad application prospects in intelligent navigation, robotics, intelligent reading, and smart tourism. Research on speech synthesis originated in the 18th and 19th centuries and progressed from mechanical and electronic synthesizers to synthesis based on unit concatenation and statistical parameters. In recent years, modeling methods based on neural networks and deep learning have developed rapidly in machine learning, and speech synthesis has improved significantly as a result. End-to-end speech synthesis methods emerged in this context. Their advantage is that the model learns the correspondence between the input text and the target speech directly, without extracting features in advance. An end-to-end system takes text as input and outputs speech, and achieves good synthesis quality. At present, end-to-end systems usually use a neural network vocoder as the back-end module. Compared with a traditional filter-based vocoder, a neural network vocoder reconstructs the phase information of speech more accurately and therefore produces higher-quality synthesized speech. However, because it generates speech by modeling the speech samples with a neural network, its complexity is high and synthesis is slow. To address these problems, this paper proposes a speech synthesis method based on the Linear Predictive Coding Network (LPCNet) model. First, a Chinese pinyin sequence with tone marks is used as input. Next, a Seq2Seq (sequence-to-sequence) feature prediction network with a self-attention mechanism generates the mel spectrogram of the corresponding speech. Finally, the LPCNet model restores the mel spectrogram to speech. Experimental results show that the quality of the synthesized speech is better than that of the parametric speech synthesis model and of the Seq2Seq speech synthesis model that uses a traditional vocoder. In addition, the method is applied to the Datong dialect of Shanxi as a study of dialect speech synthesis. A standardized Datong dialect speech data set was collected and built, then preprocessed and given Zhuyin (phonetic) annotation. Because the small amount of Datong dialect data leads to poor synthesized speech quality, speaker adaptive training was used to mitigate the problem, and speech synthesis for the Datong dialect was ultimately achieved. |