Dai word segmentation is the basis and prerequisite of Dai speech synthesis. A Dai speech synthesis system consists mainly of front-end text analysis and back-end speech synthesis. Automatic word segmentation is an essential part of front-end text analysis, and its results directly influence the naturalness and intelligibility of the synthesized speech. The main purpose of this thesis is therefore to improve word segmentation accuracy for Dai by applying machine learning models, laying the groundwork for high-quality back-end speech synthesis. The main work is as follows.

(1) It describes the basic workflow of a speech synthesis system and the important role of automatic word segmentation in a text-to-speech (TTS) system.

(2) It summarizes research progress in automatic word segmentation, discusses segmentation methods based on machine learning, and points out their respective advantages and disadvantages.

(3) It introduces the classification principles of the three classifiers used: Naive Bayes, decision trees, and conditional random fields (CRFs). Building on probabilistic graphical models and maximum entropy Markov models, it focuses on CRFs and explains their advantages and disadvantages.

(4) It establishes a character-based N-gram language model for Dai and implements Dai word segmentation using character attributes and contextual information. This covers the design of the character attribute set, the extraction of context features, and the selection of word boundary markers.

(5) It describes the experimental platform and evaluation criteria, and reports experiments that compare different sets of word boundary markers and different values of N in terms of the overall performance of each classifier.

The results show that: (1) the CRF classifier achieves the highest word segmentation accuracy, indicating excellent performance; (2) classifier performance improves as N increases, up to a certain point, and for CRFs N = 4 is sufficient in practice; (3) because research on Dai word segmentation started late and has not yet gone deep, word accuracy does not yet satisfy practical requirements.
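The character-tagging formulation summarized above (a boundary marker per character, context features drawn from neighbouring characters, a classifier, and precision/recall/F evaluation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the 4-tag B/M/E/S marker set, the feature templates, the `window` half-width (only loosely analogous to the N studied in the thesis), and every function and class name are hypothetical. Latin letters stand in for Dai characters, character attributes are omitted, and a from-scratch Naive Bayes stands in for the classifiers compared; a CRF would instead decode the whole tag sequence jointly.

```python
import math
from collections import Counter, defaultdict


def words_to_tags(words):
    """Hypothetical B/M/E/S scheme: one boundary tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and tags; B or S starts a new word."""
    words, cur = [], ""
    for c, t in zip(chars, tags):
        if t in ("B", "S") and cur:
            words.append(cur)
            cur = ""
        cur += c
        if t in ("E", "S"):
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)  # flush an unterminated word
    return words


def context_features(chars, i, window):
    """Assumed templates: unigrams and adjacent bigrams within +/- window of i."""
    padded = ["<s>"] * window + list(chars) + ["</s>"] * window
    j = i + window
    feats = [f"C[{k}]={padded[j + k]}" for k in range(-window, window + 1)]
    feats += [f"C[{k}:{k + 1}]={padded[j + k]}{padded[j + k + 1]}"
              for k in range(-window, window)]
    return feats


class NaiveBayesTagger:
    """Multinomial Naive Bayes with add-one smoothing; tags each character independently."""

    def __init__(self, window=2):
        self.window = window
        self.tag_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sentences):
        for words in sentences:
            chars = "".join(words)
            for i, tag in enumerate(words_to_tags(words)):
                self.tag_counts[tag] += 1
                for f in context_features(chars, i, self.window):
                    self.feat_counts[tag][f] += 1
                    self.vocab.add(f)
        return self

    def predict(self, chars):
        total = sum(self.tag_counts.values())
        v = len(self.vocab)
        tags = []
        for i in range(len(chars)):
            feats = context_features(chars, i, self.window)
            best, best_score = None, -math.inf
            for tag, count in self.tag_counts.items():
                denom = sum(self.feat_counts[tag].values()) + v
                score = math.log(count / total)  # log prior
                score += sum(math.log((self.feat_counts[tag][f] + 1) / denom)
                             for f in feats)  # smoothed log likelihoods
                if score > best_score:
                    best, best_score = tag, score
            tags.append(best)
        return tags


def prf(gold_words, pred_words):
    """Word-level precision, recall and F1 computed over character spans."""
    def spans(words):
        out, start = set(), 0
        for w in words:
            out.add((start, start + len(w)))
            start += len(w)
        return out
    g, p = spans(gold_words), spans(pred_words)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    train = [["ab", "c", "de"], ["ab", "de", "c"], ["c", "ab"]]  # toy data
    tagger = NaiveBayesTagger(window=1).fit(train)
    pred = tags_to_words("abcde", tagger.predict("abcde"))
    print(pred, prf(["ab", "c", "de"], pred))
```

Varying `window` in such a sketch is one way to probe how a larger context changes accuracy. The per-character feature lists could also, after conversion to a toolkit's input format, be passed to a CRF implementation in place of the Naive Bayes step.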