With the advent of the era of artificial intelligence, human-computer interaction has gradually shifted from contact-based interaction to voice interaction. Some voice products can already carry out simple communication with humans. Moreover, voice interaction has extended from major languages such as English and Chinese to minority languages such as Uyghur and Cantonese. Among minority languages, Hmong accounts for a large proportion. Scholars in China and abroad have done some research on the collection, comparison, and vowel analysis of Hmong speech, but few studies address fundamental Hmong speech technologies such as speech recognition and speech synthesis. This paper therefore takes the Hmong language spoken in central Guizhou Province as its research object and studies three problems: the construction of a phoneme-level speech corpus, automatic phoneme boundary annotation, and phoneme recognition. The main work is as follows.

First, based on basic recordings of Hmong speech, this paper establishes a Hmong speech database that can be used for acoustic research. With reference to Chinese vocabulary, 2,000 speech items commonly used in the daily life of Hmong people were recorded, and the phonemes, tones, and word meanings of the speech were annotated. The speech data were precisely segmented into phonemes and similar phonemes were effectively classified, producing a standard phoneme data set.

Second, this paper designs an automatic phoneme boundary detection method. It relies on two observations: the spectral energy within a phoneme is structurally similar, while the spectral energy of different phonemes varies greatly. The speech is pre-processed, divided into frames, and transformed with the FFT, and the energy of each frame in the range 0-5000 Hz is extracted for analysis. To further reduce the impact of noise on the energy structure of the spectrum, the spectral energy is binarized according to its energy value, and the data of the same frame in different frequency bands are averaged to reduce the dimensionality of the spectral data. The Euclidean distance is then used to compare adjacent data in each dimension, and candidate boundaries are determined according to the regularities of speech. Finally, the final boundaries are obtained by further merging and filtering the dimension-reduced data. Tested on the Hmong speech database constructed in this paper, the method achieves an accuracy of 90.9% within a tolerance of 40 ms.

Third, the phonemes obtained by the automatic boundary detection method are recognized. A hidden Markov model (HMM) is used as the main recognition algorithm. After the phoneme speech data are preprocessed, 24-dimensional MFCC parameters are extracted as the input features. An HMM trained on 300 standard segmented phoneme samples and tested on 100 standard segmented phoneme samples achieves an accuracy of 76.3%. Finally, the phonemes obtained by the automatic boundary detection method are tested on the HMM; with a tolerance of 40 ms, the accuracy is 72.5%.
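
The boundary detection steps described above (framing, spectral energy in 0-5000 Hz, binarization, band averaging, Euclidean distance between adjacent frames, merging of candidates) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `band_features` and `detect_boundaries`, the frame length and hop, the number of bands, the global-mean binarization threshold, the distance threshold, and the merging rule are all assumptions chosen for illustration, and a plain DFT stands in for the FFT so that only the standard library is needed.

```python
import cmath
import math


def band_features(signal, sample_rate, frame_len=256, hop=128,
                  f_max=5000.0, n_bands=8):
    """Frame the signal, compute spectral energy below f_max, binarize it,
    and average the binary values inside each frequency band."""
    # Hamming window to limit spectral leakage (an assumed pre-processing step).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    k_max = int(f_max * frame_len / sample_rate)  # number of DFT bins below f_max
    twiddle = [cmath.exp(-2j * math.pi * m / frame_len) for m in range(frame_len)]

    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(k_max):
            # Direct DFT of bin k; a real system would use an FFT here.
            acc = sum(frame[n] * twiddle[(k * n) % frame_len]
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)
        energies.append(row)
    if not energies:
        return []

    # Assumption: binarize against the mean energy over all frames and bins.
    threshold = sum(sum(row) for row in energies) / (len(energies) * k_max)
    width = max(1, k_max // n_bands)
    features = []
    for row in energies:
        bits = [1.0 if e > threshold else 0.0 for e in row]
        # Averaging inside each band reduces k_max values to n_bands values.
        features.append([sum(bits[b * width:(b + 1) * width]) / width
                         for b in range(n_bands)])
    return features


def detect_boundaries(features, sample_rate, frame_len=256, hop=128,
                      dist_threshold=0.1, min_gap=0.03):
    """Return boundary times (seconds) where adjacent frames differ strongly."""
    candidates = []  # (time, distance) pairs
    for i in range(len(features) - 1):
        d = math.dist(features[i], features[i + 1])
        if d > dist_threshold:
            # Place the candidate midway between the two frame centres.
            t = ((i + 0.5) * hop + frame_len / 2) / sample_rate
            candidates.append((t, d))

    # Merge candidates closer than min_gap, keeping the largest distance.
    boundaries, group = [], []
    for cand in candidates:
        if group and cand[0] - group[-1][0] >= min_gap:
            boundaries.append(max(group, key=lambda c: c[1])[0])
            group = []
        group.append(cand)
    if group:
        boundaries.append(max(group, key=lambda c: c[1])[0])
    return boundaries


if __name__ == "__main__":
    # Synthetic example: a 500 Hz tone followed by a 3000 Hz tone at 0.1 s.
    sr = 16000
    tones = [math.sin(2 * math.pi * (500 if n < 1600 else 3000) * n / sr)
             for n in range(3200)]
    print(detect_boundaries(band_features(tones, sr), sr))
```

On real speech the thresholds and the band layout would need tuning against annotated data, and the rules used to filter candidates would have to reflect the phonetic regularities that the paper refers to but does not spell out here.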