Speech recognition technology is widely used in fields such as human-computer interaction, smart homes, intelligent medical care and intelligent transportation, bringing great convenience to people’s life and work. As people’s understanding of local culture deepens, dialect human-computer interaction has become a new research direction. Deep learning algorithms have been applied in Mandarin speech recognition systems, and the recognition rate for Mandarin is already very high, but dialect recognition remains difficult. Because dialect data are hard to collect and dialects vary widely, it is difficult to find appropriate models to train on and recognize them directly. Research on speech recognition for local dialects is therefore particularly important. Using a corpus of the Northern Shaanxi dialect and the LSTM deep neural network as the network structure, this paper builds a deep-learning-based speech recognition system for the Northern Shaanxi dialect. The main work of this paper is as follows:

(1) The basic principles and key technologies of speech recognition are discussed. Signal processing covers speech signal digitization, preprocessing and speech feature parameter extraction. For the recognition model, the deep neural network structures and algorithms used in the acoustic model, the N-gram language model and WFST-based decoding are introduced in detail.

(2) Establishment of a corpus of the Northern Shaanxi dialect. The pronunciation, vowel features and tone features of the Northern Shaanxi dialect are studied in depth. Twelve speakers were recruited to record the dialect, and a specific process for building the corpus is proposed, including the selection of the text corpus, the source of the speech data, speech segmentation, annotation and checking. This corpus is the basis of the dialect speech recognition research.

(3) Recognition algorithm. The characteristics of the traditional hybrid DNN-HMM model and the modern end-to-end model are analyzed in detail, and both models are tested on AISHELL-1, an open Mandarin data set. The test results show little difference in recognition accuracy between the two models. Considering the scarcity of dialect resources and the distinctiveness of dialect speech, the hybrid DNN-HMM model is selected for dialect speech recognition. After testing and analyzing the performance of CNN, GRU, LSTM and other DNN network structures, LSTM-HMM is chosen as the basic model. To maximize the performance of the model, two sets of comparison experiments were designed based on the bidirectional long short-term memory (BiLSTM) network model, in which parameters such as the number of layers, the learning rate and the training batch size of the LSTM model were adjusted. Finally, the training parameters suitable for the basic model were determined.

(4) Establishment of the Northern Shaanxi dialect speech recognition system. The system is built with Kaldi, an open-source speech recognition toolkit, using an HMM-based LSTM neural network as the acoustic model and a 3-gram model as the language model. The paper describes in detail the environment configuration of the system, the construction of the deep learning network, the preparation of the experimental data and the training process of the speech recognition model. Finally, the speech recognition system for the Northern Shaanxi dialect is implemented and the recognition results are analyzed. Experiments are designed to compare the recognition performance of the system under different acoustic models, language models and amounts of training data. In addition, the same data set is used to test other models such as GRU, DNN-HMM and CNN-HMM. The experiments show that the acoustic model based on the LSTM network has the best recognition performance...
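
To make the 3-gram language model mentioned in (4) concrete, the following is a minimal sketch of how trigram probabilities can be estimated from counts. It is not the paper's implementation: the function names `train_trigram` and `trigram_prob`, the sentence-boundary tokens and the toy corpus are all hypothetical, and the estimate is the plain unsmoothed maximum-likelihood ratio, whereas a practical language model would also need smoothing for unseen trigrams.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-token contexts over tokenized sentences."""
    tri, ctx = Counter(), Counter()
    for tokens in sentences:
        # Pad with hypothetical boundary symbols so the first real token
        # also has a two-token history.
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            tri[context + (padded[i],)] += 1
            ctx[context] += 1
    return tri, ctx


def trigram_prob(tri, ctx, w1, w2, w3):
    """Unsmoothed estimate of P(w3 | w1, w2); 0.0 if the context is unseen."""
    total = ctx[(w1, w2)]
    return tri[(w1, w2, w3)] / total if total else 0.0


# Toy corpus (illustrative only, not dialect data).
corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
tri, ctx = train_trigram(corpus)
p = trigram_prob(tri, ctx, "a", "b", "c")  # "c" follows "a b" in 2 of 3 cases
```

In this toy corpus the context "a b" occurs three times and is followed by "c" twice, so the estimate is 2/3; any trigram whose context never occurs gets probability 0.0, which is exactly the gap that smoothing is meant to fill.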