| As deep learning has matured, researchers have combined it with acoustic models and greatly improved speech recognition rates, which also opens a new direction for research on dialect recognition. Changsha dialect, one of the most widely spoken dialects in Hunan, carries the rich Huxiang culture. However, there has been little research on large-vocabulary continuous speech recognition for Changsha dialect, so work on this problem is valuable. Traditional hidden Markov models (HMMs) require frame-level alignment annotation of the speech, which is expensive to produce manually. In contrast, Connectionist Temporal Classification (CTC) maps speech features directly to text without manual frame-level alignment for training, which makes the training process simpler and more efficient. The gated recurrent unit (GRU) can capture long-range temporal dependencies in the speech signal, has a degree of dynamic memory, and effectively alleviates the vanishing and exploding gradient problems. With the introduction of multi-head attention (MA), the acoustic model can focus on the features most relevant to the current recognition task, further improving the recognition rate for Changsha dialect. This paper studies Changsha dialect recognition based on the MA-GRU-CTC model. The main contributions are as follows: 1. To address the lack of large-vocabulary continuous speech data sets for Changsha dialect, this paper studies the pronunciation characteristics of Changsha dialect, constructs a 3.79-hour Changsha dialect data set, and builds a Changsha dialect dictionary, which together provide corpus support for the subsequent recognition work. 2. To address the high cost and poor performance of traditional acoustic models in large-vocabulary continuous speech recognition, a GRU-CTC acoustic model is used. Compared with the conventional deep learning model DNN-CTC, GRU-CTC improves the recognition rate by 33%. To improve the recognition rate further, MA is introduced, which raises the recognition rate of GRU-CTC by nearly one quarter. 3. The small size of the self-built Changsha dialect data set may affect the performance of the model. Using the same model and the same parameters, this paper compares three Mandarin Chinese speech data sets with the self-built data set and finds that the recognition rate on the Mandarin data sets is higher than on the Changsha dialect data set. |