A Study Of Mongolian Automatic Speech Recognition For Children

Posted on: 2022-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liang
Full Text: PDF
GTID: 2518306509460034
Subject: Computer Science and Technology
Abstract/Summary:
With the development of artificial intelligence technology and the support of large-scale data, speech recognition in simple scenarios has achieved satisfactory performance. However, large speech databases exist only for a few of the most widely used languages. Most languages, and many subdomains, lack sufficient data, so low-resource automatic speech recognition remains an open problem. Research on Mongolian speech recognition for children has practical significance: it can meet the needs of children's voice interaction in intelligent education and other areas. This dissertation focuses on Mongolian speech recognition for children, and its specific content is as follows.

A children's Mongolian corpus is established. First, children's Mongolian speech is collected from the Internet and processed. After automatic speech recognition, the resulting text is proofread to obtain the initial corpus. The corpus is then expanded by adding noise, transforming the speech, and applying speech synthesis.

This dissertation implements a baseline Mongolian speech recognition system for children based on the Time Delay Neural Network (TDNN). The baseline system is built with the TDNN model using Mongolian corpora of adult and children's speech. Experiments compare the modeling performance of the chained TDNN and the traditional TDNN, and the results show that the chained model, which is optimized at the sequence level, outperforms the traditional model, which is optimized at the level of single frames.

This dissertation proposes a Mongolian speech recognition model for children based on transfer learning. Three transfer learning strategies are adopted to build the model: transfer learning from adult Mongolian, cross-language transfer learning, and multi-task learning that mixes different training criteria. The experimental results show that transfer learning effectively improves the performance of the Mongolian speech recognition system for children, and the multi-task mixed model combining different training criteria achieves the lowest word error rate, 12.53%.

On the basis of the transferred model, speaker normalization and speaker adaptation methods are combined through Vocal Tract Length Normalisation (VTLN), and different features are used in the experiments to improve the performance of Mongolian speech recognition for children. Experimental results show that VTLN effectively improves the performance of the Mongolian speech recognition system for children, reducing the word error rate by 16.4% relative to the control experiment without feature transformation.
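The results above are stated as a word error rate and as a relative word error rate reduction. The sketch below shows how these two quantities are conventionally computed, assuming the standard definition of word error rate as word-level edit distance divided by the number of reference words. The function names and the sample strings are hypothetical illustrations and are not taken from the dissertation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Relative reduction of the improved WER with respect to the baseline."""
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical transcripts: one substitution and one deletion
# against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
```

A relative reduction is measured against the baseline error rate rather than in absolute percentage points, so a fall from a hypothetical 20% to 15% word error rate would be a 25% relative reduction.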
Keywords/Search Tags: Mongolian, transfer learning, low resource, speech recognition for children, cross-language training