A Study Of Mongolian Automatic Speech Recognition For Children

Posted on:2022-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Liang

GTID:2518306509460034

Subject:Computer Science and Technology

Abstract/Summary:

With advances in artificial intelligence and the support of large-scale data, speech recognition in simple scenarios has reached satisfactory performance. However, large speech databases exist only for a few of the most widely used languages. Most languages, and many specialized domains, lack sufficient data, so low-resource automatic speech recognition remains an open research problem. Mongolian speech recognition for children has practical significance: it can meet the needs of children’s voice interaction in intelligent education and other applications. This dissertation focuses on Mongolian speech recognition for children, and its specific content is as follows.

First, a children’s Mongolian corpus is established. Children’s Mongolian speech is collected from the Internet and processed. After automatic speech recognition, the resulting text is proofread to obtain the initial corpus. The corpus is then expanded by adding noise, transforming the speech, and speech synthesis.

Second, this dissertation implements a baseline Mongolian speech recognition system for children based on the Time Delay Neural Network (TDNN). The baseline system is built using Mongolian corpora of adult and child speech. Experiments compare the modeling performance of the chained TDNN and the traditional TDNN, and the results show that the chained model, which is optimized at the sequence level, outperforms the traditional model, which is optimized at the level of single frames.

Third, this dissertation proposes a Mongolian speech recognition model for children based on transfer learning. Three transfer learning strategies are adopted to build the model: transfer learning from adult Mongolian, cross-language transfer learning, and multi-task learning that mixes different training criteria. The experimental results show that the models based on transfer learning effectively improve the performance of the Mongolian speech recognition system for children, and that the multi-task mixed model combining different training criteria achieves the lowest word error rate, 12.53%.

Finally, on the basis of the transferred model, the speaker normalization method and the speaker adaptation method are combined by applying Vocal Tract Length Normalisation (VTLN), and different features are used in the experiments to improve the performance of Mongolian speech recognition for children. Experimental results show that VTLN effectively improves the performance of the Mongolian speech recognition system for children: the word error rate is reduced by 16.4% in relative terms compared with the control experiment without feature transformation.
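The abstract reports results as a word error rate and as a relative word error rate reduction. As a minimal sketch of how these two metrics are conventionally computed (not the thesis's own evaluation code), the Python below uses a word-level edit distance. The function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the inputs in the usage note are made-up illustrations, not data from the thesis.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the WER drops relative to the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

For example, with the illustrative reference `"a b c d".split()` and hypothesis `"a x c".split()`, one substitution and one deletion over four reference words give a word error rate of 0.5. A system that lowers an illustrative baseline from 20.0 to 15.0 has a relative reduction of 0.25, which is the sense in which a relative figure such as the reported 16.4% is normally read.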

Keywords/Search Tags:

Mongolian, transfer learning, low resource, speech recognition for children, cross-language training

Related items

1	Research On Cross-language Tibetan Lhasa Speech Recognition Based On Transfer Learnin
2	Cross-language End-to-end Speech Recognition Research For Endangered Language
3	Research On Transfer Learning For Khalkha Mongolian Speech Recognition Acoustic Model
4	Research And Implementation Of Mongolian-Chinese Mixed Language Speech Recognition System Based On Deep Learning
5	A Study On Low-resource Multilingual Speech Recognition Based On Transfer Learning
6	Mongolian Language Oriented Research On Acoustic Modeling For Speech Recognition
7	The Research And Implementation Of A Speech Separation Method Based On Domain Adversarial Training
8	Research On Mongolian Speech Recognition Model Training Based On Semi-Supervised Learning
9	Research On Unsupervised Named Entity Recognition Based On Cross Language Pretrained Model
10	Research On Low-Resource Speech Recognition Based On The Transfer Learning And Fusion Of Language Models