Research On Application Of Data Augmentation Based On Different Speech Habits In Speech Recognition In Telephone Scene

Posted on: 2022-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: L F Li
GTID: 2518306551954139
Subject: Master of Engineering
Abstract/Summary:
In recent years, the rapid development of deep learning has greatly improved the accuracy of speech recognition, which has been deployed in many industries, and more and more speech technology products have entered people's lives. Intelligent customer service robots are gradually taking over customer service positions, helping enterprises reduce labor costs and improve work efficiency. Such a robot integrates multiple intelligent interaction technologies, including speech recognition, natural language processing and text-to-speech, so that it can accurately understand users' intentions or questions and give satisfactory answers. In the telephone scenarios where intelligent customer service robots are commonly used, users' pronunciation habits differ considerably, so a general-purpose speech recognition system cannot achieve good recognition accuracy. Because an automatic speech recognition system is data-driven, its performance is strongly influenced by the scale and domain coverage of the training data. Scarce training data and large differences in pronunciation habits seriously degrade recognition accuracy. One way to address these problems is data augmentation. This thesis studies how data augmentation methods based on different speech habits affect the recognition accuracy of speech recognition models in the mobile phone channel and the telephone channel. The main work and contributions are as follows.

Firstly, the training process of a traditional speech recognition system is introduced in detail, including feature extraction, the acoustic model, the language model and the evaluation criteria, and a baseline speech recognition model is built with the Kaldi speech recognition toolkit. The thesis then introduces the speed perturbation data augmentation method commonly used in speech recognition model training and carries out experiments. The results show that speed perturbation improves the recognition performance of the model in both the mobile phone channel and the telephone channel.

Secondly, according to how speed perturbation is implemented, the method is divided into two data augmentation methods, tempo perturbation and pitch perturbation, and the implementation of each is described in detail. The effects of the two methods on the model are compared through experiments, and a data augmentation method that mixes tempo perturbation and pitch perturbation is proposed. The results show that the mixed tempo and pitch perturbation method outperforms speed perturbation in both the mobile phone channel and the telephone channel.

Thirdly, to address problems that may arise when training with augmented data, a training method that combines model parameter pre-training with data augmentation is proposed. Several pre-training and fine-tuning schemes are evaluated. The experimental results show that combining model parameter pre-training with data augmentation outperforms data augmentation alone.
Keywords/Search Tags:data augmentation, pre-training, speech recognition, deep neural network