With the rapid development of the internet industry, speech recognition technology is used ever more widely in daily life. Thanks to its simplicity and effectiveness, the attention-based encoder-decoder end-to-end ASR model (AED model) has been widely adopted and has become a focus of research. However, AED models are inherently limited in how fully they can exploit an external language model (LM) trained on much larger text-only corpora. When fused with an external LM, the relative accuracy improvement of an AED model is usually as low as 10%, whereas a traditional ASR model usually gains 30%-40%. This is not only because the AED model is already much more accurate than the traditional ASR model, but also because the AED model inevitably learns a biased internal language model (ILM) during training. In theory, better recognition accuracy can be achieved by removing the influence of the ILM when fusing with an external LM. The difficulty is that the ILM cannot be estimated easily, because it is hidden inside the AED model.

Many methods have been proposed to estimate this implicit ILM. One of the most effective, called Zero-out, was proposed by Microsoft. However, it can only estimate the ILM of AED models with a BLSTM encoder and reduce the WER of the fused model; it cannot be applied to all types of AED models because of a mismatch problem. In addition, the estimation and subtraction operations require far more computational resources during inference than traditional shallow fusion. Thus, despite the significant accuracy improvement, such methods are not widely used in industry. To address these issues, the main research contents and contributions of this thesis are as follows:

1. We find that the Zero-out method is not suitable for certain types of AED models due to the mismatch problem, and we propose two training-based methods to solve it, so that the proposed methods can be applied to any type of AED model (the general fusion formulation is sketched below). Experimental results demonstrate that the proposed methods consistently outperform the Zero-out method, reducing WER by up to 33% relative on the LibriSpeech test set.

2. To reduce the computational cost of inference, we propose a training method that eliminates the linguistic bias directly during training, instead of subtracting it during inference. This is achieved through adversarial learning (one possible realization is sketched below). A model trained with the proposed method carries less linguistic bias and achieves comparable accuracy with far fewer computational resources.
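For context on contribution 1, the ILM-estimation literature commonly scores each hypothesis during fused decoding as a log-linear combination in which the estimated ILM is subtracted; the notation below is illustrative and not taken verbatim from this thesis:

\[
\hat{y} = \arg\max_{y} \; \log P_{\mathrm{AED}}(y \mid x) \;+\; \lambda_{\mathrm{ext}} \log P_{\mathrm{ELM}}(y) \;-\; \lambda_{\mathrm{ilm}} \log P_{\mathrm{ILM}}(y)
\]

Here $x$ is the acoustic input, $P_{\mathrm{ELM}}$ is the external LM, and $P_{\mathrm{ILM}}$ is the estimated internal LM; the weights $\lambda_{\mathrm{ext}}$ and $\lambda_{\mathrm{ilm}}$ are tuned on a development set. The extra ILM forward pass and subtraction are the inference-time overhead the abstract refers to.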
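The abstract does not specify how the adversarial learning in contribution 2 is implemented. The sketch below shows one standard way such an objective can be realized in PyTorch: a gradient reversal layer feeding an auxiliary LM head, so that the decoder is penalized for carrying purely linguistic information. All names here (GradReverse, adversarial_lm_loss, lm_head) are hypothetical illustrations, not the thesis's actual method.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; gradient scaled by -lam in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient before it reaches the decoder.
        return -ctx.lam * grad_output, None


def adversarial_lm_loss(decoder_states, targets, lm_head, lam=0.5):
    """Auxiliary next-token prediction loss with a reversed gradient.

    The auxiliary head tries to predict the next token from decoder states
    alone (no acoustic evidence). Because its gradient is reversed, the
    decoder is trained to encode *less* purely linguistic information,
    which is one way to suppress the internal LM bias during training.
    """
    reversed_states = GradReverse.apply(decoder_states, lam)
    logits = lm_head(reversed_states)   # (batch, time, vocab)
    return nn.functional.cross_entropy(
        logits.transpose(1, 2),         # (batch, vocab, time)
        targets,                        # (batch, time)
        ignore_index=-100,
    )
```

In practice such a loss would be added with a small weight to the usual cross-entropy ASR objective, so decoding itself is unchanged and no ILM estimation or subtraction is needed at inference time.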