Linear transforms in automatic speech recognition: Estimation procedures and integration of diverse acoustic data

Posted on:2007-06-12

Degree:Ph.D

Type:Dissertation

University:The Johns Hopkins University

Candidate:Tsakalidis, Stavros

GTID:1458390005981353

Subject:Engineering

Abstract/Summary:

Linear transforms have been used extensively for both training and adaptation of Hidden Markov Model (HMM) based automatic speech recognition (ASR) systems. Two important applications of linear transforms in acoustic modeling are the decorrelation of the feature vector and the constrained adaptation of the acoustic models to the speaker, the channel, and the task.

Our focus in the first part of this work is the development of training methods, based on the Maximum Mutual Information (MMI) and Maximum A Posteriori (MAP) criteria, that estimate the parameters of the linear transforms. We integrate discriminative linear transforms into the MMI estimation of the HMM parameters in an attempt to capture the correlation between the feature vector components. The transforms obtained under the MMI criterion are termed Discriminative Likelihood Linear Transforms (DLLT). Experimental results show that DLLT provides a discriminative estimation framework for feature normalization in HMM training for large vocabulary continuous speech recognition tasks, and that it outperforms its Maximum Likelihood counterpart. We then propose a structural MAP estimation framework for feature-space transforms. Specifically, we formulate, based on MAP estimation, a Bayesian counterpart of the Maximum Likelihood Linear Transforms (MLLT). Prior density estimation issues are addressed by the use of a hierarchical tree structure in the transform parameter space.

In the second part we investigate the use of heterogeneous data sources for acoustic training. We propose an acoustic normalization procedure for enlarging an ASR acoustic training set with out-of-domain acoustic data. The approach is an application of model-based acoustic normalization techniques to map the out-of-domain feature space onto the in-domain data. A larger in-domain training set is created by effectively transforming the out-of-domain data before incorporation in training. We put the cross-corpus normalization procedure into practice by investigating the use of diverse Mandarin speech corpora for building a Mandarin Conversational Telephone Speech ASR system. Performance is measured by improvements on the in-domain test set.
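
As a rough illustration of what a feature-space linear transform does to an acoustic likelihood, the sketch below scores a feature vector under a diagonal-covariance Gaussian after an affine map y = A x + b. It is not the estimation procedure of the dissertation: the function names, the single-Gaussian model, and the toy values are assumptions made for illustration only. The one general fact it relies on is that transforming the features adds a log |det A| Jacobian term to the log-likelihood.

```python
import numpy as np


def diag_gaussian_logpdf(y, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at y."""
    y = np.asarray(y, dtype=float)
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var))


def transformed_loglik(x, A, b, mean, var):
    """Log-likelihood of x after the affine feature transform y = A x + b.

    Hypothetical helper: the Gaussian (mean, var) stands in for one HMM
    state's output density. log|det A| is the Jacobian of the transform.
    """
    A = np.asarray(A, dtype=float)
    y = A @ np.asarray(x, dtype=float) + np.asarray(b, dtype=float)
    sign, logabsdet = np.linalg.slogdet(A)
    if sign == 0:
        raise ValueError("transform matrix A must be invertible")
    return diag_gaussian_logpdf(y, mean, var) + logabsdet


if __name__ == "__main__":
    # Toy values (assumed): a 2-D feature vector and a mild rotation/scaling.
    x = [1.0, -1.0]
    A = [[1.0, 0.2], [-0.2, 1.0]]
    b = [0.1, 0.0]
    print(transformed_loglik(x, A, b, mean=[0.0, 0.0], var=[1.0, 1.0]))
```

With A set to the identity and b to zero, the function reduces to the untransformed Gaussian log-likelihood, which gives a quick sanity check. Estimating A and b under a maximum likelihood, MMI, or MAP criterion is a separate optimization problem that this sketch does not attempt.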

Keywords/Search Tags:

Linear transforms, Speech, Acoustic, Estimation, Training, HMM, Data, ASR

Related items

1	Research On Discriminative Techniques Of Feature Extraction And Acoustic Model Training In Continuous Speech Recognition
2	Research On Acoustic Modeling For Spontaneous Spoken Speech Recognition
3	The Design And Software Implementation Of Acoustic Model Training System Platform
4	Discriminative Training For Continuous Speech Recognition
5	Discriminative Training Of Acoustic Models For Automatic Speech Recognition
6	Research On Acoustic Model Of Speech Recognition In Educational Scene Based On Deep Learning
7	Mongolian Language Oriented Research On Acoustic Modeling For Speech Recognition
8	Acoustic Modeling For Continuous Speech Recognition
9	Research On I-vector Based Speaker Normalization For Speech Recognition
10	Large margin training of acoustic models for speech recognition