Linear transforms in automatic speech recognition: Estimation procedures and integration of diverse acoustic data
Posted on: 2007-06-12 | Degree: Ph.D. | Type: Dissertation | University: The Johns Hopkins University | Candidate: Tsakalidis, Stavros | GTID: 1458390005981353 | Subject: Engineering
Abstract/Summary:
Linear transforms have been used extensively for both training and adaptation of Hidden Markov Model (HMM) based automatic speech recognition (ASR) systems. Two important applications of linear transforms in acoustic modeling are the decorrelation of the feature vector and the constrained adaptation of the acoustic models to the speaker, the channel, and the task.

Our focus in the first part of this talk is the development of training methods, based on the Maximum Mutual Information (MMI) and Maximum A Posteriori (MAP) criteria, that estimate the parameters of the linear transforms. We integrate the discriminative linear transforms into the MMI estimation of the HMM parameters in an attempt to capture the correlation between the feature vector components. The transforms obtained under the MMI criterion are termed Discriminative Likelihood Linear Transforms (DLLT). Experimental results on large vocabulary continuous speech recognition tasks show that DLLT provides a discriminative estimation framework for feature normalization in HMM training that outperforms its Maximum Likelihood counterpart. Then, we propose a structural MAP estimation framework for feature-space transforms. Specifically, we formulate, based on MAP estimation, a Bayesian counterpart of the Maximum Likelihood Linear Transforms (MLLT). Prior density estimation issues are addressed by the use of a hierarchical tree structure in the transform parameter space.

In the second part, we investigate the use of heterogeneous data sources for acoustic training.
We propose an acoustic normalization procedure for enlarging an ASR acoustic training set with out-of-domain acoustic data. The approach applies model-based acoustic normalization techniques to map the out-of-domain feature space onto the in-domain data. A larger in-domain training set is created by transforming the out-of-domain data before incorporating it into training. We put the cross-corpus normalization procedure into practice by investigating the use of diverse Mandarin speech corpora for building a Mandarin Conversational Telephone Speech ASR system. Performance is measured by improvements on the in-domain test set.
Keywords/Search Tags: Linear transforms, Speech, Acoustic, Estimation, Training, HMM, Data, ASR
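The idea of mapping out-of-domain features onto the in-domain feature space can be illustrated with a deliberately simplified sketch. The code below is not the model-based procedure of the dissertation: it assumes a diagonal affine transform (one scale and one bias per feature dimension) fitted by matching per-dimension means and standard deviations, and the function names and toy data are hypothetical.

```python
from statistics import fmean, pstdev


def fit_diagonal_affine(source, target):
    """Fit a per-dimension scale and bias that match source statistics to target.

    Simplifying assumption: the transform is diagonal, and it is fitted by
    moment matching rather than by a likelihood-based (model-based) criterion.
    Each dimension of `source` must have non-zero variance.
    """
    dims = len(source[0])
    scales, biases = [], []
    for d in range(dims):
        src = [frame[d] for frame in source]
        tgt = [frame[d] for frame in target]
        scale = pstdev(tgt) / pstdev(src)
        bias = fmean(tgt) - scale * fmean(src)
        scales.append(scale)
        biases.append(bias)
    return scales, biases


def apply_diagonal_affine(frames, scales, biases):
    """Apply y = a * x + b independently to every feature dimension."""
    return [[a * x + b for x, a, b in zip(frame, scales, biases)]
            for frame in frames]


# Toy two-dimensional "feature vectors" (hypothetical values).
out_of_domain = [[0.0, 10.0], [2.0, 14.0], [4.0, 18.0]]
in_domain = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]

scales, biases = fit_diagonal_affine(out_of_domain, in_domain)
normalized = apply_diagonal_affine(out_of_domain, scales, biases)
```

After the transform, the out-of-domain frames share the in-domain per-dimension mean and standard deviation, so they could be pooled with the in-domain frames for training. A full transform matrix estimated under an HMM likelihood criterion, as the abstract describes, would also capture correlations between dimensions, which this sketch ignores.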