Concept classification with application to speech to speech translation

Posted on:2012-08-24

Degree:Ph.D

Type:Dissertation

University:University of Southern California

Candidate:Ettelaie, Emil

Full Text:PDF

GTID:1465390011958500

Subject:Engineering

Abstract/Summary:

The central goal in interactive speech-to-speech translation applications is to enable the accurate exchange of the semantic content (or "concept") of speech between interlocutors, rather than to produce a word-by-word literal translation of the source utterance. Whereas conventional Statistical Machine Translation (SMT) methods are mainly developed and optimized for translating text, speech understanding through concept classification offers an alternative translation path for speech-to-speech systems that suits this purpose. A correct concept classification promises well-formed target-language speech output, although the limited number of concept classes means that the classifier cannot accurately cover the entire dialog domain.

Here, the task of spoken utterance classification is formulated as a maximum a posteriori (MAP) estimation problem. The formulation of the understanding model and the data collection methods are presented. To cope with the inherent sparsity of the training data, a background model is introduced and its effects are investigated. To further mitigate sparsity, a new method of lexical enhancement using an SMT system is introduced.

To improve the overall accuracy of the classification task, a method for incorporating contextual information is also presented. Specifically, for a two-way speech translation system, a classification scheme is derived that uses information from both sides of the conversation through a dialog model. Empirical results show that the proposed dialog model yields a modest improvement in classification accuracy and a significant improvement in the accuracy of the rejection task.

The main bottleneck in achieving acceptable performance with concept classifiers is the tedious task of annotating large amounts of training data. Any attempt to assist in, or fully automate, data annotation must involve clustering sentences according to the meaning they convey.
This requires a distance measure that compares sentences at the concept level. Here, a new method of sentence comparison is introduced that is motivated by the translation point of view: the imperfect translations produced by a phrase-based SMT system are used to compare the concepts of the source sentences. Distances among utterances are measured with two alternative types of metrics. The first metric aims to capture local dependencies among words based on Markov chain modeling of the text (language models). The second metric is computed from word associations over a wider context; these associations are learned through topic modeling of the translation lists of the data utterances. Two clustering methods are adapted to support the concept-based distance. Experimental evaluations show the effectiveness of the proposed methods.

The effectiveness of concept classifiers depends on the size of the domain they cover. An obstacle to expanding the classifier domain, however, is the degradation in accuracy as the number of classes increases. A hierarchical classification process is introduced here that aims to scale up the domain without compromising accuracy. This method exploits the categorical associations that naturally appear in the training data to split the domain into sub-domains with fewer classes. In a two-layered structure, the best category for the discourse is first detected, and then a sub-domain classifier limited to that category is deployed. Category detection uses the discourse information as input; two alternative methods, based on language models and on topic modeling, are introduced for this purpose. Experimental results show higher accuracy for the proposed method than for a single-layered classifier.
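The abstract frames utterance classification as MAP estimation, c* = argmax_c P(u | c) P(c), smoothed with a background model, and describes a two-layer scheme that first detects a category from the discourse and then runs a classifier restricted to that category. The Python sketch below illustrates both ideas under simplifying assumptions that are not taken from the dissertation: unigram class models, linear interpolation with an add-one-smoothed background unigram model (the weight `lam` is hypothetical), and category detection from the concatenated dialog history. The names `MapConceptClassifier` and `hierarchical_classify` are illustrative only.

```python
import math
from collections import Counter


class MapConceptClassifier:
    """Hypothetical MAP concept classifier (illustrative, not the dissertation's model).

    Each concept class c has a unigram model P(w | c) that is linearly
    interpolated with a background unigram model pooled over all classes,
    so an unseen word lowers a class score instead of zeroing it out.
    """

    def __init__(self, training, lam=0.7):
        # training: dict mapping class label -> list of tokenized utterances.
        # lam: hypothetical interpolation weight, must satisfy 0 <= lam < 1.
        if not 0.0 <= lam < 1.0:
            raise ValueError("lam must be in [0, 1)")
        self.lam = lam
        self.counts = {c: Counter(w for u in utts for w in u) for c, utts in training.items()}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        n_utts = sum(len(utts) for utts in training.values())
        # Class prior P(c) estimated from relative utterance frequency.
        self.log_prior = {c: math.log(len(utts) / n_utts) for c, utts in training.items()}
        # Background model: word counts pooled over every class.
        self.bg = Counter()
        for cnt in self.counts.values():
            self.bg.update(cnt)
        self.bg_total = sum(self.bg.values())
        self.bg_vocab = len(self.bg) + 1  # one extra slot for unseen words

    def _p_background(self, w):
        # Add-one smoothed background unigram probability (always > 0).
        return (self.bg[w] + 1) / (self.bg_total + self.bg_vocab)

    def _p_word(self, w, c):
        # Interpolated P(w | c) = lam * P_class(w) + (1 - lam) * P_background(w).
        p_class = self.counts[c][w] / self.totals[c] if self.totals[c] else 0.0
        return self.lam * p_class + (1.0 - self.lam) * self._p_background(w)

    def log_posterior_scores(self, utterance):
        # log P(c) + sum_w log P(w | c), i.e. log P(c | u) up to a shared constant.
        return {
            c: self.log_prior[c] + sum(math.log(self._p_word(w, c)) for w in utterance)
            for c in self.counts
        }

    def classify(self, utterance):
        # MAP decision: the class with the highest (unnormalized) log posterior.
        scores = self.log_posterior_scores(utterance)
        return max(scores, key=scores.get)


def hierarchical_classify(utterance, history, category_clf, subdomain_clfs):
    """Two-layer sketch: detect a category from the discourse, then classify
    the utterance with the sub-domain classifier for that category only."""
    # Assumption: the discourse is the concatenated dialog history plus the
    # current utterance, scored with the same MAP unigram classifier.
    discourse = [w for turn in history for w in turn] + list(utterance)
    category = category_clf.classify(discourse)
    return category, subdomain_clfs[category].classify(utterance)


if __name__ == "__main__":
    categories = MapConceptClassifier({
        "personal": [["what", "is", "your", "name"], ["how", "old", "are", "you"]],
        "medical": [["where", "does", "it", "hurt"], ["do", "you", "have", "a", "fever"]],
    })
    subdomains = {
        "personal": MapConceptClassifier({
            "ask_name": [["what", "is", "your", "name"], ["your", "name", "please"]],
            "ask_age": [["how", "old", "are", "you"], ["what", "is", "your", "age"]],
        }),
        "medical": MapConceptClassifier({
            "ask_pain": [["where", "does", "it", "hurt"]],
            "ask_fever": [["do", "you", "have", "a", "fever"]],
        }),
    }
    print(hierarchical_classify(["does", "it", "hurt"], [["hello"]], categories, subdomains))
    # prints ('medical', 'ask_pain')
```

Interpolating with a pooled background model is one common way to keep a sparse class model from assigning zero probability to unseen words; the second layer only has to discriminate among the few classes of one category, which is the intuition behind the hierarchical split described above.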

Keywords/Search Tags:

Concept, Translation, Speech, Classification, Method, Accuracy, Introduced, Model

Related items

1	Methods Of Achieving The Accuracy Of Term Translation
2	An Experimental Study On Accuracy Of Machine Speech Translation
3	The Interval Estimation Method Of Attribute Classification Consistency In Cognitive Diagnosis And Application In Model Misspecification
4	A Practice Report On The Translation Of Oil And Gas Production Handbook (Excerpt)
5	Research On Chinese Music Emotion Classification Based On Lyrics And Comments
6	A Study Of The Impact Of Automatic Speech Recognition On The Accuracy Of Number Interpretation In E-C Simultaneous Interpreting
7	Research On The Identification Of Sui Language
8	Research And Implementation Of Tibetan Text Classification Based On Ada Boost Model
9	Application Of Free Translation Method And Related Techniques In Translation Practice: Reflections On The Translation Of Changing The Channel
10	An Inquiry Into The Impact Of Speech Rates On The Performance Of Simultaneous Interpreters