Transformation based learning and data-driven lexical disambiguation: Syntactic and semantic ambiguity resolution

Posted on: 2004-05-21
Degree: Ph.D.
Type: Dissertation
University: The Johns Hopkins University
Candidate: Florian, Radu
Full Text: PDF
GTID: 1465390011973727
Subject: Computer Science
Abstract/Summary:
This dissertation presents both a comprehensive theoretical and empirical study of, and numerous original algorithmic contributions to, the framework of transformation based machine learning. It is focused on the problem domain of natural language ambiguity resolution, and also includes a comprehensive original empirical study of the phenomena of lexical ambiguity resolution, spanning diverse parameter spaces, languages, tasks and learning algorithms.

Resolving the ubiquitous ambiguity found in human language is a central task in the natural language processing field. Whether the ambiguity is syntactic (e.g. a word having different part-of-speech functions) or semantic (e.g. a word having different senses), the disambiguation process is a central first step in most language processing tasks, such as machine translation, information retrieval or question answering.

This dissertation presents a broad survey of statistical machine learning techniques for multilingual lexical syntactic and semantic disambiguation. It presents several original and empirically successful data-driven machine learning algorithms in the transformation-based learning (TBL) framework, including a fast feature-template-based learning framework for TBL (fnTBL), multi-task joint modeling for fnTBL, the incorporation of productive redundancy into fnTBL models, augmenting fnTBL learning via the forward-backward algorithm, and several original approaches to classifier combination and minimally supervised learning. It also includes a theoretical investigation into the foundational model of TBL, decision pylons, their representational power, and their efficient learnability.

The target tasks investigated here include multilingual named entity recognition, part-of-speech tagging, text chunking, word segmentation and word-sense disambiguation, over a diverse set of languages: Basque, Chinese, Czech, Dutch, English, Estonian, Italian, Spanish, and Swedish.
The resulting systems achieved highly competitive performance in international system bake-offs: first place (out of 36 participating systems) in 6 out of 7 tested languages in the SENSEVAL2 word-sense disambiguation evaluation, and second place (out of 12 systems) in the CoNLL'02 shared task on multilingual named entity recognition.
Keywords/Search Tags:Disambiguation, Ambiguity, Lexical, Syntactic, Semantic, Original