
CORELEX: Systematic polysemy and underspecification

Posted on: 1999-06-05
Degree: Ph.D.
Type: Thesis
University: Brandeis University
Candidate: Buitelaar, Peter Paul
GTID: 2465390014973172
Subject: Language
Abstract/Summary:
This thesis is concerned with a unified approach to the systematic polysemy and underspecification of nouns. Systematic polysemy (senses that are systematically related and therefore predictable over classes of lexical items) is fundamentally different from homonymy (senses that are unrelated, non-systematic and therefore not predictable). At the same time, studies in discourse analysis show that lexical items are often left underspecified for a number of related senses. Clearly, there is a correspondence between these phenomena, and its investigation is the topic of this thesis.

Acknowledging the systematic nature of polysemy and its relation to underspecified representations allows one to structure ontologies for lexical semantic processing more efficiently and to generate more appropriate interpretations in context. Achieving this requires a thorough analysis of systematic polysemy and underspecification on a large and useful scale. The thesis establishes an ontology and semantic database (CORELEX) of 126 semantic types, covering around 40,000 nouns and defining a large number of systematic polysemous classes that are derived by a careful analysis of sense distributions in WordNet. The semantic types are underspecified representations based on Generative Lexicon theory.

The representations are used in underspecified semantic tagging, which addresses two problems in traditional semantic tagging: sense enumeration (the difficulty of deciding on the number of discrete senses), which is due to systematic polysemy; and multiple reference (NPs denoting more than one model-theoretic referent), which is due to underspecification. Moreover, traditional semantic tags based on discrete senses tend to be too fine-grained for practical use. For instance, WordNet has, in principle, around 60,000 different tags (synsets) for nouns alone.
The CORELEX approach, on the other hand, offers a concise set of 126 tags that are inherently coarser-grained because they take systematic polysemy and underspecification into account.

Underspecified semantic tagging is implemented using probabilistic classification, in order to cover unknown nouns (nouns not in CORELEX) and to identify context-specific and new interpretations. The classification algorithm is centered on the computation of a Jaccard (similarity) score that compares lexical items in terms of the attributes they share, where the attributes are linguistic patterns acquired from domain-specific corpora.
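The Jaccard score itself is |A ∩ B| / |A ∪ B| for two attribute sets A and B. The minimal Python sketch below shows how such a score could drive the classification of an unknown noun. The abstract confirms only that a Jaccard score compares lexical items by the attributes they share; everything else here is an assumption made for illustration: the attribute strings, the type labels "food" and "artifact", the toy lexicon, and the rule that an unknown noun receives the type whose known member nouns it matches best on average (ties broken alphabetically). It is not the thesis's actual algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(
    unknown: Set[str],
    lexicon: Dict[str, Tuple[str, Set[str]]],
) -> Tuple[str, float]:
    """Assign `unknown` the type whose member nouns it is, on average, most similar to.

    `lexicon` maps a known noun to (semantic_type, attribute_set).
    Averaging per type is an assumption of this sketch, not the thesis's method.
    """
    scores: Dict[str, List[float]] = defaultdict(list)
    for _noun, (sem_type, attrs) in lexicon.items():
        scores[sem_type].append(jaccard(unknown, attrs))
    averaged = {t: sum(v) / len(v) for t, v in scores.items()}
    # max() returns the first maximal item, so sorting gives an alphabetical tie-break.
    best = max(sorted(averaged), key=lambda t: averaged[t])
    return best, averaged[best]


# Hypothetical toy lexicon: attributes stand in for corpus-derived patterns.
lexicon = {
    "bread": ("food", {"obj_of:eat", "obj_of:bake", "mod:fresh"}),
    "soup": ("food", {"obj_of:eat", "obj_of:cook", "mod:hot"}),
    "novel": ("artifact", {"obj_of:read", "obj_of:write", "mod:long"}),
}
unknown = {"obj_of:eat", "obj_of:bake", "mod:hot"}  # attributes of an unseen noun

print(classify(unknown, lexicon))  # ('food', 0.5)
```

With this toy data the unknown noun shares two of four pooled attributes with both "bread" and "soup" (a score of 0.5 each) and none with "novel", so the sketch returns ("food", 0.5). Averaging over a type's members rather than taking the single nearest noun is only one possible design choice; the sketch makes no attempt to reproduce the probabilistic weighting the thesis describes.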
Keywords/Search Tags: Systematic polysemy, Semantic, Nouns, Lexical