Statistical semantics of phrases in hierarchical contexts

Posted on: 1995-12-03
Degree: Ph.D.
Type: Dissertation
University: University of California, San Diego
Candidate: Steier, Amy Marie
Full Text: PDF
GTID: 1475390014490006
Subject: Computer Science
Abstract/Summary:
This dissertation studies phrasal semantics in varying corpus contexts. We take a statistical approach to semantics in which corpus statistics are used to characterize meaning. The linguistic issues we concentrate on are semantic compositionality and collocational constraints. The semantic compositionality of a phrase is the extent to which the meaning of the phrase can be inferred from the typical meanings of its constituents. For example, the meaning of the phrase DATABASE SOFTWARE is highly compositional, whereas the meanings of the phrases BULL'S EYE and POLITICALLY CORRECT are highly noncompositional. Collocational constraints are restrictions in language that determine which words can co-occur with which other words. Not all collocational constraints hold between words that syntactically modify each other.

We study the interaction between collocational constraints and phrasal indexing through an in-depth comparison of syntactic and statistical approaches to phrasal indexing in Information Retrieval. We study compositionality in statistical semantics by first determining how effectively various statistical measures capture the semantic compositionality of phrases. We then use the same measures to study whether the semantic compositionality of a phrase can be used to predict the effectiveness of the phrase and its constituents as index terms.

Another theme of our research is the effect of varying corpus contexts on statistical approaches to semantics. Text collections are often heterogeneous; the larger a collection is, the more likely it is to have been generated by many different authors using very different vocabularies and covering many different topics. Global statistics can therefore become problematic if local variations in word usage patterns are lost. One research issue we focus on is how the semantic compositionality of a phrase can vary across the subcontexts of a larger, more heterogeneous collection. We also study the effect of global versus local statistics on performance in Information Retrieval. By global, we mean statistics computed using the entire collection; by local, we mean statistics computed using a smaller, more topically cohesive subset of the collection. Altogether, this research furthers our understanding of the statistical semantics of phrases in hierarchical contexts.
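
The distinction between global and local statistics can be illustrated with a minimal sketch. The abstract does not name the statistical measures the dissertation evaluates, so pointwise mutual information (PMI) over adjacent word pairs is used here purely as an assumed stand-in for a phrase-level association statistic. The function name `pmi` and the toy documents are hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def pmi(docs, w1, w2):
    """Pointwise mutual information of the adjacent pair (w1, w2).

    `docs` is a list of tokenized documents (lists of strings).
    PMI is an illustrative choice only; the dissertation's own
    measures are not specified in the abstract.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs within a document
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    if n_bi == 0 or bigrams[(w1, w2)] == 0:
        return float("-inf")  # the pair never occurs in this context
    p_xy = bigrams[(w1, w2)] / n_bi
    p_x = unigrams[w1] / n_uni
    p_y = unigrams[w2] / n_uni
    return math.log2(p_xy / (p_x * p_y))


# Toy, made-up subcollections standing in for topically cohesive contexts.
computing = [
    ["database", "software", "stores", "records"],
    ["software", "database", "software", "vendor"],
]
sports = [
    ["the", "archer", "hit", "the", "bull's", "eye"],
    ["the", "bull's", "eye", "is", "small"],
]

global_collection = computing + sports  # "global": the entire collection
local_collection = computing            # "local": one cohesive subset

global_score = pmi(global_collection, "database", "software")
local_score = pmi(local_collection, "database", "software")
print(f"global PMI: {global_score:.3f}")
print(f"local PMI:  {local_score:.3f}")
```

In this toy data the same phrase receives a different score depending on whether the counts come from the whole collection or from the topical subset, which is the kind of variation across subcontexts that the abstract describes. The numbers themselves carry no empirical weight.
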
Keywords/Search Tags:Semantics, Statistical, Phrase, Contexts, Collocational constraints, Collection