Hybrid methods for acquisition of lexical information: The case for verbs

Posted on: 2010-08-01
Degree: Ph.D.
Type: Thesis
University: The Ohio State University
Candidate: Li, Jianguo
GTID: 2445390002471836
Subject: Language
Abstract/Summary:
Improved automatic text understanding requires detailed linguistic information about the words that make up a text. Particularly crucial is knowledge about predicates, typically verbs, which express both the event being described and how its participants relate to that event. Although the field of natural language processing (NLP) has yet to reach a clear consensus on guidelines for building a verb lexicon suitable for NLP applications, class-based verb lexicons (e.g., Levin's verb classification) that state syntactic and semantic information explicitly have proved beneficial to a wide range of NLP tasks, both by mitigating the pervasive problem of data sparsity and by increasing coverage. Such broad-coverage dictionaries and ontologies are difficult and costly to create and maintain by hand, so it is desirable to learn them from distributional information, such as that obtainable from unlabeled or sparsely labeled text corpora. To this end, this thesis primarily addresses the following three questions:
First, deriving Levin-style verb classifications from text corpora helps avoid the expensive hand-coding of such information, but appropriate features must be identified and shown to be effective. One of our primary goals is to assess which linguistic conditions are crucial for the lexical classification of verbs. In particular, we experiment with different ways of combining syntactic and lexical information to improve verb classification.
Second, Levin's verb classification provides a systematic account of verb polysemy. We propose a class-based method for disambiguating Levin verbs using only untagged data. The working hypothesis is that verbs in the same Levin class tend to share both their subcategorization patterns and their neighboring words. In practice, information about the unambiguous verbs in a particular Levin class is used to disambiguate the ambiguous verbs in the same class.
Last, because automatically created verb classifications are likely to deviate from manually crafted ones, it is important to determine whether they can still benefit the wider NLP community. We propose to integrate verb class information, learned automatically from text corpora, into a particular parsing task: prepositional phrase (PP) attachment disambiguation.
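The class-based disambiguation idea from the second question (unambiguous members of a Levin class supply evidence for labeling ambiguous members) can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the thesis's implementation: each verb occurrence is assumed to be a list of feature strings (a hypothetical subcategorization-frame label such as `frame:NP_V_PP` plus neighboring words), the toy lexicon and class labels are made up, and scoring by cosine similarity against pooled class profiles is only one plausible choice.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(count * b[feat] for feat, count in a.items())  # missing keys count as 0
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_class_profiles(verb_classes, contexts):
    """Pool the context features of *unambiguous* verbs into one profile per class.

    verb_classes: verb -> set of class labels (a hypothetical lexicon)
    contexts: verb -> list of occurrences, each a list of feature strings
    """
    profiles = {}
    for verb, classes in verb_classes.items():
        if len(classes) != 1:
            continue  # ambiguous verbs contribute nothing to the profiles
        (label,) = classes
        profile = profiles.setdefault(label, Counter())
        for occurrence in contexts.get(verb, []):
            profile.update(occurrence)  # count each feature string
    return profiles


def disambiguate(verb, occurrence, verb_classes, profiles):
    """Assign one occurrence of an ambiguous verb to its best-matching class."""
    features = Counter(occurrence)
    # Only the classes the lexicon lists for this verb are candidates;
    # sorting makes tie-breaking deterministic.
    candidates = sorted(verb_classes[verb])
    return max(candidates, key=lambda label: cosine(features, profiles.get(label, Counter())))


# Hypothetical toy lexicon and contexts; labels and features are illustrative only.
verb_classes = {
    "jog": {"run_verbs"},
    "sprint": {"run_verbs"},
    "meander": {"meander_verbs"},
    "straggle": {"meander_verbs"},
    "run": {"run_verbs", "meander_verbs"},  # the ambiguous verb
}
contexts = {
    "jog": [["frame:NP_V", "word:athlete", "word:park"],
            ["frame:NP_V_PP", "word:runner", "word:track"]],
    "sprint": [["frame:NP_V", "word:athlete", "word:finish"]],
    "meander": [["frame:NP_V_PP", "word:road", "word:river"],
                ["frame:NP_V_PP", "word:path", "word:valley"]],
    "straggle": [["frame:NP_V_PP", "word:road", "word:village"]],
}

profiles = build_class_profiles(verb_classes, contexts)
print(disambiguate("run", ["frame:NP_V_PP", "word:road", "word:river"], verb_classes, profiles))  # → meander_verbs
print(disambiguate("run", ["frame:NP_V", "word:athlete"], verb_classes, profiles))  # → run_verbs
```

Because only unambiguous class members feed the profiles, the sketch needs no sense-tagged data, which mirrors the "untagged data only" setting. A realistic system would still need smoothing and some fallback for classes with few or no unambiguous members.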