
A CCG-Based Method for Training a Semantic Role Labeler in the Absence of Explicit Syntactic Training Data

Posted on: 2012-07-23
Degree: Ph.D.
Type: Dissertation
University: The Ohio State University
Candidate: Boxwell, Stephen A.
Full Text: PDF
GTID: 1465390011467410
Subject: Language
Abstract/Summary:
Treebanks are a necessary prerequisite for many NLP tasks, including, but not limited to, semantic role labeling. For many languages, however, treebanks are either nonexistent or too small to be useful. Time-critical applications may require rapid deployment of natural language software for a new critical language---much faster than the development time of a traditional treebank. This dissertation describes a method for generating a treebank and training syntactic and semantic models using only semantic training information---that is, no human-annotated syntactic training data whatsoever. This greatly increases the speed of development of natural language tools for new critical languages in exchange for a modest drop in overall accuracy. Using Combinatory Categorial Grammar (CCG) in concert with Propbank semantic role annotations, we accurately predict lexical categories with a partially hidden Markov model. By training the Berkeley parser on our generated syntactic data, we achieve SRL performance of 65.5% without using a treebank, as opposed to 74% using the same feature set with gold-standard data.
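The core idea of a "partially hidden" Markov model can be illustrated with a toy sketch: at positions where Propbank annotation constrains the lexical category (e.g. a marked predicate), decoding is restricted to a subset of CCG categories; everywhere else the category is fully hidden. The category inventory, sentence, and uniform scores below are illustrative assumptions, not the dissertation's trained model.

```python
import math

# Toy CCG category inventory (illustrative only).
CATS = ["NP", "S\\NP", "(S\\NP)/NP"]

# Uniform toy transition/emission scores; a real model would be
# estimated from the Propbank-derived data.
def trans(prev, cur):
    return 1.0 / len(CATS)

def emit(cat, word):
    return 1.0 / len(CATS)

def best_sequence(words, constraints):
    """Viterbi decoding over category sequences.

    constraints[i] is the list of categories allowed at position i;
    unconstrained positions may take any category (fully hidden).
    """
    allowed = [constraints.get(i, CATS) for i in range(len(words))]
    # best maps each category at the current position to
    # (log score, best path ending in that category).
    best = {c: (math.log(emit(c, words[0])), [c]) for c in allowed[0]}
    for i in range(1, len(words)):
        new = {}
        for c in allowed[i]:
            score, path = max(
                (s + math.log(trans(p, c)) + math.log(emit(c, words[i])), path)
                for p, (s, path) in best.items()
            )
            new[c] = (score, path + [c])
        best = new
    return max(best.values())[1]

words = ["John", "loves", "Mary"]
# Suppose a Propbank predicate annotation pins the verb's category:
constraints = {1: ["(S\\NP)/NP"]}
tags = best_sequence(words, constraints)
print(tags)
```

The constrained position is guaranteed to receive the annotation-licensed category, while the hidden positions are filled in by the model's scores, mirroring how semantic annotations can substitute for explicit syntactic supervision.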
Keywords/Search Tags: Semantic role, Training, Data, Syntactic, Treebank, Using