Syntactic form and discourse function in natural language generation

Posted on:2004-09-23

Degree:Ph.D.

Type:Dissertation

University:University of Pennsylvania

Candidate:Creswell, Cassandre Yvonne

GTID:1465390011972095

Subject:Language

Abstract/Summary:

Previous research has shown that certain discourse conditions are necessary for the felicitous use of four non-canonical syntactic constructions in English: topicalizations, left-dislocations, wh-clefts, and it-clefts. However, the distribution of these forms does not correlate one-to-one with the presence of these necessary conditions. Speakers must choose to use these constructions for other reasons. Additionally, a natural language generation (NLG) algorithm that selects these statistically rare forms based only on these conditions will overgenerate. If it selects clausal word order based only on frequency, however, these forms will never be selected or will be used in meaningless ways. The purpose of this dissertation is to devise a more complete model of when human speakers generate these constructions, in order to further the understanding of syntactic form selection and to better characterize these forms' conditions of use for purposes of NLG. The model of syntactic choice presented explicitly ties the goals of the communicative agent to the linguistic forms selected to achieve those goals. Three types of communicative goals that speakers achieve through the use of non-canonical syntax are argued for: (1) attention marking, (2) discourse relation, and (3) information-structure focus disambiguation. The evidence supporting the model is based on naturally occurring tokens from a corpus of spontaneous oral discourse. This same corpus, annotated with low-level properties of the discourse context surrounding utterances with non-canonical word order, is then used to train a statistical model that can approximate some aspects of the theoretical model. The statistical model supports the claim that communicative goals of signaling discourse relations do correlate significantly with the use of particular non-canonical forms.
The statistical model is also used as a probabilistic classifier, which could be utilized as a stochastic method for selecting syntactic form based on discourse context as part of a natural language generation system. The probabilistic classifier shows improvement over a naive classifier when applied to training data. The probabilistic classifier is a first attempt to utilize more than just frequency counts as a basis for syntactic form selection and instead incorporate aspects of the semantic content of surrounding discourse context as a basis for using a particular form.
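
The comparison described above can be illustrated with a minimal sketch. It is not the dissertation's model: the feature names (`relation`, `referent`), their values, and the toy data are all invented for illustration, and "naive classifier" is assumed here to mean a baseline that always predicts the most frequent form. The sketch contrasts that baseline with a classifier that estimates the distribution of forms given the discourse context, both scored on the training data.

```python
from collections import Counter, defaultdict

# Toy, invented data: (discourse-context features, syntactic form).
# Feature names and values are hypothetical, not the dissertation's annotation scheme.
TRAIN = [
    ({"relation": "contrast", "referent": "old"}, "topicalization"),
    ({"relation": "contrast", "referent": "old"}, "topicalization"),
    ({"relation": "contrast", "referent": "new"}, "it-cleft"),
    ({"relation": "narration", "referent": "old"}, "canonical"),
    ({"relation": "narration", "referent": "new"}, "canonical"),
    ({"relation": "narration", "referent": "old"}, "canonical"),
    ({"relation": "elaboration", "referent": "new"}, "canonical"),
    ({"relation": "elaboration", "referent": "old"}, "canonical"),
]


def context_key(feats):
    """Turn a feature dict into a hashable, order-independent key."""
    return tuple(sorted(feats.items()))


def train_majority(data):
    """Frequency-only baseline: the single most frequent form overall."""
    return Counter(form for _, form in data).most_common(1)[0][0]


def train_conditional(data):
    """Count how often each form occurs in each discourse context."""
    table = defaultdict(Counter)
    for feats, form in data:
        table[context_key(feats)][form] += 1
    return table


def form_distribution(table, feats):
    """Relative frequency of each form in this context (empty if unseen)."""
    counts = table.get(context_key(feats), Counter())
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


def predict(table, fallback, feats):
    """Pick the most probable form for the context, else fall back to the baseline."""
    dist = form_distribution(table, feats)
    return max(dist, key=dist.get) if dist else fallback


def accuracy(predict_fn, data):
    """Fraction of items whose predicted form matches the observed form."""
    return sum(predict_fn(feats) == form for feats, form in data) / len(data)


majority = train_majority(TRAIN)
table = train_conditional(TRAIN)
baseline_acc = accuracy(lambda feats: majority, TRAIN)
model_acc = accuracy(lambda feats: predict(table, majority, feats), TRAIN)
print(f"baseline: {baseline_acc:.3f}  context model: {model_acc:.3f}")
```

On this toy data the baseline never selects a non-canonical form, which mirrors the frequency-only problem the abstract describes. Scores computed on the training data overstate how well either classifier would do on unseen discourse.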

Keywords/Search Tags:

Discourse, Syntactic, Form, Natural language, Non-canonical, Conditions
