| By introducing achievements in the annotation of English discourse structure, this dissertation aims to encourage comparable research on Chinese. A review of studies on the annotation of English discourse structure shows that the first step in annotating discourse structure is to construct a discourse theory grounded in both grammar and semantics. Grammar accounts for the forms in discourse, such as the segmentation of discourse, while semantics deals with meaning in discourse, such as the relations between discourse segments. According to Halliday, grammar operates within the clause, and semantic content can only be found between clauses, which makes the clause the best interface between grammar and semantic content. In Chinese discourse, the clause corresponds to the simple sentence. Therefore, the simple sentence, or an equivalent structure functioning as a simple sentence in discourse, is chosen as the elementary discourse unit in this dissertation. By describing the syntactic formation and semantic content of the simple sentence, this dissertation attempts to construct a theoretical framework for Chinese discourse analysis. As far as syntactic formation is concerned, this dissertation adopts Zhan's (2000) conclusion that there are three types of simple sentences, namely subject + predicate, adverbial phrase + subject + predicate, and subject + predicate + subject + predicate. Of the three patterns, the first is used most frequently and can be further divided into four kinds, which together account for almost all simple sentences and equivalent structures in natural discourse. For the description of semantic content, this dissertation prefers the Hierarchical Network of Concepts (HNC) to the traditional semantic framework, because the former has greater power to explain the semantic content of simple sentences. 
However, HNC is not comprehensive enough to deal with hypotactic and rhetorical relations between simple sentences, so Rhetorical Structure Theory (RST) is adopted to complement HNC. From the description of syntactic formation and semantic content emerges Discourse Concept Structure Theory (DCST). Within the framework of DCST, this dissertation lays down a series of norms for the annotation of Chinese discourse structure, including the following: · An open-ended set of tags. This set covers both the syntactic and semantic description of an elementary discourse unit and the semantic relations between discourse segments. · Norms for segmentation. The simple sentence, or a structure functioning as a simple sentence, is selected as the elementary discourse unit, with the comma as the segmenting signal. In addition, a series of detailed rules deals with special cases involving various punctuation marks. · Norms for annotation. The annotation norms cover three aspects: the order of the tags in the actual annotation, general rules for annotation, and detailed rules for identifying relations between corresponding discourse segments. Following the practice of English discourse annotation, this dissertation chooses the bracket rather than the slash as the delimiter. As to the order of tags, the scope of the discourse to be annotated comes first, followed by the description of the syntactic and semantic content of the discourse unit, then the indication of the relations between corresponding discourse segments, and finally the specific content. Both the general and the detailed rules are designed to help identify relations between corresponding discourse segments. · Rules for drawing discourse trees. 
By introducing the methods for drawing English discourse trees, this dissertation attempts to establish general rules and methods for drawing Chinese discourse trees. Since the ultimate goal of constructing a theoretical framework and laying down annotation norms is to build a reference corpus containing a certain number of annotated discourses, it is necessary to develop automatic discourse processing tools. However, limited by several factors, especially the state of relevant linguistic research, this dissertation opts to make the most of existing resources by adapting RST Tool instead of developing a completely new tool for Chinese discourse. With the help of a reference corpus, many systems can be built, such as automatic document parsers, machine translation systems, question-answering systems and composition marking systems. This dissertation introduces a discourse-based document compression system in order to offer researchers in China some ideas for developing more language application systems to meet the needs of information processing. |