A Corpus-supported Approach To Systemic Functional Grammar: Automatic Annotation And Concordance Of Ideational And Textual Metafunctions

Posted on: 2013-01-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J P Liu
Full Text: PDF
GTID: 1115330374471347
Subject: English Language and Literature
Abstract/Summary:
As a new perspective for linguistic research, the corpus-supported approach to systemic functional grammar (henceforth SFG) takes as its central issue the semi-automatic or automatic retrieval and annotation of linguistic data. In this approach, theoretical exploration is based on concordance, annotation and statistical analysis, through which SFG becomes theoretically attestable and thus comparatively objective. Support from a large-scale corpus makes the approach attestable, while the annotated data remain theory-driven. Realizing semi-automatic and automatic annotation and concordance for parts of SFG theory makes it possible to annotate linguistic data on a large scale, so that SFG can be researched more systematically and in greater depth.
A corpus-supported approach to SFG depends on an exploration of the theoretical compatibilities and complementarities between SFG and corpus linguistics (henceforth CL). Philosophically, CL belongs to the empiricist tradition and therefore agrees with SFG in essence. Both branches of linguistics originate with J.R. Firth. Both regard language as a social phenomenon and study its use in naturally occurring texts. Both also acknowledge the probabilistic nature of language, and a corpus integrates form, meaning, function and context in a quantitative way. In the corpus-supported approach to SFG, the probabilistic profile of the linguistic system and of linguistic phenomena is established through the retrieval and analysis of linguistic data.
Domestic research on SFG and on CL is conducted separately rather than jointly. SFG studies rely mainly on theoretical induction and reasoning, or on linguistic data from one or several texts, and can therefore hardly be called corpus-related.
In research abroad, a large number of papers and books address the relation of SFG to corpora and CL, proposing both corpus-based and corpus-driven approaches to SFG. However, neither satisfactorily incorporates corpora and CL into research on SFG. A corpus-based approach to SFG is too theory-heavy for annotation to be automated: most of the linguistic data are annotated manually and selected casually to exemplify theories, with neither theoretical attestation nor theory-driven exploration. Attestation by linguistic data is neglected, and such research therefore lacks comprehensiveness, objectivity and representativeness. In a corpus-driven approach, by contrast, linguists tend not to acknowledge existing SFG theory; they generalize from the linguistic data and base their observations only on linguistic intuition and a limited body of theory. Corpus-driven research is therefore neither deep nor systematic.
A corpus-supported approach addresses the central question of both corpus-based and corpus-driven approaches, namely automatic (or semi-automatic) annotation and concordance at the clausal level, which alone can make research in SFG corpus-attestable, objective, systematic and in-depth. The key to automating syntactic annotation and concordance is a workable automatic process for annotating and retrieving items. It is impossible, both in theory and in practice, to annotate clauses exhaustively at the syntactic level by automatic means. Partial automation of syntactic annotation and concordance can, however, be achieved by applying lexical preference in the automatic processing of linguistic data.
The core of the proposed automation is to establish a model, which involves two aspects: modeling the syntactic meanings of the SFG metafunctions in forms that can be annotated and retrieved automatically; and modeling lexical items in forms that both permit automatic processing and carry more syntactic meaning. In brief, theories must be made concrete enough to reach the linguistic data, and linguistic data must be abstracted into syntactically more informative forms. Such modeling cannot cover every meaning in the metafunctions, because some meanings are too abstract to be put into automatically processable forms; that is, they are hard to bring down to naturally occurring linguistic data.
In operation, the following measures illustrate how automatic or semi-automatic processing of linguistic data can be realized, with parts of SFG theory marked in the data. Annotation, concordance and processing at the clausal level from the perspective of lexical preference fall into three parts: first, setting up the probabilistic profile of the functions of the co-occurring constituents of clauses within the framework of SFG; second, exploring semi-automatic and automatic annotation and concordance at the clausal level under a partial framework of the ideational and interpersonal metafunctions; third, the theoretical modeling and concordance of the cohesive system of the textual metafunction. Syntactic annotation and concordance from the perspective of lexical preference start from research on the co-occurrence of constituents in clauses. The linear co-occurrences of clause constituents are the grammar of the paradigmatic, systemic lexical choices made in construing figures, because lexical research concerns both paradigmatic choice and co-occurrence in the lexico-grammatical system of choices.
Firstly, the fundamental method is to study, through corpus concordance, the probability of the vertical (paradigmatic) choices among the members of a system and their functions. The study of linear co-occurrence aims to capture the function of nodes so as to establish a probabilistic profile of lexical choices, in which lexical use is quantified by the functions in which it occurs. Research on linear lexical co-occurrence is therefore the basis of, and the prerequisite for, construing the probabilistic profile of syntax and of texts beyond the level of syntax.
Syntactic annotation in the corpus-supported approach starts with the probabilistic systemic choices that construe experience into figures. It is approached from the perspective of lexical preference and is realized in three steps. The first step is to fix the concrete content of a given level of research, such as one of the three metafunctions. The second is to specify the research question and formalize it as lexical and regular-expression (regex) retrieval items, with which raw data are retrieved and then checked manually. Finally, the manually sifted data are annotated collectively and automatically. Combining automatic annotation and concordance with manual sifting overcomes the drawback of labor- and time-intensive manual work.
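The three steps just described (regex retrieval, manual sifting, collective automatic annotation) can be sketched in Python as follows. This is only an illustrative sketch, not the software used in the dissertation: the word list `MENTAL_VERBS`, the label "Process: mental", and the names `Hit`, `retrieve`, `sift` and `annotate` are all assumptions introduced here, and the manual check is stood in for by a callable supplied by the analyst.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    """One concordance hit: where the node was found and its context."""
    text_id: int
    start: int
    end: int
    node: str
    context: str
    label: str = ""  # filled in by annotate()


# Hypothetical retrieval list for one research question (assumption).
MENTAL_VERBS = ["think", "thinks", "thought", "believe", "believes", "believed"]
PATTERN = re.compile(r"\b(?:" + "|".join(MENTAL_VERBS) + r")\b", re.IGNORECASE)


def retrieve(texts, pattern, window=30):
    """Step 2: retrieve raw hits by regex, keeping some context for checking."""
    hits = []
    for text_id, text in enumerate(texts):
        for m in pattern.finditer(text):
            left = max(0, m.start() - window)
            right = min(len(text), m.end() + window)
            hits.append(Hit(text_id, m.start(), m.end(), m.group(0), text[left:right]))
    return hits


def sift(hits, keep):
    """Manual check: `keep` stands in for the analyst's yes/no decision per hit."""
    return [h for h in hits if keep(h)]


def annotate(hits, label):
    """Step 3: annotate all checked hits collectively with one scheme concept."""
    for h in hits:
        h.label = label
    return hits
```

In use, a form such as "thought" is retrieved both as a verb and as a noun; the sifting step is where the analyst discards the noun reading before the remaining hits receive the same label in one pass.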
The concrete steps are as follows:
1) tag the whole text or corpus with suitable software;
2) retrieve and list the predicate verbs relevant to the aspect under research, using the chosen software;
3) load the texts or corpora into the software designed for SFG;
4) adapt or design an annotation scheme according to the research and the framework of SFG;
5) load the scheme;
6) retrieve with a regex written for the data required, sift the concordance lines manually to mark and delete all those that match the regex but not the intended meaning, and, where valuable exceptions are found, revise the annotation scheme, thesaurus or regex accordingly;
7) annotate all the checked concordance nodes automatically with the concepts in the annotation scheme.
The theoretical modeling in the annotation of the ideational function of SFG is, in essence, the lexicalization of theory into retrievable forms. Not all grammatical meanings in the ideational function are modeled, because full modeling biases research results by confining the research entirely within a pre-designed framework. In addition, complete modeling makes annotation increasingly manual, so that the corpus becomes smaller and the research less objective.
The model for researching the cohesion system in the corpus-supported approach to SFG is distinct from that for the ideational function. Modeling the cohesion system of the textual metafunction means transforming it into lexical strings or regexes that can be retrieved automatically. To formalize the cohesive system is to set up thesauri of the different cohesive meanings: thesauri of synonymy and antonymy, or of hyponymy and meronymy, are compiled as retrievable lists for different research purposes.
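As a rough illustration of turning a thesaurus into a retrievable regex, the sketch below compiles a word list into one pattern and returns keyword-in-context lines. The thesaurus content (`HYPONYMS`) and the function names `thesaurus_to_regex` and `concordance` are hypothetical and chosen only for this example; the matching is on exact word forms, so inflected forms would need to be listed explicitly.

```python
import re

# Hypothetical thesaurus: one superordinate and its hyponyms (assumption).
HYPONYMS = {"vehicle": ["car", "bus", "bicycle", "truck"]}


def thesaurus_to_regex(words):
    """Compile a word list into one case-insensitive whole-word pattern."""
    # Longer alternatives first, so a longer entry is tried before a shorter one.
    alternatives = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def concordance(text, pattern, window=20):
    """Return (left context, node, right context) for every match in the text."""
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - window):m.start()]
        right = text[m.end():m.end() + window]
        lines.append((left, m.group(0), right))
    return lines
```

A pattern built from `["vehicle"] + HYPONYMS["vehicle"]` retrieves every listed member of the hyponymy set in a text, which gives the concordance lines from which cohesive ties can then be inspected.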
The procedure for the cohesion study is as follows:
1) set the research topic within the cohesion system;
2) set up the thesaurus for a cohesive feature, for example hyponymy, by listing all the hyponyms of a superordinate, or the members of another relationship, according to the research set;
3) assign each feature a concordance list and, if necessary, one or more contextual word lists;
4) observe the probabilistic profile established from the statistics on the relative frequency of the linguistic features, compared within or across texts.
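The final step of this procedure, comparing relative frequencies of cohesive features within or across texts, might be sketched as below. The feature names, the simple letter-based tokenizer and the normalization base `per` are assumptions made for the example, and the word lists are taken to be lowercase single-word forms; the dissertation does not specify how its frequencies are normalized.

```python
import re
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def feature_profile(text, features, per=1000):
    """Relative frequency of each feature's word list, per `per` tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    profile = {}
    for name, words in features.items():
        hits = sum(counts[w] for w in words)
        profile[name] = per * hits / total if total else 0.0
    return profile


def compare(texts, features, per=1000):
    """Profiles for several texts side by side, keyed by text name."""
    return {name: feature_profile(text, features, per) for name, text in texts.items()}
```

Normalizing by text length is what makes profiles from texts of different sizes comparable; raw counts would simply favor the longer text.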
Keywords/Search Tags: Systemic Functional Grammar, corpus-supported, lexical preference, semi-automatic annotation