A Corpus-based Study On English Spoken Formulaic Sequences

Posted on: 2017-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Wang
Full Text: PDF
GTID: 2295330488982619
Subject: English Language and Literature
Abstract/Summary:
Formulaic language consists of conventionalized expressions used in situated contexts, realized as frequently recurring multi-word sequences. As a universal linguistic phenomenon, it is frequently used in spoken discourse (Altenberg, 1998; Biber et al., 2000; Erman & Warren, 2000) and provides processing advantages for native speakers (Conklin & Schmitt, 2008; Underwood et al., 2004). Through frequent use, formulaic sequences become entrenched structures in the mental lexicon of native speakers. Language structure emerges from language use: children's language acquisition begins with language items that are actually used (Tomasello, 2003), and second language acquisition is based on the input received (Ellis & Wulff, 2014). Learners generalize rules in both first and second languages through general human cognitive mechanisms.

In the past few years, several researchers (Martinez & Schmitt, 2012; Paul & Nation, 2008; Simpson-Vlach & Ellis, 2010) have attempted to derive frequent formulaic sequences from corpora for pedagogical purposes, combining automatic retrieval with manual selection. However, these studies are still at a preliminary stage and rarely focus on spoken formulaic sequences. Most of the derived sequences are composed of two or three words, and they are difficult for language learners to use because they lack paired meanings and situated contexts. Original TV series provide situation-based, authentic language input and contribute to vocabulary acquisition (Webb, 2015). Lin (2014) verified the validity of using internet television as a resource for learning spoken formulaic sequences and pointed out that the factual, drama and comedy genres are the most similar to everyday speech. However, that study only indicated which genres to prioritize.
It provides no specific items.

This study presupposes that formulaic sequences in TV series represent those in everyday speech and that the four-word lexical bundle is the major means of unifying individual words and grammar in English spoken formulaic sequences. To approximate spontaneous everyday speech as closely as possible, American life-oriented sitcoms are chosen as data to construct the sitcom corpus, which contains 1,748,638 running words in total. AntConc 3.4.4 is used to retrieve four-word lexical bundles from the sitcom corpus, and the bundles are categorized according to their structural features. The lexical features of the frequently used categories are then summarized. On the basis of the four-word lexical bundles, formulaic multi-word expressions are identified, and the connection between frequent word types and multi-word expressions is explored. A multi-word expression is either a form-meaning pairing, which is delimited by punctuation marks on both sides or listed as a whole in the dictionary, or a formulaic sentence frame, which is composed of an invariable multi-word core and a variable slot that must be filled by a clause.

Specifically, this study seeks to answer the following three questions:
(1) How many four-word lexical bundles are there in the sitcom corpus, and what are they? Which categories of lexical bundles are frequently used? What are the lexical features of the bundles in these categories?
(2) What form-meaning pairings can be found directly among these lexical bundles? What form-meaning pairings contain these lexical bundles? What formulaic sentence frames can be generalized from these lexical bundles?
(3) What are the most frequent word types in the sitcom corpus?
What is the usage pattern of frequent word types in multi-word expressions?

The research findings are as follows:

(1) There are 779 four-word lexical bundles in the sitcom corpus, and their distribution pattern is comparable to that found in the spoken sub-corpus of the LSWE Corpus. The four most frequently used categories of lexical bundles are personal pronoun + lexical verb phrase, verb phrase with active verb, yes-no question fragments and wh-question fragments. The 282 lexical bundles that center on personal pronouns involve only first- and second-person pronouns. The coordinators "and" or "but" are sometimes used to connect clauses. Among the 141 verb phrase bundles, "get/got", "have/had/having", "going", the various forms of "be" and the negative contractions of "do" are relatively active. Both "be going to" and "have to" appear in a number of lexical bundles, a few of which contain their combination "be going to have to". Among the 130 question fragments, 71 are wh-question fragments and 59 are yes-no question fragments. Of the wh-question fragments, 42 begin with "what", while instances beginning with "where", "who" and "when" are rare. Yes-no question fragments that begin with "am" or "are" are contained in wh-questions. Of the yes-no question fragments, 23 begin with "do", and the verbs that co-occur with it are "think", "know", "want" and "have".

(2) A total of 105 form-meaning pairings are found, of which 75 consist of four words and 30 consist of more than four words. Of these pairings, 91 are well-formed sentences, most of which are structurally regular and semantically transparent. Some similar expressions vary only in their personal pronouns, but they display distinctive semantic prosody when used as clauses in complex sentences. Other similar expressions vary only in tense, but their pragmatic functions differ markedly. Among the 57 formulaic sentence frames, 11 are questions and 46 are declarations. A few invariable cores are themselves form-meaning pairings, such as "what do you say".
"I don’t know" is the most active core; it can be followed by various relative words that introduce clauses. Frequent relative words include "what", "how", "if", "where", "why" and "who".

(3) In the sitcom corpus, 32 word types occur more than 10,000 times, 28 of which are function words. The others include the lexical verb "know" and the interjection "oh". "Have" and "do" can be used either as lexical verbs or as function words. Among the 28 function words, eight are pronouns: "I", "you", "we", "me", "this", "that", "it" and "what". "I" and "you" occur extremely frequently, and they co-occur with "know" in a large number of multi-word expressions. "I" usually appears at the beginning of a declaration, while "you" appears either in the middle of a question or at the beginning of a declaration. In addition, "you know" and "you know what" are two formulaic expressions. "Oh" either stands alone or combines with various elements to form formulaic expressions that convey different emotions. Apart from "and" and "but", "so" is also frequently used at the beginning of a sentence as a discourse marker. Formulaic expressions help learners acquire the usage patterns of function words.

This study has explored the use of a frequency-driven approach to derive formulaic expressions from a given corpus, and it contributes to corpus-based studies on formulaic sequences. To some extent, the findings support the validity of using life-oriented sitcoms to learn spoken formulaic sequences. The form-meaning pairings and formulaic sentence frames derived in this study are perceptually salient to foreign language learners and can be memorized and used as wholes. In teaching and learning spoken English, teachers and learners should pay more attention to formulaic expressions that contain function words, as well as to the usage patterns of inserts. In addition, a holistic learning strategy should be applied to establish the association between the form of a formulaic expression and its paired meaning.
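
The frequency-driven retrieval described in the abstract can be sketched in Python. This is an illustration only, not the procedure used in the thesis, which relied on AntConc 3.4.4. The tokenizer, the function names, the `min_freq` threshold and the sample utterances are all assumptions introduced here; the abstract does not state the frequency cutoff actually applied. The sketch also assumes that a bundle does not cross an utterance boundary.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words, keeping contractions such as "don't"
# (both straight and curly apostrophes) as single tokens.
TOKEN_RE = re.compile(r"[a-z]+(?:['’][a-z]+)?")


def tokenize(utterance):
    """Split one utterance into lowercase word tokens."""
    return TOKEN_RE.findall(utterance.lower())


def four_word_bundles(utterances, min_freq=2):
    """Count contiguous four-word sequences and keep those that recur.

    min_freq is an assumed cutoff chosen for the toy sample below.
    """
    counts = Counter()
    for utterance in utterances:
        tokens = tokenize(utterance)
        # Slide a four-token window over each utterance separately.
        for i in range(len(tokens) - 3):
            counts[" ".join(tokens[i:i + 4])] += 1
    return {bundle: n for bundle, n in counts.items() if n >= min_freq}


def word_type_frequencies(utterances):
    """Count how often each word type occurs across all utterances."""
    counts = Counter()
    for utterance in utterances:
        counts.update(tokenize(utterance))
    return counts


# Invented sample utterances, not data from the sitcom corpus.
sample = [
    "I don't know what you mean.",
    "I don't know what to say.",
    "What do you want to do?",
]

bundles = four_word_bundles(sample, min_freq=2)
frequencies = word_type_frequencies(sample)
print(bundles)
print(frequencies.most_common(3))
```

On a real corpus the threshold would normally be normalized (for example, occurrences per million words) and often combined with a dispersion requirement across texts; both choices are left open here because the abstract does not specify them.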
Keywords/Search Tags: formulaic language, formulaic sequences, four-word lexical bundles, multi-word expressions, function words