
Research On Mongolian Discourse Markers Based On Film And Television Drama Corpus

Posted on: 2013-01-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M N Bao
GTID: 1115330374470720
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
Mongolian discourse markers are a functionally related class of connective expressions that signal a relationship between the interpretation of the segment they introduce and that of the preceding segment. They take many forms, serve complex functions, and consistently affect the results of syntactic analysis. Research on Mongolian discourse markers is therefore significant in both theory and practice. The main results of this dissertation are as follows:
1) Definition and classification of Mongolian discourse markers
Mongolian discourse markers are a complex linguistic phenomenon, and their definition and classification require analysis at multiple levels. This thesis summarizes the properties of Mongolian discourse markers (syntactic detachability, procedural meaning, and metapragmatic function), classifies them into 14 functional groups, and explains each group in detail.
2) Construction of a Mongolian film and television drama corpus
How the corpus is constructed strongly influences the research value of a discourse-marker study, because it determines the validity and reliability of any findings based on it. This part addresses the design, text collection, and organization of the Mongolian film and television drama corpus. First, the purposes and methods of the corpus design are introduced. Next, the representativeness and balance of the data are discussed, including the classification and proportion of texts, their distribution, sample selection, and circulation during text collection. Finally, the storage format, the information recorded for the data, and the tools used for corpus management are introduced.
3) Annotation and analysis of Mongolian discourse markers
As part of an ongoing research project on annotating the pragmatic functions of Mongolian discourse markers in the film and television drama corpus, this study aims to improve parsing accuracy and to provide explicit information about how coherence is established in discourse.
Following the classification of the pragmatic functions of Mongolian discourse markers, we describe rules for their automatic recognition and implement them in a program. First, lexical information is derived from the morphological features of discourse markers. Next, discourse markers are manually selected from the corpus and stored in a base word list. Then, each functional group is assigned a corresponding code, and a formal rule set is built on which the program is developed. On a test set of 30,000 words, the system achieves an average recall of 54.26% and an average precision of 85.58%.
4) Limitations of the present research and further study
The experimental results show that recall and precision are not yet high, which may be due to several factors. First, the rule-based parser disambiguates discourse markers with limited precision. Second, the content, subject matter, and size of the corpus affect how often discourse markers occur, which in turn affects recall. Third, redundant strings that are incompatible with the language reduce recall. Finally, the limited level of processing of the Mongolian film and television drama corpus seriously restricts the precision of Mongolian discourse-marker recognition. To address these problems, follow-up work will improve the system in several respects: first, by expanding the training set so that it is closer to actual language use; second, by enriching the machine dictionary with more information; and finally, by adding a statistical model to the rule-based algorithm.
In sum, this dissertation discusses the definition and classification of Mongolian discourse markers, the construction of a Mongolian film and television drama corpus, and the limitations of the present research and directions for further study.
This dissertation consists of five chapters. Chapter one reviews previous theoretical and applied research on discourse markers and introduces the methodology, significance, and innovations of the study.
Chapter two introduces the definition of Mongolian discourse markers, the reasons for their existence, the conditions of their formation, and the classification of their pragmatic functions. Chapter three discusses corpus-related issues such as corpus design, representativeness, balance, and structural organization. Chapter four presents the research on automatic annotation of Mongolian discourse markers and the experimental results. The final chapter gives a brief summary, discusses the limitations of the present research, and offers suggestions for further study.
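The recognition procedure described in item 3 (lexical lookup in a manually built word list, one code per functional group, and formal rules that filter candidates) can be sketched roughly as follows, together with the standard precision and recall used to score such a system. This is a minimal Python sketch under stated assumptions, not the dissertation's program: the placeholder forms `marker_a` and `marker_b`, the codes `F01` and `F07`, all function names, and the single detachability rule (utterance-initial, or followed by a comma) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical base word list: marker form -> functional-group code.
# The real entries and the 14 group codes are not given in the abstract;
# these placeholders are for illustration only.
LEXICON: dict[str, str] = {
    "marker_a": "F01",
    "marker_b": "F07",
}


@dataclass(frozen=True)
class Tag:
    index: int  # token position within the utterance
    form: str   # surface form of the recognized marker
    code: str   # functional-group code taken from the lexicon


def is_detachable(tokens: list[str], i: int) -> bool:
    """Hypothetical rule: accept a candidate only if it is utterance-initial
    or followed by a comma, as a rough proxy for syntactic detachability."""
    at_start = i == 0
    followed_by_comma = i + 1 < len(tokens) and tokens[i + 1] == ","
    return at_start or followed_by_comma


def tag_utterance(tokens: list[str]) -> list[Tag]:
    """Look each token up in the lexicon and keep it only if the rule fires."""
    tags = []
    for i, tok in enumerate(tokens):
        code = LEXICON.get(tok)
        if code is not None and is_detachable(tokens, i):
            tags.append(Tag(i, tok, code))
    return tags


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |gold|, over exact matches."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


def run_demo() -> tuple[float, float]:
    # Toy tokenized utterances (placeholders, not real corpus data).
    corpus = {
        "u1": ["marker_a", ",", "word", "word"],  # tagged, and correct
        "u2": ["word", "marker_b", "word"],       # missed: rule rejects it
        "u3": ["marker_b", "word"],               # tagged, and correct
        "u4": ["marker_a", "word"],               # tagged, but a non-marker use
    }
    # Hypothetical gold annotation as (utterance_id, index, code) triples.
    gold = {("u1", 0, "F01"), ("u2", 1, "F07"), ("u3", 0, "F07")}
    predicted = {
        (uid, t.index, t.code)
        for uid, toks in corpus.items()
        for t in tag_utterance(toks)
    }
    return precision_recall(predicted, gold)


if __name__ == "__main__":
    p, r = run_demo()
    print(f"precision={p:.2%} recall={r:.2%}")  # precision=66.67% recall=66.67%
```

In this toy run the rule both misses a medial marker (`u2`, which lowers recall) and accepts a non-marker use of a listed form (`u4`, which lowers precision). The second kind of error corresponds to what the abstract attributes to the limited disambiguation precision of the rule-based parser.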
Keywords/Search Tags: Mongolian discourse markers, Mongolian film and television drama corpus, pragmatic function, automatic annotation
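Item 2 of the abstract treats corpus balance partly as a matter of fixing the proportion of each text category before sampling. One common way to turn such proportions into whole-number sample counts is largest-remainder (Hamilton) allocation, sketched below in Python. This is an illustrative assumption only: the abstract does not state which categories, proportions, or allocation method the corpus actually uses, and the category names and sizes here are hypothetical.

```python
def proportional_allocation(sizes: dict[str, int], total: int) -> dict[str, int]:
    """Split `total` samples across categories in proportion to `sizes`,
    using the largest-remainder method so the counts sum exactly to `total`."""
    population = sum(sizes.values())
    if population <= 0 or total < 0:
        raise ValueError("need a positive population and a non-negative total")
    # Integer floor of each exact quota, plus the remainder of that division.
    alloc = {c: total * n // population for c, n in sizes.items()}
    remainders = {c: total * n % population for c, n in sizes.items()}
    leftover = total - sum(alloc.values())
    # Give one extra sample to each category with the largest remainders;
    # ties are broken by category name so the result is deterministic.
    order = sorted(sizes, key=lambda c: (-remainders[c], c))
    for c in order[:leftover]:
        alloc[c] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical category sizes (e.g. number of available scripts).
    sizes = {"film": 5, "tv_series": 12, "short_drama": 3}
    print(proportional_allocation(sizes, 7))
```

Working in integers avoids floating-point rounding in the quotas, and the leftover is always smaller than the number of categories, so each category receives at most one extra sample.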