
The Research Of Corpus Annotation From The Perspective Of Computational Linguistics

Posted on: 2013-12-16 | Degree: Master | Type: Thesis
Country: China | Candidate: G D Wang | Full Text: PDF
GTID: 2235330392958160 | Subject: Linguistics and Applied Linguistics
Abstract/Summary:
The emergence of corpora and corpus linguistics was an epoch-making development in linguistics. As corpora have developed rapidly, growing in size and capability, their fields of research and application have also expanded. In this process, corpus annotation plays a significant role. Annotation is an important part of a corpus and has become a hot spot of corpus research. It can reveal deep information about language and extend the functions of a corpus, so it is a foundation of computational linguistics research. Yet there has been no systematic study of it. Past research on corpus tagging has focused on building practical annotation systems or on isolated studies of a particular kind of annotation, and it is scattered across the technical specifications of large corpora, with little reflection or exploration.

This paper discusses the concept and significance of corpus annotation and the principles of corpus annotation from the perspective of computational linguistics. It focuses on two types of annotation, structural annotation and semantic annotation, and highlights a structural annotation model and a semantic annotation model.

The Introduction summarizes domestic research on corpus annotation and sets out the research focus and research methods of the paper. In Chapter II, we derive the concept of corpus annotation from the concept of a corpus and explain the significance of corpus annotation in two respects. We explain the principles of corpus annotation proposed by Leech. To satisfy the requirements of new modes of corpus annotation, we add four principles: ① design a functional annotation system that serves the main purpose of the corpus; ② pay attention to the compatibility of corpus annotation at different levels; ③ attach importance to the support of corpus-related software; ④ design corpus annotation that is easy to share.

Chapter III describes the old and the new corpus annotation modes, introduces some concepts of the TEI annotation mode, and then presents the Standard Generalized Markup Language (SGML), which is closely linked to the TEI mode. We then summarize the several annotation modes.

In Chapter IV, we analyze syntactic annotation, giving prominence to structural annotation, which is an important annotation type. We summarize the two main types of structural annotation and propose the simplest syntactic structure annotation model. The model takes immediate constituent analysis as its theoretical basis and describes the syntactic structure of a sentence with a simple symbol system. It has some reference value for Chinese structural annotation.

Chapter V focuses on semantic annotation and proposes a model of manual semantic annotation on the basis of previous studies. The semantic tagset of the model draws on case grammar. The semantic annotation mode combines part-of-speech annotation and structural annotation; it is easy to implement on a machine and contains a great deal of linguistic information.

In Chapter VI, we generalize the characteristics of Chinese corpus annotation from the perspectives of syntactic annotation and semantic annotation. In the last chapter, we review the paper and point out the details that need further improvement.
Keywords/Search Tags: Corpus annotation, Computational linguistics, Semantic annotation, The simplest annotation model, The principles of corpus annotation