A Study On The Construction Of A Large-scale And General Corpus For Chinese Dictionary Compiling

Posted on: 2016-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: M Tang
Full Text: PDF
GTID: 2285330467981902
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Drawing on the successful experience of lexicography-oriented corpora abroad, this thesis aims to work out a construction plan for a large-scale general corpus, based on the actual conditions of language research and lexicography in China and on choosing suitable approaches to corpus construction there. The thesis is divided into six chapters, whose main contents are as follows.

Chapter One: Introduction. This chapter reviews the state of research on corpus construction in China and abroad, and sets out the significance and value of building the corpus, together with the research methods and approach. Research on corpus construction began earlier abroad, so it has produced richer theoretical results and a more complete and systematic framework. China started comparatively late, and its work is still largely confined to specialized corpora that each serve the compilation of one particular dictionary. Chinese lexicography therefore urgently needs a large-scale general corpus for Chinese dictionary compilation. Such a corpus will help fill gaps in Chinese lexicographic theory, promote the development of Chinese information processing, and improve the objectivity, accuracy, and scientific rigor of dictionary compilation.

Chapter Two: The design concept of the large-scale general corpus for Chinese dictionary compilation. In the mid and late 20th century, when Chomsky's rationalism prevailed, corpus-based research methods became popular. With the successful establishment of WordNet, HowNet, and FrameNet, the design concept of a large general-purpose corpus emerged. That concept has three parts: massive scale with multiple genres, deep processing, and a monitor corpus.

Chapter Three: Data collection for the large-scale general corpus for Chinese dictionary compilation. Data collection is an important step in corpus construction. We first review how material was collected for five corpora: the COBUILD corpus, the Longman corpus, the British National Corpus, the Cambridge International Corpus, and the Sinica Corpus. Combining these experiences with conditions in China, we then propose that the corpus draw on five genres: spoken language, novels, news, magazines, and journals. Each genre accounts for 20% of the corpus, or about 70 million words, so the whole corpus will contain around 350 million words.

Chapter Four: Processing of the large-scale general corpus for Chinese dictionary compilation. Texts are stored in the corpus in XML format, with markup for the classification, source, author, publication date, title, and body of each article. Word segmentation will use the multi-machine Chinese corpus processing system developed by the Institute of Computational Linguistics at Peking University. Word segmentation and part-of-speech tagging will follow the 2003 edition of the Peking University word segmentation and part-of-speech tagging standard (Yu Shiwen et al., 2003). The maximum matching method is adopted for lexical annotation. Syntactic tagging will follow the dependency parsing annotation specification proposed by Qiu Likun (2012). Semantic annotation will use the Meaning-Text Theory founded by Mel'čuk.

Chapter Five: Functions of the large-scale general corpus for Chinese dictionary compilation. These are corpus management, search, statistics, updating for new words and new senses, and assistance with writing definitions.

Chapter Six: Conclusion. This chapter summarizes the research work of the thesis and points out directions for further research.
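The maximum matching method mentioned under Chapter Four can be illustrated with a minimal sketch. The abstract does not say which variant, lexicon, or window size the thesis uses, so the forward variant, the function name `forward_maximum_match`, the `max_word_len` parameter, and the toy lexicon below are all assumptions made for illustration only.

```python
def forward_maximum_match(text, lexicon, max_word_len=4):
    """Greedy segmentation: at each position take the longest lexicon word.

    Illustrative sketch only; not the thesis's actual implementation.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan advances.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon invented for this example (hypothetical data).
toy_lexicon = {"词典", "编纂", "语料库", "大型", "通用"}
print(forward_maximum_match("大型通用语料库", toy_lexicon))
# -> ['大型', '通用', '语料库']
```

Greedy longest-match is simple and fast, but it cannot resolve segmentation ambiguities on its own, which is one reason a corpus pipeline would pair it with a curated lexicon and a published tagging standard.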
Keywords/Search Tags: Chinese dictionary compilation, large-scale, general, corpus, conceiving