The Establishment Of Tang Poetry Corpus Used In The Analysis Of Classical Chinese Poetry

Posted on: 2017-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Yuan
Full Text: PDF
GTID: 2335330503464596
Subject: Computer technology
Abstract/Summary:
With the development of natural language processing technology, increasing attention has been paid to using computers to process the classics of ancient Chinese literature. Related research depends mostly on corpus data for machine learning. Because of practical demand, most existing corpora are based on modern Chinese, and labelled corpora of ancient literature remain scarce. Consequently, it is necessary to build a dedicated corpus that can support research on ancient literature. In this thesis, "the Full Collection of Tang Poems" is taken as the object to be labelled, and a corpus system for word segmentation and part-of-speech tagging is constructed with natural language processing technology. The system incorporates characteristics of Tang poetry, including its syntax and rules, and also supports manual correction. First, the thesis analyzes the distinctive nature of Tang poetry and sets up several related knowledge bases (KBs). A list of Tang poetry words is built using parameters such as word frequency, mutual information, and co-occurrence degree, and the poems are then labelled with a hidden Markov model (HMM). After that, the construction of the Tang poetry corpus system, which is intended for the study of ancient poetry, is discussed in detail. Finally, the experimental results are analyzed and summarized.
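The statistical word-list step can be illustrated with a minimal sketch. It is not the thesis's implementation: it scores adjacent character pairs by pointwise mutual information (PMI) with a simple frequency cutoff, and the function name `score_bigrams`, the `min_count` threshold, and the toy input lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def score_bigrams(lines, min_count=2):
    """Return {bigram: PMI} for adjacent character pairs.

    Hypothetical helper: pairs seen fewer than min_count times are
    dropped, which stands in for a word-frequency filter.
    """
    unigrams = Counter()
    bigrams = Counter()
    for line in lines:
        unigrams.update(line)  # count single characters
        bigrams.update(line[i:i + 2] for i in range(len(line) - 1))

    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())

    scores = {}
    for pair, count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / total_bi
        p_x = unigrams[pair[0]] / total_uni
        p_y = unigrams[pair[1]] / total_uni
        # PMI = log2( P(xy) / (P(x) * P(y)) ); higher means the two
        # characters co-occur more often than chance would predict.
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy input: a few well-known Tang lines with punctuation removed.
# This is illustrative data only, not the corpus used in the thesis.
toy_lines = [
    "床前明月光",
    "疑是地上霜",
    "举头望明月",
    "低头思故乡",
    "明月松间照",
    "海上生明月",
]

scores = score_bigrams(toy_lines)
candidates = sorted(scores, key=scores.get, reverse=True)
```

On this toy input, only the pair that recurs across lines survives the frequency cutoff and becomes a word candidate. A real pipeline would presumably combine such scores with the other parameters the abstract names and pass the candidates on to manual checking.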
Keywords/Search Tags: Tang poem corpus, Statistical word extraction, Manual checking, Tang poem tagging