
Information Processing On Mencius And Its Commentations And Annotations

Posted on: 2014-09-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S H Liang
GTID: 1225330482983247
Subject: Linguistics and Applied Linguistics

Abstract/Summary:
The Pre-Qin documents and their annotations and commentaries have not only ensured the transmission of the Confucian classics but also promoted the development of classical studies and the progress of civilization. They make the Pre-Qin documents easier for later generations to read, study and use, and they also give the information processing of these documents a new approach and a rich body of useful resources. Advances in modern computer hardware and software, together with the steady improvement of research methods and tools in Chinese information processing, have made it possible to process the large body of Pre-Qin documents and their annotations and commentaries, including Mencius. This work differs in many respects from the approaches used for processing modern Chinese.

Using Mencius and related works such as Annotations and Commentations on Mencius, Mencius variorum and Annotation on Mencius as resources, and drawing on existing studies of other Pre-Qin documents such as The Analects of Confucius and Master Zuo's Spring and Autumn Annals, this dissertation carries out deep processing of Mencius. It explores new methods for the automatic word segmentation, part-of-speech tagging and word sense disambiguation of Mencius that build on its annotations and commentaries, examines the problems these tasks raise, and also studies the statistics of the text's textual and linguistic features and the automatic identification of its figures of speech.

Based on an analysis of the structure of Mencius and its annotations and commentaries (Annotations and Commentations on Mencius, Mencius variorum, Annotation on Mencius, etc.), the dissertation proposes an iterative checking sentence alignment algorithm, based on a restricted search range and semantic similarity scores, for aligning sentences of the original text with the sentences cited in the commentaries. It also tests an alignment method that matches commentary entries with regular expressions, with a reported maximum F-value of 0.09.

The dissertation develops a word segmentation standard for Mencius and segments the text with three approaches: rule-based, statistical-model-based and annotation-and-commentary-based. In addition to the usual word-level F-value, it introduces a clause-level F-value as an evaluation indicator.

It also defines a part-of-speech tag set and, after tagging the text with different statistical methods, automatically corrects the tagging results using information from the annotations and commentaries, such as phonetic notations and fanqie (FAN QIE, a traditional method of indicating the pronunciation of a Chinese character with two other characters).
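The word-level F-value mentioned above is conventionally computed by comparing the character spans of predicted words with those of a gold-standard segmentation. The Python sketch below shows that standard calculation. It is not taken from the dissertation, the function names are hypothetical, and the clause-level F-value is not reproduced because its exact definition is not given in this abstract. The sample segmentation of 孟子見梁惠王 is a hypothetical illustration, not the dissertation's segmentation standard.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of one word


def to_spans(words: List[str]) -> Set[Span]:
    """Convert a segmented sentence into a set of character-offset spans."""
    spans: Set[Span] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 over a corpus of sentences.

    A predicted word counts as correct only if both of its boundaries
    match a word in the gold segmentation of the same sentence.
    """
    correct = gold_total = pred_total = 0
    for g_words, p_words in zip(gold, pred):
        if "".join(g_words) != "".join(p_words):
            raise ValueError("gold and predicted sentences must contain the same characters")
        g, p = to_spans(g_words), to_spans(p_words)
        correct += len(g & p)
        gold_total += len(g)
        pred_total += len(p)
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted segmentations of one sentence.
gold = [["孟子", "見", "梁惠王"]]
pred = [["孟子", "見", "梁", "惠王"]]
p, r, f = segmentation_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Here two of the four predicted words match the gold spans, so precision is 2/4, recall is 2/3 and F1 is 4/7. Comparing spans rather than word strings ensures that a word is credited only when it occurs at the same position.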
It further conducts word sense disambiguation experiments on 10 high-frequency polysemous words in Mencius using two methods, a KNN-based disambiguation tree and CRFs (Conditional Random Fields), and builds a human-computer interactive platform for word tagging and correction. The dissertation makes extensive use of statistical measures to study the textual features and language style of the related Pre-Qin documents and, for the first time, applies the same quantitative analysis to their annotations and commentaries. Finally, taking the parallel sentences of Mencius and The Analects of Confucius as an example, it presents the first study of the automatic identification of figures of speech in classical Chinese and designs a corresponding identification algorithm.
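The general idea behind KNN-based word sense disambiguation can be illustrated with a minimal Python sketch. This is a plain k-nearest-neighbour classifier, not the dissertation's KNN-based disambiguation tree, whose details are not described in this abstract. The context window of two characters on each side, cosine similarity over character counts, k = 3, and the toy training examples and sense labels are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

# (text, index of the target character, sense label); all labels are hypothetical
Example = Tuple[str, int, str]


def context_vector(text: str, index: int, window: int = 2) -> Counter:
    """Count the characters within `window` positions on each side of text[index]."""
    left = text[max(0, index - window):index]
    right = text[index + 1:index + 1 + window]
    return Counter(left + right)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(train: List[Example], text: str, index: int, k: int = 3, window: int = 2) -> str:
    """Assign text[index] the majority sense among its k most similar training contexts."""
    query = context_vector(text, index, window)
    labelled = [(context_vector(t, i, window), sense) for t, i, sense in train]
    # Most similar first; sorted() is stable, so equal scores keep training order.
    ranked = sorted(labelled, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    # most_common() breaks count ties by insertion order, i.e. in favour of higher rank.
    return votes.most_common(1)[0][0]


# Toy data: "X" stands for the polysemous target character.
train = [("aaXbb", 2, "s1"), ("aaXbc", 2, "s1"), ("ddXee", 2, "s2"), ("ddXef", 2, "s2")]
print(disambiguate(train, "aaXbd", 2))  # s1
```

In practice the training contexts would come from hand-tagged occurrences of each polyseme. The window size and k would be tuned on held-out data, and the features could be enriched with word or part-of-speech information rather than raw characters.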
Keywords: annotations and commentaries, sentence alignment, automatic word segmentation, part-of-speech tagging, word sense disambiguation, language style, automatic identification of figures of speech