Entity Recognition and Part-of-Speech Tagging of Ancient Chinese Chronology

Posted on: 2013-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhu
Full Text: PDF
GTID: 2175330464461398
Subject: Bioinformatics
Abstract/Summary:
Natural language processing is one of the most important fields of artificial intelligence. It helps people extract information from huge linguistic datasets, and it can also interpret sentences and respond appropriately to human language. In the past ten years, researchers have made great progress in processing Chinese, developing effective methods for word segmentation, entity extraction, and even parsing. Classical Chinese, the traditional style of written Chinese, also needs to be handled by artificial intelligence: a large amount of historical information remains to be extracted automatically from Classical Chinese documents. This thesis uses part of the Eloquent appraisals appended to the Ming as an example Classical Chinese corpus for part-of-speech (POS) tagging experiments, and attempts named-entity recognition and POS tagging on this corpus with both statistical and rule-based methods. First, we use the conditional random field (CRF) model, one of the best-performing models in Chinese language processing, as the sequence labeling model; we design several tagging styles according to the characteristics of Classical Chinese and run the tagging experiments with different graph models. Most POS tags are labeled well in the experiments, and we find that POS tagging helps the recognition of person names. We also find that both the precision and the recall for unknown person names are much higher than for other unknown words, which suggests that strong regularities exist around person names in the corpus.
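The abstract does not spell out the tagging styles, so the following is only a minimal sketch of one common character-level scheme that a CRF sequence labeler could consume without prior word segmentation. The function name `to_char_labels`, the `B-`/`I-` prefixes, and the tag names `nr`, `v`, and `n` are assumptions made for illustration, not the thesis's actual label set.

```python
from typing import List, Tuple


def to_char_labels(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (word, POS) pairs into per-character (char, label) pairs.

    Each label joins a position prefix with the POS tag: "B-" marks the
    first character of a word, "I-" marks every later character.
    """
    labelled = []
    for word, pos in tagged_words:
        for i, ch in enumerate(word):
            prefix = "B" if i == 0 else "I"
            labelled.append((ch, f"{prefix}-{pos}"))
    return labelled


# Invented example, not taken from the thesis corpus.
# Assumed tags: "nr" = person name, "v" = verb, "n" = noun.
sample = [("王守仁", "nr"), ("為", "v"), ("尚書", "n")]
char_labels = to_char_labels(sample)
```

With labels of this shape, word boundaries and POS tags are predicted jointly for each character, which is one way a tagger can work directly on unsegmented Classical Chinese text.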
Next, we recognize person names in the same material using a rule-based method. By observation, we identify the relations between person names and government posts in the annals. Using rules extracted from sentences that contain both government posts and person names, we recognize person names without using the corpus and obtain good results. Finally, we analyze the shortcomings of this method. In brief, the POS tagging experiments indicate that processing Classical Chinese does not require word segmentation, and that the quantity and quality of the corpus and dictionary remain important. In annals, rules can be used to extract most person names. Much work remains for processing other styles of Classical Chinese.
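The rule-based idea of locating person names through nearby government posts can be illustrated with a small sketch. The thesis's actual rules are not given in the abstract; the lists `POSTS` and `CUES`, the assumed name length of two or three characters, and the sample sentence are all hypothetical choices made for this example.

```python
import re
from typing import List

# Hypothetical lists for illustration only; the thesis's real rules are not shown.
POSTS = ["尚書", "侍郎", "都御史"]   # government posts that may precede a name
CUES = ["為", "卒", "致仕"]          # words assumed to follow a name


def build_pattern(posts: List[str], cues: List[str]) -> "re.Pattern[str]":
    """Build a regex: a post, then a 2-3 character name, then a cue word."""
    post_alt = "|".join(re.escape(p) for p in sorted(posts, key=len, reverse=True))
    cue_alt = "|".join(re.escape(c) for c in sorted(cues, key=len, reverse=True))
    # The name group is lazy, so the shortest name that is followed by a cue wins.
    return re.compile(rf"(?:{post_alt})([\u4e00-\u9fff]{{2,3}}?)(?={cue_alt})")


def extract_person_names(text: str) -> List[str]:
    """Return candidate person names that sit between a post and a cue word."""
    return build_pattern(POSTS, CUES).findall(text)


# Invented sentence in annalistic style, not a quotation from any source.
text = "以禮部尚書李東陽為大學士。兵部侍郎王恕卒。"
names = extract_person_names(text)
```

A sketch like this needs no tagged corpus, but it only finds names that appear next to a listed post and cue, which is in line with the shortcomings the abstract says the rule-based method has.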
Keywords/Search Tags: Annalistic Style, Conditional Random Field, POS Tagging, Person Name Recognition