Entity Recognition And Part-Of-Speech Tagging Of Ancient Chinese Chronology

Posted on:2013-12-30

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhu

GTID:2175330464461398

Subject:Bioinformatics

Abstract/Summary:

Natural language processing is one of the most important fields of artificial intelligence. It helps people extract information from huge linguistic datasets, and it can also interpret sentences and respond appropriately to human language. In the last 10 years, researchers have made great progress in processing Chinese, developing strong methods for word segmentation, entity extraction, and even parsing. Classical Chinese, the traditional style of written Chinese, also needs to be handled by artificial intelligence: large amounts of historical information remain to be extracted automatically from Classical Chinese documents. This thesis uses part of the Eloquent appraisals appended to the Ming as an example Classical Chinese corpus for part-of-speech (POS) tagging experiments, and tries to recognize named entities and assign POS tags on this corpus with both statistical and rule-based methods. First, we use the conditional random field model, which has the highest performance in Chinese language processing, as the sequence labeling model, design several tagging styles according to the characteristics of Classical Chinese, and run the tagging experiments using different graph models. Most POS categories perform well in the experiments, and we find that the POS tagging system helps the recognition of person names.
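As a rough illustration of what a character-level tagging style could look like, the sketch below converts POS-tagged words into one label per character, so that a sequence labeler such as a conditional random field can work without a separate word segmentation step. The `B-`/`I-` prefixes, the tag names (`nr`, `v`), the feature template, and the sample input are all assumptions made for this example; the abstract does not specify the tagging styles or features the thesis actually used.

```python
def to_char_labels(tagged_words):
    """Turn [(word, pos), ...] into [(char, label), ...].

    Assumed label scheme: "B-<pos>" for the first character of a word,
    "I-<pos>" for every following character of the same word.
    """
    labels = []
    for word, pos in tagged_words:
        for i, ch in enumerate(word):
            prefix = "B" if i == 0 else "I"
            labels.append((ch, f"{prefix}-{pos}"))
    return labels


def char_features(chars, i):
    """Hypothetical feature template: the character and its two neighbours."""
    return {
        "cur": chars[i],
        "prev": chars[i - 1] if i > 0 else "<BOS>",
        "next": chars[i + 1] if i < len(chars) - 1 else "<EOS>",
    }


# Invented sample: a two-character person name followed by a verb.
sample = [("徐达", "nr"), ("卒", "v")]
char_labels = to_char_labels(sample)
```

Feature dictionaries of this shape, one per character, are the kind of input that common CRF toolkits accept; which toolkit and which features the thesis used is not stated in the abstract.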
We also find that both the precision and the recall for unknown person names are much higher than for other unknown words, which suggests that strong rules exist around person names in the corpus. We then tried to recognize person names in the same material using a rule-based method. Through observation, we found relations between person names and government posts in annals. Using rules extracted from sentences that contain both government posts and person names, we recognized person names without using the corpus and obtained good results. Finally, we analyzed the shortcomings of this method. In brief, the POS tagging experiments indicate that processing Classical Chinese does not require word segmentation, and that the quantity and quality of the corpus and dictionary remain important. In annals, rules can extract most of the person names. Much work remains for processing other styles of Classical Chinese.
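The idea of using government posts as anchors for person names can be sketched with a single pattern. The example below assumes one hypothetical rule, "以 NAME 为 POST" ("appoint NAME as POST"), a tiny invented list of posts, and names of two or three characters. The thesis's actual rules and post list are not given in the abstract, and the test sentence is made up for illustration rather than quoted from the corpus.

```python
import re

# Hypothetical gazetteer of government posts; a real list would be far longer.
POSTS = ["征虏大将军", "兵部尚书", "副将军"]

# Longer titles first so the alternation prefers the most specific post.
_post_alt = "|".join(re.escape(p) for p in sorted(POSTS, key=len, reverse=True))

# Assumed rule: 以 <name: 2-3 characters> 为 <post>.
# The lazy quantifier tries a two-character name before a three-character one.
APPOINT = re.compile(rf"以([\u4e00-\u9fff]{{2,3}}?)为({_post_alt})")


def extract_names(sentence):
    """Return (person_name, post) pairs matched by the appointment rule."""
    return [(m.group(1), m.group(2)) for m in APPOINT.finditer(sentence)]


pairs = extract_names("以徐达为征虏大将军")
```

A rule of this kind needs no annotated corpus, but it only finds names that occur next to a listed post, which is one plausible source of the shortcomings the abstract mentions.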

Keywords/Search Tags:

Annalistic Style, Conditional Random Field, POS Tagging, Person Name Recognition

Related items

1	Research On Automatic Word Segmentation Of Zuo Zhuan Based On Conditional Random Field
2	Research On Historiography Of Annalistic Style In Han Dynasty
3	Research And Implementation Of Teaching Chinese As Foreign Language System Based On Chatbot
4	The Influence Of Field Cognitive Style On Facial Expression Recognition
5	A Study Of The Acquisition Of English If-conditional Sentences By Chinese Learners
6	Application Research Of Bi-LSTM-CRF Model In Chinese Grammar Error Diagnosis
7	Emotional Classification Of Movie Criticism Based On Semantic Features
8	Research On The Named Entity Recognition And Base Noun Phrase Identification
9	Experimental Study On The Fusion Of Dictionary Segmentation And Model Word Segmentation In Chinese
10	Study On The Social Orientation Of Field-dependent And Field-independent Individual In Chinese Word Recognition Task