
A DCC-Based Study on the Automatic Extraction of Interpretative Information of Popular Words

Posted on: 2007-12-08 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X M Xie
GTID: 1115360185468407 | Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Popular words are vocabulary items that spread rapidly within a particular region or community during a particular period. Interpreting a popular word means annotating or explaining its meaning. This thesis mainly studies methods for extracting the interpretative information of popular words from large volumes of authentic text. We also attempt to rank the relevant interpretative information by its degree of importance.
Using the Dynamic Circulation Corpus (DCC), we selected the online editions of 15 major newspapers for the whole of 2004 and the whole of 2005, totaling 841,547,069 words (roughly 840 million words): 415,756,703 words from 2004 and 425,790,366 words from 2005. On this basis, we built a training corpus by manually annotating the interpretative information of popular words. We then carried out closed and open tests of the automatic extraction of interpretative information for the popular words of 2004 and of 2005, respectively.
The main features of this research are as follows.
Firstly, we investigated the interpretative information of popular words for the first time.
1. We defined the concept of the interpretative information of popular words, grouping under it information related to the interpretation of popular words that had previously been neither refined nor consolidated.
2. We clarified the classification of the interpretative information of popular words, dividing it by form into two kinds:
(1) Marked interpretative information. Markers are particular words or fixed structures used in ordinary word definitions, such as 'is', 'it is', 'call', 'name', 'including/include', 'so-called ... refers to ...', and 'made up of ...'. Interpretative information of a popular word that contains such markers is called marked interpretative information.
(2) Unmarked interpretative information. This kind of interpretative information contains none of the marker words above, but it does contain other special words, such as those denoting time, place, or events. These words can serve as feature items during extraction, although the results are not as good as those for marked interpretative information. Unmarked interpretative information can be further divided into biographical (personal profile) interpretative information and event interpretative information.
Secondly, we implemented the automatic extraction of the interpretative information of popular words.
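The abstract names the ingredients of the extraction step (marker words, time/place/event feature items for unmarked sentences, sentence similarity, and ranking by importance) but not how they are implemented. The Python sketch below shows one way they could fit together, under loudly stated assumptions: the English regex templates are hypothetical glosses of the marker types listed above (the thesis itself works on Chinese newspaper text), `FEATURE_CUES` is an invented cue list, near-duplicate sentences are removed with character-bigram cosine similarity, and "importance" is approximated by a sentence's similarity to the other candidates. None of these names or choices come from the thesis.

```python
import math
import re
from collections import Counter
from typing import List

# HYPOTHETICAL English stand-ins for the marker types named in the abstract
# ("is", "it is", "call", "name", "include", "so-called ... refers to ...",
# "made up of ..."). The thesis's real (Chinese) marker set is not reproduced.
MARKER_TEMPLATES = [
    r"\bso-called {w}\b.*\brefers to\b",
    r"\b{w},? (?:it is|is|means)\b",
    r"\b(?:called|named) {w}\b",
    r"\b{w} (?:includes|including)\b",
    r"\b{w} (?:consists of|is made up of)\b",
]

# HYPOTHETICAL time/place/event cue words used as feature items
# for unmarked interpretative information.
FEATURE_CUES = {"yesterday", "today", "year", "city", "province", "held", "born", "happened"}


def has_marker(sentence: str, word: str) -> bool:
    """True if any marker template matches around `word` in the sentence."""
    w = re.escape(word)
    return any(re.search(t.format(w=w), sentence, re.IGNORECASE) for t in MARKER_TEMPLATES)


def feature_score(sentence: str) -> int:
    """Count cue words from FEATURE_CUES (used only when no marker matches)."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(tok in FEATURE_CUES for tok in tokens)


def bigrams(text: str) -> Counter:
    """Character bigrams; they need no word segmentation, which suits Chinese."""
    text = re.sub(r"\s+", "", text.lower())
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract(word: str, sentences: List[str], dup_threshold: float = 0.8) -> List[str]:
    """Collect candidate sentences for `word`, drop near-duplicates, and rank them."""
    candidates = [s for s in sentences if word.lower() in s.lower()]
    marked = [s for s in candidates if has_marker(s, word)]
    # Fall back to feature-item (unmarked) candidates only if nothing is marked.
    pool = marked or [s for s in candidates if feature_score(s) > 0]

    kept: List[str] = []
    vecs: List[Counter] = []
    for s in pool:
        v = bigrams(s)
        # Keep a sentence only if it is not too similar to one already kept.
        if all(cosine(v, u) < dup_threshold for u in vecs):
            kept.append(s)
            vecs.append(v)

    # ASSUMED importance measure: total similarity to the other kept sentences
    # (a centrality heuristic, not necessarily the thesis's own weighting).
    def centrality(i: int) -> float:
        return sum(cosine(vecs[i], vecs[j]) for j in range(len(vecs)) if j != i)

    order = sorted(range(len(kept)), key=centrality, reverse=True)
    return [kept[i] for i in order]


if __name__ == "__main__":
    corpus = [
        "A so-called netizen refers to a person who is active online.",
        "A netizen is a regular user of the internet.",
        "A netizen is a regular user of the internet.",  # verbatim repeat
        "The netizen posted again yesterday.",            # unmarked
    ]
    print(extract("netizen", corpus))
```

In the usage example, the repeated sentence is dropped as a near-duplicate and the unmarked sentence is ignored because marked candidates exist. Character bigrams were chosen here only because they avoid word segmentation; a segmentation-based representation trained on the annotated corpus would be an equally plausible design.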
Keywords/Search Tags: NLP, DCC, interpretative information of popular words, automatic extraction, sentence similarity