Research About Text Classification Model Of Dual Characteristic Based On Concept And Etyma

Posted on: 2016-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: T J Wu
GTID: 2308330479984860
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, the volume of online information is growing at an alarming rate. In this era of "information explosion", the need to find relevant information quickly and accurately has made text classification an important research topic. Text classification is also a foundation of information retrieval and natural language processing, so it attracts a growing number of researchers and has broad application prospects.

Semi-supervised learning has been studied in depth for text classification, and co-training is a typical and widely used semi-supervised method. However, when common co-training methods are applied to text classification, the two views are constructed only from morphological (root) features, which ignores the importance of semantic (concept) features for classification. Building on co-training and incorporating the influence of semantics, this thesis proposes a cooperative dual-feature text classification algorithm based on concepts and roots (etyma), which can improve the effectiveness of the classification model.

The thesis briefly introduces the research background and related techniques of text classification, and describes in detail the co-training framework and the WordNet ontology, the two foundations of the proposed algorithm. Unlike other co-training classification algorithms, which use only the roots extracted from the text content, the proposed algorithm builds its two views from two aspects, concepts and roots, and therefore takes the influence of semantics on classification into account. Concept-based methods differ from root-based methods, however: the connections between roots can be ignored, but the semantic relations between concepts in the WordNet ontology cannot. We therefore introduce semantic similarity computation into concept-based text classification and apply it to the formula used for concept classification. Finally, a description of the text classification algorithm based on concepts and roots is given.

At the end of the thesis, two groups of experiments are presented to examine how different values of the parameter R (the distance between nodes in the ontology) affect the classification model and its performance. The experimental results show that the new cooperative dual-feature algorithm based on concepts and roots achieves higher accuracy and recall, that the choice of R affects classification performance, and that the larger R is, the worse the performance becomes.
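The two-view co-training idea summarized above can be illustrated with a minimal sketch. This is not the thesis's algorithm: the Naive Bayes base classifier, the crude `root_view` suffix stripping, the toy `CONCEPTS` map (a hypothetical stand-in for a WordNet lookup), and the rule of adding one pseudo-labeled document per view per round are all assumptions made for illustration. The sketch also omits the semantic similarity weighting and the R parameter.

```python
import math
from collections import Counter

# Hypothetical toy root-to-concept map standing in for a WordNet lookup.
CONCEPTS = {"goal": "sport", "match": "sport", "team": "sport", "striker": "sport",
            "stock": "finance", "bank": "finance", "market": "finance", "share": "finance"}

def root_view(text):
    # Crude stand-in for root extraction: strip a trailing "s" from longer words.
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in text.lower().split()]

def concept_view(text):
    # Map each root to its concept; roots without a known concept are dropped.
    return [CONCEPTS[r] for r in root_view(text) if r in CONCEPTS]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over token lists."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab_size = len({t for c in self.classes for t in self.counts[c]}) + 1
        return self

    def predict_proba(self, tokens):
        n = sum(self.prior.values())
        logp = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            lp = math.log(self.prior[c] / n)
            for t in tokens:
                lp += math.log((self.counts[c][t] + 1) / (total + self.vocab_size))
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {c: math.exp(logp[c] - m) / z for c in self.classes}

def fit_view(view, labeled):
    return NaiveBayes().fit([view(t) for t, _ in labeled], [y for _, y in labeled])

def co_train(labeled, unlabeled, views, rounds=5):
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in views:
            if not pool:
                break
            clf = fit_view(view, labeled)
            # Pick the pool document this view labels with the highest confidence.
            best = (None, None, -1.0)
            for text in pool:
                probs = clf.predict_proba(view(text))
                label = max(probs, key=probs.get)
                if probs[label] > best[2]:
                    best = (text, label, probs[label])
            pool.remove(best[0])
            # The pseudo-labeled document becomes training data for both views.
            labeled.append((best[0], best[1]))
    return [fit_view(view, labeled) for view in views]

def classify(models, views, text):
    # Combine the two views by summing their log posteriors.
    scores = {}
    for clf, view in zip(models, views):
        for c, p in clf.predict_proba(view(text)).items():
            scores[c] = scores.get(c, 0.0) + math.log(p)
    return max(scores, key=scores.get)

VIEWS = [root_view, concept_view]
LABELED = [("the team scored a goal", "sport"), ("the bank raised rates", "finance")]
UNLABELED = ["the striker scored twice", "market shares fell", "stocks fell at the bank"]

models = co_train(LABELED, UNLABELED, VIEWS)
print(classify(models, VIEWS, "the match ended"))
```

In this toy setup the word "match" never appears in the training text, so the root view alone has little to go on; the concept view can still contribute because "match" maps to a concept seen in training, which is the kind of semantic signal the abstract argues a root-only view misses.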
Keywords/Search Tags: Text Classification, Concept Characteristic, Dual Characteristic of Cooperation, Co-Training Frame, WordNet Ontology