Study Of Text Feature Selection And Weighting Based On Ontology

Posted on: 2011-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: G H Chen
Full Text: PDF
GTID: 2189360308454213
Subject: Management Science and Engineering
Abstract/Summary:
Traditional text feature selection and weighting methods are mostly based on statistical theory or machine learning, and they cope poorly with sparse data. As a result, text classification precision often fails to meet users' information needs. At the same time, many studies indicate that semantic associations often exist among the feature items that traditional selection methods extract to make up the feature vector. An ontology is a formal and explicit description of concepts and their relations; it provides a clear concept hierarchy and supports logical reasoning. By introducing an ontology, text feature selection can be raised from the word level to the concept level, capturing the semantic information between concepts.
First, the text is preprocessed by removing stop words and stemming, and the vector space model is used to obtain the initial feature vector. Then the ontology is introduced and feature items are mapped to concepts. The ontology-based text feature selection method consists of three parts: building a concept tree, mapping features to concepts, and calculating the initial weights. Because Protégé has good visualization support, a text concept tree describing the relationships among concepts can be generated automatically during ontology construction. The concept tree shows a clear hierarchical relationship between terms. Depending on the relationships between terms, the mapping may be one-to-one, many-to-one, or many-to-many; for all three cases, this paper adopts the maximum matching method to map terms to concepts. TF-IDF is the most widely used feature weighting method, so it is used to calculate the initial feature weights. However, TF-IDF ignores the semantic relationships between terms, so it is improved according to how features map to concepts.
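The effect of moving from the word level to the concept level can be illustrated with a minimal Python sketch. It is not the method of the thesis: the term-to-concept table `TERM_TO_CONCEPT`, the toy documents, and the helper names `to_concepts` and `tf_idf` are all assumptions made for illustration, the mapping is a plain dictionary lookup rather than maximum matching against an ontology, and the weights use the textbook tf * log(N / df) formula rather than the improved weighting, whose exact form the abstract does not give.

```python
import math
from collections import Counter

# Hypothetical term-to-concept table; in the thesis this role is played
# by an ontology built with OWL and Protégé.
TERM_TO_CONCEPT = {
    "lecture": "Teaching",
    "teaching": "Teaching",
    "exam": "Assessment",
    "quiz": "Assessment",
}


def to_concepts(tokens):
    """Replace each token by its concept; unmapped tokens are kept as-is."""
    return [TERM_TO_CONCEPT.get(token, token) for token in tokens]


def tf_idf(docs):
    """Return one {feature: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each feature once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            feature: (count / total) * math.log(n_docs / doc_freq[feature])
            for feature, count in counts.items()
        })
    return weights


# Toy, already preprocessed documents (stop words removed, tokens stemmed).
docs = [
    ["lecture", "exam", "quiz"],
    ["teaching", "network"],
    ["network", "router"],
]
concept_docs = [to_concepts(doc) for doc in docs]

term_weights = tf_idf(docs)
concept_weights = tf_idf(concept_docs)

term_vocab = {feature for doc in docs for feature in doc}
concept_vocab = {feature for doc in concept_docs for feature in doc}
```

In this toy corpus the many-to-one mappings merge "exam" and "quiz" into one feature and "lecture" and "teaching" into another, so `concept_vocab` is smaller than `term_vocab`, and the merged concept accumulates the frequency of the terms mapped to it.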
Finally, OWL and Protégé are used in this paper to build a small ontology model for the field of educational technology, and the model is then used in the experiment. Experimental results show that the proposed method can effectively improve text classification accuracy and reduce the dimensionality of the feature vectors.
Keywords/Search Tags: Text Feature Selection, Ontology, Mapping, Concept, Feature Weighting