Research On Text Classification Of Web Data Mining

Posted on:2008-07-01

Degree:Master

Type:Thesis

Country:China

Candidate:S P Zheng

Full Text:PDF

GTID:2178360272477181

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of web technology, the volume of information on the internet is growing explosively. Efficiently finding useful patterns and interesting information in data sets of this size has become an urgent problem, and web data mining emerged against this background. This paper analyzes text classification within web data mining, with an emphasis on Chinese text classification, covering feature weighting, feature selection based on community identification, and similarity measurement based on the graph space model. The main contributions are as follows. First, by analyzing the basic theory of the Gini index and of feature weighting, we present a novel weighting formula based on the Gini index. The results show that the new formula improves text classification performance. Second, by studying the concept of community structure in complex networks, we propose a new feature selection algorithm based on community identification, which overcomes a weakness of traditional feature selection methods, namely that they ignore semantic context. The experimental results show that the proposed algorithm is efficient. Third, by studying the graph space model, we propose an improved text similarity measure based on the analysis of structural equivalence, which overcomes the inability of the vector space model to express the structural information of a text efficiently. The results show that the new measure is efficient and feasible for text classification.
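
As a rough illustration of the first contribution, the sketch below shows one way a Gini-index score can be used to weight terms. The abstract does not state the thesis's actual formula, so everything here is an assumption: the purity form sum over classes of P(c | t)^2 is a commonly used Gini-style measure, the function names `gini_purity` and `tf_gini_weights` are hypothetical, and the toy documents are invented for the example.

```python
from collections import Counter, defaultdict


def gini_purity(docs, labels):
    """Return {term: sum over classes c of P(c | term)^2}.

    P(c | term) is estimated from document frequencies. A value of 1.0
    means the term occurs in a single class only; lower values mean the
    term is spread across classes. (Assumed form, not the thesis's formula.)
    """
    term_class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_class_df[term][label] += 1

    purity = {}
    for term, counts in term_class_df.items():
        total = sum(counts.values())
        purity[term] = sum((n / total) ** 2 for n in counts.values())
    return purity


def tf_gini_weights(tokens, purity):
    """Weight each term of one document as term frequency times Gini purity.

    Terms never seen in training get weight 0.0 (an illustrative choice).
    """
    tf = Counter(tokens)
    return {term: tf[term] * purity.get(term, 0.0) for term in tf}


# Toy corpus, invented for this example.
docs = [
    ["stock", "market", "rises"],
    ["market", "falls"],
    ["team", "wins", "match"],
    ["market", "team", "news"],
]
labels = ["finance", "finance", "sports", "sports"]

purity = gini_purity(docs, labels)
weights = tf_gini_weights(["market", "market", "team"], purity)
```

In this toy corpus "stock" appears only in finance documents and gets purity 1.0, while "market" appears in both classes and is down-weighted. The estimate uses raw document counts, so it does not correct for unequal class sizes; a real system would likely need to.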

Keywords/Search Tags:

Web Data Mining, Text Classification, Gini Index, Complex Network, Identify Community, Vector Space Model, Graph Space Model

Related items

1	Data Mining Technology Research Based On Vector Space Model
2	Study Of Text Classification Model Based On Key Vector
3	Research On Data Mining Technologies Applied To Web Chinese Text
4	The Research And Implement Of Automatic Text Classification System Which Is Based On Vector Space Model
5	Research And Improvement Of Automatic Text Classification Algorithm Based On The Vector Space Model
6	Research Of Text Categorization Base On Vector Space Model And Association Rules
7	Research And Implementation Of Text Classification System Based On VSM
8	Application Of Text Categorization Algorithm In Practical Modeling
9	Research Of Text Categorization Based On Vector Space Model
10	The Study And Application Of Web Data Mining Using In Information Monitor System