Research on Text Preprocessing Based on Web Mining and Its Application

Posted on: 2007-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: P R Zhong
Full Text: PDF
GTID: 2178360212959313
Subject: Software engineering
Abstract/Summary:
With the development of Internet technology and its widespread application, we face a large quantity of information but a shortage of knowledge. Obtaining useful information from large amounts of text is an important task in information processing, and Web text mining is one of the important applications of data mining technology to information analysis and processing. However, because of the Internet's openness and heterogeneity, it is very difficult for users to find the information they need quickly and accurately, so retrieving accurate information quickly has become a significant research problem. Text preprocessing is the bottleneck of text classification. The study of text mining and its application to text classification in this thesis is therefore important in both theory and practice.

This thesis first describes the relevant theory of Web mining, then focuses on preprocessing techniques for Web text mining. Finally, it presents improved methods for text preprocessing:

(1) Feature weight computation: after analyzing and comparing commonly used feature weighting algorithms, the thesis improves on the traditional TF×IDF algorithm by replacing the IDF factor with a new Gini index evaluation function. The experimental results show that the improved method has advantages over the other algorithms in classification precision and mining efficiency.

(2) Classification: new weight functions are introduced into the Bayes classifier and the FKNN classifier to adjust the weights. Experiments with SVM, NB, KNN and FKNN classifiers show that the method improves their results by taking classification features into account.

(3) Design of the algorithms for the email mining module of an email mining system...
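
To make item (1) concrete, here is a minimal Python sketch of weighting a term by TF multiplied by a Gini-based score instead of IDF. The abstract does not give the thesis's actual Gini evaluation function, so the form used here, the class purity sum over c of P(c|t)^2, is an assumption chosen for illustration, and the function names `gini_purity` and `tf_gini_weights` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def gini_purity(docs, labels):
    """Return {term: sum over classes of P(class | term)^2}.

    ASSUMPTION: this purity form of the Gini index is illustrative only;
    the thesis's own evaluation function is not stated in the abstract.
    P(class | term) is estimated from document frequencies.
    """
    # term -> class -> number of documents of that class containing the term
    df_by_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df_by_class[term][label] += 1

    gini = {}
    for term, counts in df_by_class.items():
        total = sum(counts.values())
        gini[term] = sum((n / total) ** 2 for n in counts.values())
    return gini


def tf_gini_weights(tokens, gini):
    """Weight each term in one document by (relative TF) x (Gini score)."""
    counts = Counter(tokens)
    length = len(tokens)
    # Terms unseen during training get a score of 0.0.
    return {t: (n / length) * gini.get(t, 0.0) for t, n in counts.items()}


# Hypothetical toy corpus: tokenized documents with class labels.
docs = [
    ["spam", "offer", "free"],
    ["free", "meeting", "agenda"],
    ["offer", "spam", "spam"],
]
labels = ["spam", "ham", "spam"]

gini = gini_purity(docs, labels)
weights = tf_gini_weights(docs[2], gini)
```

Under this assumed form, a term concentrated in a single class scores 1.0, while a term spread evenly over k classes scores 1/k, so class-discriminating terms receive larger weights. IDF, by contrast, ignores class labels entirely.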
Keywords/Search Tags: Web mining, text mining, VSM model, feature selection, text classification