| With the flood of information on the Web, Web mining has become a research issue that draws great interest from many communities. Many Web mining algorithms exist, and simple Bayes is one of the better ones, but it needs many training documents to build a classifier. Improving accuracy while reducing the number of training documents is therefore very important for a simple Bayes-based classifier. Text mining includes word processing and text classification. English text is used as the classification sample to reduce the complexity of word stemming, since Chinese text requires much more word processing. An improved Document Frequency method is used as the criterion for selecting the feature vector. The classifier combines several methods, including probabilistic and iterative ones. Before classification, some words are designated as latent labels. In each classification pass, the document is then assigned the label with the maximum posterior probability. The final label is composed of all the labels obtained from the iterations. The result is a simple Bayes-based classifier for Web text that uses a simple Bayesian model and latent word analysis. It reduces the complexity of the Bayesian network, improves the accuracy of the classifier, and reduces the number of training documents required. The experiments indicate that it is a good text classifier. It should be tested on Chinese text in the next stage of study. |
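
The two building blocks named in the abstract, Document Frequency feature selection and labeling by maximum posterior probability, can be illustrated with a minimal sketch. It is not the classifier described here: it assumes a plain Document Frequency threshold rather than the improved variant, a multinomial simple Bayes model with Laplace smoothing, and it omits the latent-label iterations. The function names, the `min_df` parameter, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_by_document_frequency(docs, min_df=2):
    """Keep words that occur in at least min_df documents (plain DF, an assumption)."""
    df = Counter()
    for words in docs:
        df.update(set(words))  # count each word once per document
    return {w for w, n in df.items() if n >= min_df}


def train(docs, labels, vocab):
    """Estimate log priors and Laplace-smoothed log likelihoods for each class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for words, label in zip(docs, labels):
        word_counts[label].update(w for w in words if w in vocab)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
    return log_prior, log_like


def classify(words, log_prior, log_like):
    """Return the class with the maximum (log) posterior; unknown words are ignored."""
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in words if w in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy training set (hypothetical data, already tokenized).
docs = [
    "stock market price rise".split(),
    "market price fall stock".split(),
    "team win match goal".split(),
    "match goal team lose".split(),
]
labels = ["finance", "finance", "sport", "sport"]

vocab = select_by_document_frequency(docs, min_df=2)
log_prior, log_like = train(docs, labels, vocab)
print(classify("stock price today".split(), log_prior, log_like))  # prints "finance"
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the add-one smoothing keeps a word that was never seen in a class from forcing that class's posterior to zero.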