The development of data mining plays an important role in the theory of computer algorithms. Especially since the turn of the century, data mining has been widely applied to databases and data warehouses, and the great success of search engines has made it an important branch of computer science research. The development of decision tree classification reflects this trend. CLS was the first decision tree classification algorithm. It was followed by ID3, and later by C4.5 (an improved version of ID3), CART, SLIQ, SPRINT and others. The emergence and continued refinement of these algorithms have enriched the decision tree method.

Text classification is a very important task in Web data mining. It consists of four main steps: text representation, feature extraction, classifier construction and rule extraction. Feature extraction and classifier construction are the most computationally expensive steps, so the choice of feature extraction method and of classifier construction method has a significant impact on the entire classification process.

In this paper, several classical decision tree classification algorithms are first studied and analyzed, and the differences among them are identified through comparison. Second, the C4.5 algorithm is improved: the Maclaurin formula is used to simplify the formula for the information gain ratio, yielding a new formula for the improved algorithm. This formula greatly reduces the complexity of the original formula without introducing bias. Third, C4.5 assumes that attributes are independent of one another, with no association between them. Because this assumption may not hold in practice, the concepts of attribute correlation and user interest degree are introduced, and their impact on the algorithm is analyzed.
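The abstract does not reproduce the simplified gain-ratio formula. The sketch below assumes one common way of applying the Maclaurin formula to C4.5: the first-order expansion ln(1 − x) ≈ −x, applied with x = 1 − p_i, turns each entropy term −p_i·log2(p_i) into p_i(1 − p_i)/ln 2, so no logarithms need to be evaluated. For a two-class node with p positive and n negative examples this reduces to 2pn / ((p + n)²·ln 2). The function names exact_entropy, approx_entropy and gain_ratio are hypothetical; the code compares the standard C4.5 gain ratio with the approximated one and is not the thesis's own implementation.

```python
import math
from collections import Counter


def exact_entropy(labels):
    """Standard Shannon entropy in bits: -sum p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def approx_entropy(labels):
    """Entropy under the assumed first-order Maclaurin approximation.

    ln(p) = ln(1 - (1 - p)) is approximated by -(1 - p), so
    -sum p_i * log2(p_i) becomes sum p_i * (1 - p_i) / ln 2.
    """
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) / math.log(2)


def gain_ratio(values, labels, entropy):
    """C4.5 gain ratio of a discrete attribute.

    `entropy` is the function used both for the class entropy and for
    the split information, so the exact and approximated versions can
    be compared on the same data.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information is the entropy of the attribute's value distribution.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

To check whether the approximation changes which attribute is chosen, one can evaluate gain_ratio(column, labels, exact_entropy) and gain_ratio(column, labels, approx_entropy) for every candidate attribute at a node and compare the rankings.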
One advantage of C4.5 is that it can handle continuous attributes. Building on the original approach, this paper proposes an improved method for handling continuous attributes that greatly reduces memory usage and computation time and thus improves computational efficiency. Applying the improved C4.5 decision tree algorithm to Web text classification further broadens the application of decision tree classification. The paper also analyzes a shortcoming of the χ² statistic for feature extraction: it does not reflect whether a feature's contribution to class discrimination is positive or negative. The χ² method is improved so that the contribution of each feature word to class discrimination is reflected more clearly. The improved decision tree classifier is then applied to contract classification, and rule extraction is finally realized. The algorithm was also applied, in a simple form, to information collection in the OA system of a county development zone, and experimental data show that it reduced the workload of editing information.
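The sign problem described above can be illustrated with the usual 2×2 form of the χ² statistic for a term t and class c, where a counts documents of class c containing t, b documents of other classes containing t, c_ documents of class c without t, and d documents of other classes without t. Because (a·d − b·c_) is squared, a term that is strongly associated with c and a term that is strongly associated with the other classes can receive the same score. The abstract does not give the thesis's improved formula; this sketch assumes a simple signed variant that multiplies χ² by the sign of (a·d − b·c_). The names contingency, chi_square and signed_chi_square are hypothetical.

```python
def contingency(docs, labels, term, cls):
    """Count the 2x2 table (a, b, c_, d) for `term` and class `cls`.

    docs: iterable of token sets; labels: class label of each document.
    """
    a = b = c_ = d = 0
    for tokens, y in zip(docs, labels):
        has_term = term in tokens
        in_class = y == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c_ += 1
        else:
            d += 1
    return a, b, c_, d


def chi_square(a, b, c_, d):
    """Standard chi-square statistic; the sign of (a*d - b*c_) is lost."""
    n = a + b + c_ + d
    denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c_) ** 2 / denom


def signed_chi_square(a, b, c_, d):
    """Assumed signed variant: positive for terms that indicate the class,
    negative for terms that indicate its absence."""
    score = chi_square(a, b, c_, d)
    return score if a * d - b * c_ >= 0 else -score


docs = [{"loan", "rate"}, {"loan"}, {"game"}, {"game", "score"}]
labels = ["finance", "finance", "sports", "sports"]
for term in ("loan", "game"):
    table = contingency(docs, labels, term, "finance")
    print(term, chi_square(*table), signed_chi_square(*table))
```

On this toy corpus both "loan" and "game" get a plain χ² score of 4.0 for the class "finance", while the signed score is 4.0 for "loan" and −4.0 for "game", which separates indicative terms from counter-indicative ones.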