| In recent years, tree mining and pattern classification have become active areas of research in data mining. Because much of this data arrives as a continuous stream (for example, sensor network readings, web logs, and the structures of enzyme molecules in biology), changes in the data distribution over time must also be taken into account. Finding discriminative patterns is an important part of tree mining. It is also necessary to design fast, real-time classification algorithms whose models adapt to dynamic changes in the data distribution. At present, classification methods for structured data are based on frequent substructure mining: frequent substructures labeled with a class are sorted or pruned to obtain structural association rules, which are then used for prediction. The mature classification methods for tree streams mainly include XRules, a classification algorithm built on a cost-sensitive classification model, which first finds a large number of patterns that meet a user-defined minimum support and then selects high-quality rules using a user-defined confidence threshold, and AdaTreeMiner, which applies a boosting classification method after mining closed frequent trees. In this paper, we introduce the theoretical background of tree mining and analyze the advantages of closed and maximal frequent patterns. We then detail the ideas and principles of class-correlated pattern mining. Compared with XRules, AdaTreeMiner improves the time efficiency of classification and takes concept drift into account, but its prediction accuracy is lower. We propose TSC, a tree stream classification algorithm based on class-correlated patterns. TSC introduces the SP-tree data structure for use in the tree pattern discovery process.
During this process, TSC uses a branch-and-bound technique to improve search efficiency without mining the complete set of frequent patterns. It also updates the threshold as it goes, which avoids a post-pruning step and allows the tree patterns to be used directly for classification. Second, TSC optimizes the chi-square statistical measure to improve time and space efficiency when generating the k-best tree patterns. Because the k-best tree patterns can be used directly for tree stream classification, TSC is simple and efficient. In addition, we apply the one-versus-all classification method to tree stream classification to address the efficiency of multi-class classification. Experiments on synthetic and real data sets show that the proposed tree stream classification algorithm based on k-best tree patterns offers some advantages over previous algorithms in classification accuracy and efficiency. |
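
To make the idea of ranking patterns by a chi-square score while raising a threshold more concrete, the following is a minimal sketch and not the TSC implementation. It assumes a two-class setting and that each candidate pattern already comes with the number of positive and negative trees that contain it. The function names `chi_square` and `k_best_patterns`, the string encoding of the patterns, and the input layout are all hypothetical. The SP-tree and the branch-and-bound pruning described in the abstract are omitted.

```python
import heapq


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: positive trees containing the pattern
    b: negative trees containing the pattern
    c: positive trees not containing the pattern
    d: negative trees not containing the pattern
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no measurable correlation.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def k_best_patterns(candidates, n_pos, n_neg, k):
    """Keep the k highest-scoring patterns (hypothetical helper).

    candidates: iterable of (pattern, (pos_count, neg_count)) pairs.
    The threshold is the score of the current k-th best pattern, so it
    only rises as better patterns are found; weaker candidates are
    discarded immediately and no separate post-pruning pass is needed.
    """
    heap = []  # min-heap of (score, pattern); heap[0] is the weakest kept
    threshold = 0.0
    for pattern, (pos, neg) in candidates:
        score = chi_square(pos, neg, n_pos - pos, n_neg - neg)
        if len(heap) < k:
            heapq.heappush(heap, (score, pattern))
        elif score > threshold:
            heapq.heapreplace(heap, (score, pattern))
        if len(heap) == k:
            threshold = heap[0][0]
    return sorted(heap, reverse=True)


# Illustrative counts only (made up for this sketch): 50 positive and 50 negative trees.
example_candidates = [
    ("A(B)", (40, 5)),
    ("A(C)", (20, 20)),
    ("B(C)", (10, 30)),
    ("A(B,C)", (25, 25)),
]
best = k_best_patterns(example_candidates, n_pos=50, n_neg=50, k=2)
```

In a full miner, the same rising threshold could also be compared against an upper bound on the score of a pattern's extensions, so that whole branches of the search space are skipped; that bound is not shown here.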