
Research & Application Of Algorithm Of KDD Based On GA

Posted on: 2006-05-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Zhao
GTID: 2156360152493543 | Subject: Management Science and Engineering
Abstract/Summary:
As data play an increasingly important role in everyday decision-making, there is a growing need for better data-processing methods: methods that analyze data at a deeper level, capture the global characteristics of a dataset, and predict its development trends. Because accumulated data are growing explosively, existing KDD algorithms show clear limitations and shortcomings, so the field needs existing algorithms to be improved or new ones to be devised.

Genetic algorithms (GAs), which simulate natural evolutionary processes, perform a global search of the problem space and offer many advantages, such as simplicity, universality, robustness, and inherent parallelism. GAs provide a novel computing model for complex problems that other techniques cannot solve, or can solve only with difficulty. As the volume of data in databases skyrockets, the time needed to scan an entire database keeps growing and efficiency drops accordingly, so new algorithms for mining association rules are needed. In this thesis, an algorithm called ARMGA is constructed and applied to a database of mine accidents; simple computation and analysis show that it produces good results.

Decision tree algorithms are widely recognized as one of the most efficient ways to find valuable patterns in large datasets for data mining applications. However, scalability and accuracy have become major obstacles for decision tree algorithms in large-scale data mining. To construct high-quality decision trees and mine useful rules with a limited, reasonable amount of computing capacity, this dissertation proposes a new approach, called DT_GA, that integrates statistical sampling, genetic algorithms, and decision tree algorithms to generate high-quality decision trees.
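The abstract does not give ARMGA's chromosome encoding, fitness function, or operators, so the following is only a minimal sketch of how a GA can search for association rules, under assumed choices: each gene marks an item as unused, in the antecedent X, or in the consequent Y; fitness is the confidence of X -> Y when its support reaches a minimum threshold (0 otherwise); selection is a binary tournament with one-point crossover, per-gene mutation, and elitism. The toy transactions, item names, and every function and parameter (`armga_sketch`, `min_support`, `random_gene`, and so on) are hypothetical and are not the thesis's data or code.

```python
import random

# Hypothetical toy transactions (illustrative only, NOT the thesis's mine-accident data).
TRANSACTIONS = [
    {"gas", "ventilation_fault", "explosion"},
    {"gas", "ventilation_fault", "explosion"},
    {"gas", "spark", "explosion"},
    {"roof_fall", "night_shift"},
    {"gas", "ventilation_fault"},
    {"roof_fall", "night_shift", "injury"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))

def random_gene(rng):
    # Gene values: 0 = item unused, 1 = in antecedent X, 2 = in consequent Y.
    # Biased toward 0 so random rules stay short (an assumed design choice).
    return rng.choice((0, 0, 0, 1, 2))

def decode(chrom):
    x = frozenset(item for item, g in zip(ITEMS, chrom) if g == 1)
    y = frozenset(item for item, g in zip(ITEMS, chrom) if g == 2)
    return x, y

def support(itemset):
    # Fraction of transactions containing every item of `itemset` (one full scan).
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)

def fitness(chrom, min_support=0.2):
    # Assumed fitness: confidence of X -> Y if support(X u Y) >= min_support, else 0.
    x, y = decode(chrom)
    if not x or not y:
        return 0.0  # a rule needs a non-empty antecedent and consequent
    s_xy = support(x | y)
    if s_xy < min_support:
        return 0.0
    return s_xy / support(x)  # support(x) >= s_xy > 0, so no division by zero

def tournament(pop, rng):
    # Binary tournament selection: the fitter of two random individuals wins.
    return max(rng.sample(pop, 2), key=fitness)

def crossover(a, b, rng):
    # One-point crossover.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chrom, rng, rate=0.1):
    # Reset each gene to a fresh random value with probability `rate`.
    return [random_gene(rng) if rng.random() < rate else g for g in chrom]

def armga_sketch(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[random_gene(rng) for _ in ITEMS] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        pop = [mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
               for _ in range(pop_size)]
        best = max(pop + [best], key=fitness)  # elitism: keep the best rule seen so far
    x, y = decode(best)
    return sorted(x), sorted(y), fitness(best)

if __name__ == "__main__":
    antecedent, consequent, conf = armga_sketch()
    print(antecedent, "->", consequent, f"confidence={conf:.2f}")
```

Each call to `support` scans every transaction, which is exactly the cost the abstract warns about for large databases; a practical version would cache support counts or evaluate fitness on a sample.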
DT_GA can be used to improve the quality of traditional decision tree algorithms. New crossover and mutation operations are defined, and the classification accuracy of each decision tree on a test dataset is used as its fitness. We conduct computational experiments on datasets of mine accidents to test the performance of DT_GA with respect to classification accuracy and average computing speed. The results show that DT_GA achieves the same level of classification accuracy as a standard decision tree algorithm at lower sampling levels, and that it produces uniformly accurate decision rules regardless of the quality of the starting trees. The approach is therefore likely to scale well and to be effective for large-scale data mining. To meet more practical demands, the thesis also suggests improvements to DT_GA and gives a rough description of a reconstructed algorithm, CAMM.
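The abstract states only that DT_GA combines sampling, a decision tree algorithm, and a GA, that it defines new crossover and mutation operators, and that fitness is classification accuracy on a test dataset. The sketch below illustrates that general scheme under hypothetical choices of its own: initial trees are grown greedily (depth 2, fewest misclassifications) on small random samples of the training set; crossover grafts a random subtree of one parent onto a random position in another; mutation flips a leaf label or perturbs a split threshold; selection is a binary tournament with elitism. The synthetic data (label 1 when x0 + x1 > 1) and all names (`dt_ga`, `build_tree`, `sample_size`, ...) are illustrative, not the thesis's operators or mine-accident data.

```python
import random

def make_data(n, rng):
    # Hypothetical synthetic data: two features in [0, 1), label 1 when x0 + x1 > 1.
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    return [(x, int(x[0] + x[1] > 1.0)) for x in pts]

def majority(rows):
    # Majority label of a non-empty list of (features, label) rows; ties go to 1.
    return int(2 * sum(y for _, y in rows) >= len(rows))

def build_tree(rows, depth):
    # Greedy depth-limited tree: pick the split with the fewest misclassifications.
    if depth == 0 or len({y for _, y in rows}) == 1:
        return ("leaf", majority(rows))
    best = None
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if not right:
                continue
            ml, mr = majority(left), majority(right)
            err = sum(y != ml for _, y in left) + sum(y != mr for _, y in right)
            if best is None or err < best[0]:
                best = (err, f, t, left, right)
    if best is None:
        return ("leaf", majority(rows))
    _, f, t, left, right = best
    return ("node", f, t, build_tree(left, depth - 1), build_tree(right, depth - 1))

def predict(tree, x):
    # Internal nodes: ("node", feature, threshold, left, right); leaves: ("leaf", label).
    while tree[0] == "node":
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

def paths(tree, prefix=()):
    # Yield the path (tuple of child indices, 3 = left, 4 = right) to every node.
    yield prefix
    if tree[0] == "node":
        yield from paths(tree[3], prefix + (3,))
        yield from paths(tree[4], prefix + (4,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, sub):
    # Return a copy of `tree` with the node at `path` replaced by `sub`.
    if not path:
        return sub
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], sub)
    return tuple(node)

def crossover(a, b, rng):
    # Subtree crossover: graft a random subtree of b onto a random position in a.
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def mutate(tree, rng, rate=0.3):
    # With probability `rate`, flip one leaf label or perturb one split threshold.
    if rng.random() >= rate:
        return tree
    path = rng.choice(list(paths(tree)))
    node = get(tree, path)
    if node[0] == "leaf":
        return replace(tree, path, ("leaf", 1 - node[1]))
    _, f, t, left, right = node
    t = min(1.0, max(0.0, t + rng.gauss(0.0, 0.1)))
    return replace(tree, path, ("node", f, t, left, right))

def dt_ga(train, test, pop_size=20, sample_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    def fitness(tree):
        # As in the abstract: fitness = classification accuracy on the test set.
        return accuracy(tree, test)
    # Sampling step: each initial tree is grown on a small random sample of `train`.
    pop = [build_tree(rng.sample(train, sample_size), 2) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Binary tournament selection, then crossover and mutation.
        pop = [mutate(crossover(max(rng.sample(pop, 2), key=fitness),
                                max(rng.sample(pop, 2), key=fitness), rng), rng)
               for _ in range(pop_size)]
        best = max(pop + [best], key=fitness)  # elitism: never lose the best tree
    return best, fitness(best)

if __name__ == "__main__":
    data_rng = random.Random(42)
    train, test = make_data(400, data_rng), make_data(200, data_rng)
    tree, acc = dt_ga(train, test)
    print(f"best test accuracy: {acc:.3f}")
```

Because fitness is measured on the test set itself, the accuracy reported for the winning tree is an optimistic estimate of performance on unseen data; a separate, untouched hold-out set would be needed for an unbiased figure.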
Keywords: knowledge discovery in databases, genetic algorithm, association rules, decision tree, sampling technique