Decision Tree Classification Algorithm Parallelization And Its Application

Posted on: 2011-08-10 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Xing
GTID: 2208360308471744 | Subject: Computer application technology
Abstract/Summary:
Management information systems are now widely used in industry, commerce, finance, and information retrieval, and the enormous variety of data sets they produce has made data mining a major driving force for research in the field. As data sets keep growing, processing such huge amounts of data has become difficult. There are two ways to address this problem. The first is to reduce response time by sampling. However, in some cases shrinking the training data set can yield an inaccurate model, or worse, a mined model that cannot be used at all, for example in contour recognition or outlier detection. The second is parallel computing, as opposed to serial computing. Parallel computing includes parallelism in space and parallelism in time: parallelism in space means that several processors carry out computations simultaneously, while parallelism in time refers to pipelining. This thesis tightly combines parallel processing technology with data mining technology, which offers a promising approach for research on mining massive data sets. Among classification algorithms, decision trees have several advantages: they analyze data efficiently, are robust to noise, produce classifications that are easy to understand, and readily reveal the most important decision attributes. The decision tree is a typical logical-output classification model in data mining.
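The distinction between parallelism in space and parallelism in time can be made concrete. The following is a minimal Python sketch of parallelism in time (pipelining), assuming a hypothetical chain of per-record processing stages; the names `stage` and `run_pipeline` are illustrative and not taken from the thesis. Each stage runs in its own thread and hands items downstream through a queue, so while one stage works on record i the previous stage can already work on record i+1. Threads keep the sketch portable; in CPython they overlap mainly I/O-bound work.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and forward the result to outbox.
    (Sketch only: an exception in func would stop this stage and stall the pipeline.)"""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Push items through a chain of stages, one thread per stage.
    Stages overlap in time; FIFO queues keep the output in input order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[k], queues[k + 1]))
        for k, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([" 85 ", "72", " 90"], [str.strip, int, lambda s: s >= 80])` cleans, parses, and thresholds hypothetical score strings in three overlapping stages and returns `[True, False, True]`.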
Although parallel decision tree algorithms have been proposed, they suffer from excessive communication, problems with data distribution, load imbalance, and poor scalability, so their performance declines as data sets grow. To address these problems, this thesis designs a parallel computing environment, analyzes parallel decision trees, and discusses schemes for constructing them. After analyzing several parallel strategies for training decision trees, the thesis selects horizontal data partitioning, which preserves the high accuracy of the resulting decision tree. A parallel ID3 algorithm is then designed and implemented; it improves program performance, reduces time complexity, and has good properties. Finally, the parallel ID3 algorithm is applied to a data set of student test scores to obtain a decision tree model and decision rules.
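Under horizontal partitioning, each processor holds a subset of the training rows with all of their attributes. The following is a minimal Python sketch of how a parallel ID3 split can work under that assumption, not the thesis's own implementation. Every worker counts (attribute, value, class) occurrences on its own partition, the partial counts are summed, and the attribute with the highest information gain is chosen. Because information gain depends only on these counts, the chosen split matches the one serial ID3 would pick on the full data set. This is why horizontal partitioning need not cost accuracy. The function names, the toy student records, and the thread pool are assumptions made for illustration. For true CPU parallelism one could swap in `concurrent.futures.ProcessPoolExecutor`, which offers the same `map` interface.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def local_counts(rows, attributes, target):
    """Runs on one horizontal partition: count (attribute, value, class)
    triples and class totals for the rows this worker holds."""
    pair_counts, class_counts = Counter(), Counter()
    for row in rows:
        label = row[target]
        class_counts[label] += 1
        for attr in attributes:
            pair_counts[(attr, row[attr], label)] += 1
    return pair_counts, class_counts


def entropy(class_counts):
    """Shannon entropy (base 2) of a class-count histogram."""
    total = sum(class_counts.values())
    return -sum(c / total * math.log2(c / total) for c in class_counts.values() if c)


def best_attribute(partitions, attributes, target, workers=4):
    """Count every partition concurrently, merge the counts, and return the
    attribute with the highest information gain plus all gains."""
    count_fn = partial(local_counts, attributes=attributes, target=target)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(count_fn, partitions))
    pair_counts, class_counts = Counter(), Counter()
    for pairs, classes in results:
        pair_counts.update(pairs)  # Counter.update adds counts
        class_counts.update(classes)
    total = sum(class_counts.values())
    base = entropy(class_counts)
    gains = {}
    for attr in attributes:
        by_value = {}
        for (a, value, label), n in pair_counts.items():
            if a == attr:
                by_value.setdefault(value, Counter())[label] += n
        remainder = sum(sum(h.values()) / total * entropy(h) for h in by_value.values())
        gains[attr] = base - remainder
    return max(gains, key=gains.get), gains


def build_tree(partitions, attributes, target, workers=4):
    """Recursive ID3 over horizontally partitioned rows. Leaves are class
    labels; internal nodes are {attribute: {value: subtree}}."""
    labels = Counter(row[target] for part in partitions for row in part)
    if len(labels) == 1 or not attributes:
        return labels.most_common(1)[0][0]
    attr, _ = best_attribute(partitions, attributes, target, workers)
    remaining = [a for a in attributes if a != attr]
    values = {row[attr] for part in partitions for row in part}
    branches = {}
    for value in sorted(values, key=str):
        # Each partition is filtered locally; rows never move between workers.
        subsets = [[row for row in part if row[attr] == value] for part in partitions]
        branches[value] = build_tree(subsets, remaining, target, workers)
    return {attr: branches}


def tree_to_rules(tree, target, conditions=()):
    """Flatten a tree into IF ... THEN decision rules, one per leaf."""
    if not isinstance(tree, dict):
        lhs = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN {target} = {tree}"]
    attr, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, target, conditions + ((attr, value),)))
    return rules


# Hypothetical toy records for illustration (not data from the thesis).
STUDENTS = [
    {"attendance": "high", "homework": "done", "result": "pass"},
    {"attendance": "high", "homework": "missed", "result": "fail"},
    {"attendance": "low", "homework": "done", "result": "pass"},
    {"attendance": "low", "homework": "missed", "result": "fail"},
    {"attendance": "high", "homework": "done", "result": "pass"},
    {"attendance": "low", "homework": "missed", "result": "fail"},
]

if __name__ == "__main__":
    parts = [STUDENTS[:3], STUDENTS[3:]]  # two horizontal partitions
    tree = build_tree(parts, ["attendance", "homework"], "result")
    print(tree)  # {'homework': {'done': 'pass', 'missed': 'fail'}}
    for rule in tree_to_rules(tree, "result"):
        print(rule)
```

On the toy records, splitting the rows across two partitions or keeping them in one yields the same tree, `{'homework': {'done': 'pass', 'missed': 'fail'}}`, and the rules `IF homework = done THEN result = pass` and `IF homework = missed THEN result = fail`. The design choice worth noting is that workers exchange only count tables, not rows, which keeps communication proportional to the number of distinct attribute values rather than to the data size.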
Keywords/Search Tags:data mining, parallel computing, classification, parallel decision trees, decision rules