
Research On Lazy Association Classification Algorithm For Big Data

Posted on: 2016-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: H M Yang
GTID: 2308330479984824
Subject: Computer software and theory
Abstract/Summary:
Association classification algorithms are favored by researchers and engineers because of their high precision and good scalability. They fall into two groups: explicit association classification algorithms and lazy association classification algorithms. On big data, explicit association classification algorithms cannot solve two problems: an excessively large set of candidate association rules, and small disjuncts. Lazy association classification algorithms can solve both problems, because they project the training set onto each unclassified sample, which makes the training set smaller and more relevant to that sample.

However, lazy association classification has its own problems on big data. Its efficiency is very low when there are many unclassified samples, because it must build a separate classifier for every single sample. On big data, lazy association classification can use the C-DMA algorithm to raise efficiency, but the projection for each unclassified sample is still performed serially. In addition, MapReduce is not well suited to iterative computation, so a different framework is needed to implement lazy association classification in a big data environment.

Hence, this thesis proposes SDLAC, a distributed lazy association classification algorithm based on Spark, with the aim of solving the classification problem on big data. SDLAC addresses the low efficiency of lazy association classification on multiple unclassified samples by using an aggregate method. It implements a distributed projection operation when the C-DMA algorithm is used to mine association rules. It uses the Spark framework instead of the MapReduce framework. Analysis and experiments show that SDLAC is more accurate than the CBA algorithm and almost as accurate as the lazy association classification algorithm. In addition, SDLAC is far more efficient than the CLAC algorithm. SDLAC is therefore well suited to a big data environment.

The conclusions are as follows.
① The problems that lazy association classification faces when classifying big data are identified.
② The SDLAC algorithm is proposed. It applies the C-DMA algorithm to lazy association classification and uses the aggregate method, the distributed projection operation, and the Spark framework to raise efficiency further.
③ Analysis and experimental results show that SDLAC is more accurate than the CBA algorithm and almost as accurate as the lazy association classification algorithm. In addition, SDLAC is far more efficient than the CLAC algorithm, so it is well suited to a big data environment.
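The projection step that lazy association classification relies on can be illustrated with a small single-machine sketch. This is not the SDLAC algorithm and it does not use C-DMA or Spark; it only shows, under simplifying assumptions, how the training set is restricted to the items of one unclassified sample and how rules mined from that projection vote on a class. The names `project`, `classify`, `min_conf`, and `max_len`, and the confidence-sum voting scheme, are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def project(training, sample):
    """Keep, for each training row, only the items that also occur in the sample."""
    sample = frozenset(sample)
    projected = []
    for items, label in training:
        kept = frozenset(items) & sample
        if kept:  # rows sharing no item with the sample carry no evidence
            projected.append((kept, label))
    return projected


def classify(training, sample, min_conf=0.6, max_len=2):
    """Mine rules 'itemset -> label' from the projection and vote by confidence.

    Assumption: votes are the sum of rule confidences; the thesis abstract
    does not specify a scoring scheme.
    """
    projected = project(training, sample)
    body_count = Counter()  # projected rows containing the itemset
    rule_count = Counter()  # the same count, split by class label
    for items, label in projected:
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_count[body] += 1
                rule_count[(body, label)] += 1

    votes = Counter()
    for (body, label), n in rule_count.items():
        confidence = n / body_count[body]
        if confidence >= min_conf:
            votes[label] += confidence
    if not votes:
        return None  # no rule reached the confidence threshold
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        ({"outlook=sunny", "wind=weak"}, "play"),
        ({"outlook=sunny", "wind=strong"}, "play"),
        ({"outlook=rain", "wind=strong"}, "stay"),
        ({"outlook=rain", "wind=weak"}, "play"),
        ({"outlook=rain", "wind=strong"}, "stay"),
    ]
    print(classify(training, {"outlook=rain", "wind=strong"}))
```

Because a new projection and a new rule set are built for every call to `classify`, the cost grows with the number of unclassified samples, which is the efficiency problem the abstract describes for multiple samples.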
Keywords/Search Tags:aggregate method, distributed projection, Spark framework, C-DMA algorithm, lazy method, association classification