Research and Distributed Implementation of Stream Classification Based on Concept Drift

Posted on: 2019-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y B N Ou
Full Text: PDF
GTID: 2348330545455621
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet of Things and the mobile Internet, data are generated continuously from many sources. These data form data streams, which differ considerably from static datasets: a data stream is a fast, massive, and dynamic sequence of data. Because the data change dynamically and in varied ways, it is harder for stream mining algorithms to produce correct predictions. Data stream classifiers running on a single machine also cannot cope efficiently with the computational demands. How to adapt to a dynamic data stream in a timely manner is one of the hottest topics in data stream mining. Data stream classification is applied in many industrial fields, such as network log analysis, credit card fraud detection, and intrusion detection.

Concept drift refers to changes over time in the pattern encoded in the stream, and it is a pervasive phenomenon. If a classifier does not detect and handle concept drift in time, its performance deteriorates. However, existing drift-aware data stream classifiers adapt to only one particular type of concept drift, so other types are not detected and handled promptly. This thesis therefore studies concept drift and adaptive stream classification.

Most data stream classifiers improve performance by optimizing their data structures and algorithms. However, when they run on a single machine they are limited by the processing capacity of that machine's hardware. Scalable stream classifiers address this challenge by distributing computation and storage across a cluster of machines. This thesis therefore also studies scalable stream classification and its distributed implementation.

The main research work of this thesis is as follows:

1. We study the basic concepts of Bayesian inference and discuss the importance of prior information to Bayesian inference. We then propose a parameter estimation method based on the conjugate Dirichlet prior, together with a corresponding stream classifier.

2. We adopt LFR (Linear Four Rates), a recent concept drift detection method, in order to adapt to many types of concept drift. We also modify this framework to improve classification performance.

3. We study the distributed implementation of stream Bayes classification and a distributed solution on Flink. We select the most suitable parallelization scheme according to the characteristics of the data, design the data structure stored in Redis, and optimize the distributed algorithm on Flink.

Experimental results on synthetic and real datasets show that the proposed adaptive data stream classification algorithm, ADIB (Adaptive Dirichlet Incremental Bayes), detects concept drift more quickly than the other concept drift detection methods compared. Compared with the algorithm running on a single machine, the proposed distributed algorithm reduces execution time and increases throughput significantly. These results indicate the feasibility and effectiveness of the distributed algorithm.
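To make the first contribution more concrete, the following is a minimal Python sketch of an incremental naive Bayes classifier whose parameters are estimated under a conjugate Dirichlet prior. It is not the thesis's ADIB algorithm: the class name DirichletIncrementalNB, the symmetric prior weight alpha, and the restriction to categorical features encoded as small integers are all assumptions made for illustration, and drift detection is left out entirely.

```python
import math


class DirichletIncrementalNB:
    """Illustrative incremental naive Bayes with symmetric Dirichlet priors.

    Hypothetical sketch: features are categorical and integer-encoded,
    and the same pseudo-count `alpha` is used for every parameter.
    """

    def __init__(self, n_classes, n_values_per_feature, alpha=1.0):
        self.n_classes = n_classes
        self.n_values = list(n_values_per_feature)
        self.alpha = alpha
        self.class_counts = [0] * n_classes
        # feature_counts[c][j][v]: times feature j took value v in class c
        self.feature_counts = [
            [[0] * k for k in self.n_values] for _ in range(n_classes)
        ]

    def update(self, x, y):
        """Absorb one labelled example; only counters change."""
        self.class_counts[y] += 1
        for j, v in enumerate(x):
            self.feature_counts[y][j][v] += 1

    def class_prob(self, c):
        """Posterior mean of P(class = c) under the Dirichlet prior."""
        total = sum(self.class_counts)
        return (self.class_counts[c] + self.alpha) / (
            total + self.alpha * self.n_classes
        )

    def feature_prob(self, c, j, v):
        """Posterior mean of P(feature j = v | class = c)."""
        num = self.feature_counts[c][j][v] + self.alpha
        den = self.class_counts[c] + self.alpha * self.n_values[j]
        return num / den

    def log_scores(self, x):
        """Unnormalized log posterior score of each class for example x."""
        scores = []
        for c in range(self.n_classes):
            s = math.log(self.class_prob(c))
            for j, v in enumerate(x):
                s += math.log(self.feature_prob(c, j, v))
            scores.append(s)
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(range(self.n_classes), key=scores.__getitem__)


# Usage: two binary features, two classes, examples arriving one at a time.
model = DirichletIncrementalNB(n_classes=2, n_values_per_feature=[2, 2])
for _ in range(5):
    model.update([0, 0], 0)
    model.update([1, 1], 1)
```

Because the Dirichlet distribution is conjugate to the categorical likelihood, each update reduces to incrementing counters, and the posterior mean is the count plus the prior pseudo-count divided by the corresponding total. This is what makes the estimate cheap to maintain on a stream; how the thesis chooses its prior parameters is not stated in the abstract.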
Keywords/Search Tags: concept drift, stream classification, conjugate Dirichlet prior, Flink, Redis