
Study On Online Ensemble Techniques With Distributed Adaptivity

Posted on: 2024-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: B Q Feng
Full Text: PDF
GTID: 2568307154996329
Subject: Software engineering
Abstract/Summary:
Data streams are a common form of data in many application scenarios. They are high-speed, continuous, dynamic, and massive, which creates many challenges for data processing. One of the major challenges is concept drift, that is, the statistical properties of the target domain change over time in arbitrary ways. In classification problems, concept drift may cause a classification model to fail or its performance to degrade. To deal with concept drift, adaptive ensemble techniques have been proposed and are widely used; they combine multiple models to improve the accuracy and robustness of predictions and decisions. In a dynamic environment, such techniques adapt through continuous learning while dynamically adjusting model parameters and the combination of algorithms, thereby preserving the accuracy and stability of the model. To this end, this thesis studies adaptive ensemble classification algorithms for data streams, improves on existing ensemble methods, and proposes a new adaptive ensemble algorithm, DME (Distribution Matching Ensemble), and an ensemble algorithm for imbalanced data streams, DMEIL (Distribution Matching Ensemble for Imbalance Learning).

Firstly, existing ensemble algorithms based on data block learning ignore the information carried by the data blocks themselves during training. To address this problem, this thesis proposes an ensemble algorithm that adapts to concept drift based on distribution similarity, called DME. DME estimates the distribution of each received data block with a Gaussian mixture model (GMM) and stores the corresponding distribution information; it also maintains a group of classifiers in a buffer. When a new data block that needs to be predicted arrives, the similarity between its distribution and each stored distribution is calculated using the Kullback-Leibler (KL) divergence, and these similarities guide the weight assigned to each corresponding classifier so that the ensemble decision is made adaptively. DME thus drops the underlying assumption that the most recent labeled data block always has the distribution most similar to that of the current unlabeled data block. In addition, to prevent the ensemble buffer from growing without bound during incremental learning, two dynamic classifier update rules are developed. Experimental results on several synthetic and real-world streaming datasets show that the proposed DME algorithm is able to track and adapt to various types of concept drift in time. In particular, on data streams with frequently recurring drifts, DME significantly outperforms several state-of-the-art algorithms.

Secondly, this thesis proposes DMEIL, a weighted ensemble algorithm based on distribution adaptation that can be used for imbalanced data streams. In imbalanced datasets, minority classes tend to carry more information, so DMEIL focuses on the distribution of the minority classes in each data block. It maintains a limited number of component classifiers in the buffer, together with information about the minority class distributions. After a new data block obtains its labels, DMEIL uses the KL divergence to measure the similarity of the minority class distributions between data blocks, and updates the weight information that guides the ensemble model in predicting the next data block. To update the buffer, DMEIL uses the delayed average weight removing rule, the same rule used by DME. Experimental results on synthetic and real-world datasets show that the proposed DMEIL algorithm is able to track and adapt to concept drift in imbalanced data streams in a timely manner, and that it performs better than the other algorithms it was compared with.
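The distribution-matching step described above can be illustrated with a minimal sketch. It is not the thesis implementation and makes several assumptions that the abstract does not confirm: each block is summarized by a single diagonal Gaussian rather than a full GMM (so the KL divergence has a closed form), divergences are turned into weights with a softmax over negative KL values, and the names `fit_gaussian`, `kl_diag_gaussians`, `matching_weights`, `weighted_vote`, and the `temperature` parameter are all hypothetical.

```python
import math


def fit_gaussian(block):
    """Summarize a data block by per-feature mean and variance.

    Assumption: a single diagonal Gaussian stands in for the GMM
    that the abstract says DME fits to each block.
    """
    n = len(block)
    dims = len(block[0])
    means = [sum(row[d] for row in block) / n for d in range(dims)]
    variances = [
        max(sum((row[d] - means[d]) ** 2 for row in block) / n, 1e-9)
        for d in range(dims)
    ]
    return means, variances


def kl_diag_gaussians(p, q):
    """Closed-form KL(p || q) for two diagonal Gaussians."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def matching_weights(new_block, stored, temperature=1.0):
    """Weight each stored distribution by its similarity to the new block.

    Assumption: the softmax over negative KL is an illustrative choice;
    the abstract only says the similarities guide the weights.
    """
    p = fit_gaussian(new_block)
    kls = [kl_diag_gaussians(p, q) for q in stored]
    smallest = min(kls)  # shift so the best match scores exp(0) = 1
    scores = [math.exp(-(kl - smallest) / temperature) for kl in kls]
    total = sum(scores)
    return [s / total for s in scores]


def weighted_vote(predictions, weights):
    """Combine one predicted label per classifier into a weighted decision."""
    tally = {}
    for label, weight in zip(predictions, weights):
        tally[label] = tally.get(label, 0.0) + weight
    return max(tally, key=tally.get)


# Two stored blocks from different concepts; the new block resembles the second.
old_a = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]]
old_b = [[5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]]
new_block = [[5.05, 5.0], [4.95, 5.1], [5.1, 4.95], [5.0, 5.15]]

stored = [fit_gaussian(old_a), fit_gaussian(old_b)]
weights = matching_weights(new_block, stored)
decision = weighted_vote(["class_x", "class_y"], weights)
```

In this toy run the classifier trained on `old_b` receives nearly all of the weight, regardless of which block arrived most recently. Under the same assumptions, a DMEIL-style variant could fit the summary only on the minority-class rows of each labeled block before comparing blocks.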
Keywords/Search Tags: Data Stream, Adaptive Ensemble Learning, Concept Drift, Class Imbalance Learning, Gaussian Mixture Model, Kullback-Leibler Divergence