Research On Dynamic Learning For Data Stream Bayesian Classification

Posted on: 2019-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: H T Li
GTID: 2428330545972098 | Subject: Computer technology
Abstract/Summary:
Many real-world applications, such as large-scale sensor networks, information monitoring systems, search engines, and social media, generate continuously arriving data known as data streams. How to monitor, analyze, and mine these massive data is an active research topic in data mining. A data stream is a sequence that may contain an unbounded number of instances, and it is characterized by high speed, large volume, and change over time. Learning from a data stream therefore requires scanning the data only once, processing it quickly, and updating the model dynamically to adapt to the stream. Building an adaptive classifier on a dynamically changing data stream is one of the most important research problems in the field. Many existing data stream mining methods assume that the underlying feature space of the stream is static. However, many real-life scenarios cannot avoid feature drift. For example, the relation between features and target concepts may change over time, which changes the concept distribution. In addition, changes in the underlying feature space may cause previously relevant features to become irrelevant or even disappear, or cause a concept that had disappeared to reappear. In such a dynamic environment, a classifier must be able to track these changes in the data stream and update itself adaptively. This thesis therefore studies feature drift in data streams and the concept evolution caused by changes in the underlying feature space. The main work includes:

(1) For the problem of feature drift, a dynamic feature-weighted Bayesian classification algorithm for data streams is proposed. The starting point is that the relation between features and target concepts may change over time. To track this change and apply it to the prediction of new instances, the algorithm uses the gain ratio to select important attributes and dynamically updates feature weights during classification to improve accuracy. Experimental results show that the proposed algorithm achieves higher classification accuracy than other single classification algorithms.

(2) For the concept evolution caused by the dynamic feature space of a data stream, a Bayesian-based ensemble classification algorithm is proposed. The algorithm continuously stores a number of base learning models and uses a dynamic feature space to represent different concepts in the data stream. For each base classification model, only the most predictive features are retained. Each base model is stored as a concept vector in a model repository and adjusted as the data stream changes. On this basis, appropriate base models are selected to form an ensemble classifier that predicts newly arriving unlabeled instances. Experimental results show that the ensemble algorithm performs better than other ensemble algorithms on most data sets.
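The idea in (1), weighting features by gain ratio while the classifier is updated one instance at a time, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the class name `WeightedStreamNB`, the restriction to categorical features, the Laplace smoothing, and the use of the gain ratio as an exponent on each feature's likelihood are all assumptions made for illustration. The counts here are cumulative, so a real drift-aware variant would presumably also need a window or decay to forget outdated statistics.

```python
import math
from collections import defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a collection of non-negative counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts)


class WeightedStreamNB:
    """Hypothetical one-pass naive Bayes with gain-ratio feature weights."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.n_seen = 0
        self.class_counts = defaultdict(int)
        # counts[j][value][label] = number of instances seen so far
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(n_features)]

    def learn_one(self, x, y):
        """Update the sufficient statistics with one labeled instance."""
        self.n_seen += 1
        self.class_counts[y] += 1
        for j, value in enumerate(x):
            self.counts[j][value][y] += 1

    def gain_ratio(self, j):
        """Gain ratio of feature j, computed from the current counts."""
        if self.n_seen == 0:
            return 0.0
        h_class = entropy(self.class_counts.values())
        h_cond = 0.0
        value_totals = []
        for by_label in self.counts[j].values():
            total = sum(by_label.values())
            value_totals.append(total)
            h_cond += total / self.n_seen * entropy(by_label.values())
        split_info = entropy(value_totals)
        if split_info == 0:
            return 0.0
        return max(0.0, (h_class - h_cond) / split_info)

    def predict_one(self, x):
        """Return the label with the highest weighted log-posterior."""
        weights = [self.gain_ratio(j) for j in range(self.n_features)]
        scores = {}
        for label, class_count in self.class_counts.items():
            score = math.log(class_count / self.n_seen)
            for j, value in enumerate(x):
                by_label = self.counts[j].get(value)
                hits = by_label.get(label, 0) if by_label else 0
                n_values = max(len(self.counts[j]), 1)
                # Laplace-smoothed likelihood, raised to the feature weight
                likelihood = (hits + 1) / (class_count + n_values)
                score += weights[j] * math.log(likelihood)
            scores[label] = score
        return max(scores, key=scores.get) if scores else None


# Toy stream: feature 0 determines the label, feature 1 is noise.
model = WeightedStreamNB(n_features=2)
stream = [(("hi", "p"), "pos"), (("hi", "q"), "pos"),
          (("lo", "p"), "neg"), (("lo", "q"), "neg")] * 5
for x, y in stream:
    model.learn_one(x, y)
print(model.gain_ratio(0), model.gain_ratio(1))
print(model.predict_one(("hi", "q")))
```

Because the weights are recomputed from the running counts at prediction time, a feature whose relation to the label weakens contributes less to the score. In this toy stream the noise feature receives a weight near zero and the informative feature a weight near one.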
Keywords: Data Stream, Feature Drift, Dynamic Weighting, Bayesian, Ensemble Learning