Selectively Mining Approach With Dynamical Chunk Size For Imbalanced Data Streams In Nonstationary Environment

Posted on:2018-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:N N Liu

Full Text:PDF

GTID:2348330542960095

Subject:Computer Science and Technology

Abstract/Summary:

Compared with traditional static data, data streams are real-time, massive, scanned only once, and dynamic. In recent years a growing number of algorithms have been proposed for data stream classification, but most of them assume that the data distribution is balanced or nearly balanced. In many real-life applications, however, such as monitoring-system fault diagnosis, network intrusion detection, credit card fraud detection, telecommunications management, oil spill detection, and text classification, the data distribution is imbalanced, and misclassifying the minority class often causes great loss. How to improve classification precision on minority instances without reducing accuracy on majority instances is therefore a hot and difficult issue in mining imbalanced data streams. Concept drift is another difficult problem in data stream classification, and the combination of concept drift and class imbalance makes classification even more challenging. At present, most ensemble classification algorithms are built on data chunks, much like a sliding window, so their performance is overly sensitive to the window size. Moreover, they generally assume that no drift occurs within a data chunk, which is not consistent with real data streams. This paper puts forward a selective mining approach with dynamic chunk size for imbalanced data streams in nonstationary environments, consisting of the following two algorithms.

(1) The SMDC algorithm: a concept drift detector is added to adjust the size of the current chunk and obtain the optimal chunk, which ensures that the instances in the current chunk belong to the same concept and thereby improves the ability of the classifiers. For the drift detector, this paper puts forward a detection method designed for imbalanced data streams that does not rely on overall accuracy. It detects concept drift in both the majority and the minority instances and is not affected by a certain amount of noise. In addition, drawing on an idea from large-scale data processing, the approach selectively retains some minority instances and under-samples the majority instances without replacement, which prevents the number of minority instances from exceeding the number of majority instances; this allows the classifiers to be trained well while improving classification accuracy. Experiments comparing the algorithm with other typical algorithms on different datasets show that it achieves higher classification accuracy on imbalanced data streams and is robust to frequent and fast drifts.

(2) The SMDCWE algorithm: to avoid forgetting important knowledge from old instances and to improve the algorithm's ability to adapt to different types of concept drift, a weighting mechanism is added, under which previously learned classifiers are retained and combined by voting. Finally, experiments on synthetic datasets and a real dataset show that the algorithm achieves higher classification accuracy on imbalanced data streams and is more sensitive to concept drift.

Keywords/Search Tags:

imbalanced data streams, concept drift, dynamic chunk, ensemble classifiers, under-sampling

Related items

1	Research On Classification Algorithms Of Concept Drift And Imbalanced Data Streams
2	Research On Classifiers For Data Streams Based On Active Learning
3	Study Of Mixture Ensemble Classifications For Mining Data Streams With Concept Drift
4	Study On Data Streams Online Classification Algorithm Of Adapting To The Concept-Drift
5	Research On Classification For Data Streams With Concept Drift
6	Research On Concept Drift Detection Algorithms For Data Streams
7	Research On Classification Technologies In Mining Unsteady Data Streams
8	Research On Active Learning Method For Imbalanced Data Streams
9	Research And Application Of Inhibiting The Effects Of Concept Drift Based On Machine Learning
10	Research On Classification Algorithms And Storage Models For Concept Drift And Imbalanced Data Streams