Research On The Classification Of Data Stream With Concept Drift Based On Cosine Similarity

Posted on: 2018-05-02    Degree: Master    Type: Thesis
Country: China    Candidate: B Liu
GTID: 2348330515957958    Subject: Computer Science and Technology
Abstract/Summary:
As the demand for data mining applications such as real-time monitoring, network intrusion detection, spam filtering, and intelligent information pushing has grown, data mining technology has gradually developed from the analysis of static, finite data to the analysis of dynamic, unbounded data. Data stream classification has become a research hotspot in data mining. A data stream is dynamic data that arrives in real time in the form of a stream; it is characterized by large volume, fast arrival, and continuous real-time arrival. When concept drift occurs in a data stream, the performance of the classifier degrades. Therefore, based on an analysis of the characteristics of data streams, this thesis studies concept drift detection and proposes an effective method for detecting concept drift. The data are then classified after detection, and a selective ensemble classification method for data streams is proposed to improve classification performance. The specific contents of this thesis are as follows:

(1) A concept drift detection algorithm based on cosine similarity is proposed to address concept drift in data streams. The algorithm first applies the sliding window principle to treat the data stream as a sequence of consecutive data blocks of equal size, and calculates the centroid of each class within each data block. It then calculates the cosine similarity between the centroids of adjacent data blocks. The larger the cosine similarity, the smaller the angle between the centroids of the two data blocks, and the less likely it is that drift has occurred between them. Conversely, the larger the angle between the centroids of two adjacent blocks, the more likely it is that drift has occurred between them. Finally, the minimum confidence interval of the cosine similarity is obtained by parameter estimation. If the cosine similarity computed for a subsequent data block falls outside this confidence interval, concept drift is considered to have occurred in the current block. Experiments show that the concept drift detection algorithm based on cosine similarity can effectively detect concept drift in data streams, thereby improving the accuracy of data stream classification.

(2) A selective ensemble classification algorithm based on differential evolution is proposed to solve the classification problem for data streams. First, the data stream is divided into consecutive data blocks of equal size, and the current data block is used to train a number of base classifiers. The differential evolution method is then used to assign a weight to each base classifier; a higher weight indicates better classification performance. Finally, the base classifiers with the highest weights are selected and combined by weighted voting, and the resulting ensemble classification model is used to classify the data blocks. The experimental results show that the selective ensemble classification method based on differential evolution offers stability, strong generalization, and high classification accuracy.
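
The drift detection procedure in (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the exact formulas, so several details here are assumptions. All function names are hypothetical. The similarity between two blocks is taken as the mean cosine similarity over the classes present in both blocks. The confidence interval is approximated as the mean of past similarities plus or minus z standard deviations. Only the lower bound is tested, since a high similarity indicates no drift. The `warmup` and `min_std` parameters are added for illustration only.

```python
import math
from statistics import mean, stdev


def class_centroids(block):
    """Return {label: centroid} for one block of (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in block:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def block_similarity(prev_block, curr_block):
    """Assumed aggregation: mean cosine similarity of same-class centroids."""
    prev_c = class_centroids(prev_block)
    curr_c = class_centroids(curr_block)
    shared = set(prev_c) & set(curr_c)
    if not shared:
        return 0.0
    return mean(cosine_similarity(prev_c[k], curr_c[k]) for k in shared)


def detect_drift(blocks, z=1.96, warmup=3, min_std=1e-3):
    """Return indices of blocks flagged as drifted.

    A block is flagged when its similarity to the previous block falls below
    mean - z * std of the similarities seen so far (assumed interval form).
    """
    history, drifted = [], []
    for i in range(1, len(blocks)):
        sim = block_similarity(blocks[i - 1], blocks[i])
        if len(history) >= max(warmup, 2):  # stdev needs at least two values
            spread = max(stdev(history), min_std)  # avoid a zero-width interval
            if sim < mean(history) - z * spread:
                drifted.append(i)
                history = []  # restart the estimate for the new concept
                continue
        history.append(sim)
    return drifted
```

Each block is a list of `(features, label)` pairs, and `detect_drift` takes the list of blocks in arrival order. Resetting the history after a detected drift is one possible design choice; it keeps similarities from the old concept from influencing the interval estimated for the new one.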
Keywords/Search Tags: Data stream, Concept drift, Cosine similarity, Differential evolution, Ensemble classification