Research On Bayesian Classification Algorithm Based On Emerging Pattern For Variable Data Stream

Posted on: 2023-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: J W Liu
GTID: 2568306845469324
Subject: Information and Communication Engineering
Abstract/Summary:
Data streams in real applications often contain a lot of redundant information or noise. Pattern-based data stream classification can reduce the impact of noise and concept drift, and frequent patterns usually carry more information than a single attribute, so they are an effective way to improve classification performance. CHARM is an efficient batch miner for frequent closed patterns. IncMine, an approximate frequent closed itemset mining algorithm for data streams, uses CHARM as its batch miner to maintain a sliding window of frequent patterns over multiple batches. However, traditional frequent pattern mining algorithms such as CHARM and IncMine can only process transaction data sets, and the mined frequent patterns carry no class labels, so they cannot directly take part in classification as training samples.

This thesis studies a Bayesian classification algorithm based on emerging patterns in variable data streams. The main research contents are: (1) mining frequent closed patterns with class constraints and converting them into training samples; (2) a concept drift detection method for variable data streams based on pattern mining; (3) a Bayesian classification algorithm based on data stream emerging patterns. To achieve these goals, the thesis improves the main frequent closed itemset mining algorithms, CHARM and IncMine, which in their original form are not suitable for batch or incremental classification of real data. The specific work is as follows: (1) The CHARM algorithm for batch mining of frequent closed itemsets is improved by adding a preprocessing step that converts relational tabular data into itemsets that CHARM can process. The frequent closed itemsets mined by the improved CHARM algorithm are then restored to training samples for classification. (2) The small-batch frequent closed itemsets mined by CHARM are used to update the semi-FCIs (semi-frequent closed itemsets) in each node of the IncMine sliding window, and the number of semi-FCIs is monitored for significant changes between nodes. If concept drift occurs in the data stream, corresponding measures are taken. (3) An emerging pattern must be a frequent closed pattern, so the emerging patterns required for classification can be derived from the frequent closed itemsets mined by the algorithm. The thesis studies how to combine emerging patterns with training data and explores ways to improve the classification accuracy of the pattern-based Bayesian algorithm.

The experiments use several real and simulated data streams, and the classified inputs cover various combinations of mined frequent patterns and the original data sets. The experiments analyze and verify the improved CHARM and IncMine algorithms with preprocessing, as well as EPBM, the Bayesian classification algorithm based on data stream emerging patterns proposed in this thesis. The results show that the improved CHARM and IncMine algorithms remove the integer-type restriction of transaction data, can mine frequent patterns in real data sets or data streams, and provide concept drift detection. Compared with other similar algorithms, the IncMine-based emerging-pattern method EPBM significantly improves accuracy or effectively reduces classification time, and its overall classification performance is better.
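
As a rough illustration of two ideas in the abstract, the Python sketch below encodes relational rows as integer itemsets (the kind of input a transaction miner such as CHARM expects) and scores a candidate pattern by its growth rate between two classes. It is not the thesis's implementation. The function names, the toy data, and the choice of growth rate (support in the target class divided by support in the other class, a common definition of emerging patterns) are assumptions made here for illustration; the abstract does not state which definition or thresholds the thesis uses.

```python
def rows_to_transactions(rows, attributes):
    """Encode each relational row as a set of integer item ids.

    Every distinct (attribute, value) pair gets its own id, so a row
    with categorical values becomes an integer itemset (hypothetical
    encoding, chosen for illustration).
    """
    item_ids = {}
    transactions = []
    for row in rows:
        items = set()
        for attr, value in zip(attributes, row):
            key = (attr, value)
            if key not in item_ids:
                item_ids[key] = len(item_ids) + 1  # ids start at 1
            items.add(item_ids[key])
        transactions.append(frozenset(items))
    return transactions, item_ids


def support(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def growth_rate(pattern, target, other):
    """Support in the target class divided by support in the other class."""
    s_target = support(pattern, target)
    s_other = support(pattern, other)
    if s_other == 0.0:
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_other


# Toy data (assumed): four rows labelled "play", then four labelled "skip".
attributes = ["outlook", "windy"]
rows = [
    ("sunny", "no"), ("sunny", "no"), ("rain", "no"), ("sunny", "yes"),
    ("rain", "yes"), ("rain", "yes"), ("sunny", "yes"), ("rain", "no"),
]
transactions, item_ids = rows_to_transactions(rows, attributes)
play, skip = transactions[:4], transactions[4:]

sunny = frozenset({item_ids[("outlook", "sunny")]})
sunny_calm = frozenset({item_ids[("outlook", "sunny")], item_ids[("windy", "no")]})

rate_sunny = growth_rate(sunny, play, skip)
rate_sunny_calm = growth_rate(sunny_calm, play, skip)
```

Here `sunny` is three times as frequent in the "play" rows as in the "skip" rows, while `sunny_calm` never occurs in "skip", so its growth rate is infinite. In a setting like the one the abstract describes, candidate patterns would come from the mined frequent closed itemsets rather than being written by hand.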
Keywords/Search Tags:Data stream, Bayesian classification, Emerging patterns, Data mining, Concept drift