
Research On Active Learning Mining Method For Streaming Data

Posted on: 2024-05-11  Degree: Master  Type: Thesis
Country: China  Candidate: K Y Zhang  Full Text: PDF
GTID: 2568307115479654  Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, a large amount of data is generated in the form of streams. Analyzing and mining streaming data has become a field in which big data and artificial intelligence are closely integrated. Data stream classification is the task of mining valuable knowledge from a stream and classifying the latest samples. In real scenarios, because of the speed at which data is generated and the cost of labeling, unlabeled samples are widespread in streams, which poses a great challenge to classification. Active learning is one of the effective methods for addressing label scarcity: it obtains sample labels through a query strategy. Its objective is to reduce the annotation budget as much as possible while still obtaining a well-performing classifier, and the key is a strategy that can effectively select valuable samples. However, active learning risks poor performance or outright failure in a streaming environment, mainly because of dynamic evolution phenomena such as concept drift. The main work of this thesis is to use active learning to design query strategies for streaming data and to propose adaptive classification mining models for concept drift and noise in streams. The work is as follows:

(1) To address the problem that concept drift undermines the stability of the query strategy, this thesis proposes an online active learning method based on concept drift detection, approached from the perspective of significant changes in sample features. The method uses the significance degree of concept drift to construct a sample set that represents the latest concept in the current stream. It then applies a hybrid query strategy to select valuable samples from this set and incrementally updates the classifier, which effectively improves the sampling stability of active learning in a streaming environment.

(2) Using clustering to select representative samples is a common active learning strategy. However, most clustering methods perform poorly in a dynamic streaming environment and cannot effectively reflect the sample distribution. To resolve this, this thesis proposes an online active learning method based on density peaks. First, the representative degree of each sample in its local space is determined by an asymmetric neighbor relationship. Then, an influence range is delimited for samples with a high representative degree, and sampling in densely distributed space is carried out within this range. Finally, sampling in sparsely distributed space is achieved by identifying and saving cluster fragments. This method provides an effective and reliable strategy for selecting representative samples in online active learning.

(3) Query-by-committee is a commonly used active learning strategy whose key is the construction of a diverse committee. This thesis proposes an online active learning method that combines ensemble learning with query-by-committee. The approach provides an ensemble building strategy that balances diversity and cooperativity. To address label scarcity in streams, a sampling strategy based on diversity entropy is explored on top of the ensemble architecture. This strategy focuses on the samples whose categories the diverse committee finds difficult to distinguish, which ensures high classification accuracy for the ensemble and effectively reduces the labeling budget.

This thesis proposes a series of feasible solutions to pressing problems such as label scarcity, concept drift, and model adaptation, and constructs complete stream data mining frameworks from multiple aspects. The work has theoretical significance and practical value for the classification mining of dynamic streaming data with scarce labels.
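As an illustration of the query-by-committee idea in (3), the following is a minimal sketch only, not the thesis's method. It uses the standard vote entropy as a stand-in for the "diversity entropy" mentioned above, whose exact definition the abstract does not give. The function names `vote_entropy` and `select_queries`, and the parameters `threshold` and `budget`, are hypothetical and chosen purely for this example.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in nats) of the label votes cast by a committee for one sample.

    0.0 means the committee is unanimous; larger values mean more disagreement.
    This is the textbook vote entropy, assumed here as a stand-in measure.
    """
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def select_queries(vote_stream, threshold, budget):
    """Scan a stream of committee votes and pick samples to send for labeling.

    A sample is queried when its vote entropy exceeds `threshold`;
    scanning stops once `budget` labels have been requested.
    Returns the stream indices of the queried samples.
    """
    queried = []
    for i, votes in enumerate(vote_stream):
        if len(queried) >= budget:
            break  # labeling budget exhausted
        if vote_entropy(votes) > threshold:
            queried.append(i)
    return queried


if __name__ == "__main__":
    # Each inner list holds the predictions of a three-member committee.
    stream = [
        ["a", "a", "a"],  # unanimous, not queried
        ["a", "b", "a"],  # some disagreement
        ["a", "b", "c"],  # maximal disagreement
        ["b", "b", "a"],  # would qualify, but the budget is already spent
    ]
    print(select_queries(stream, threshold=0.5, budget=2))
```

In a real streaming setting the committee members would be updated incrementally with each newly labeled sample, and the threshold would typically adapt as the concept drifts; both are omitted here for brevity.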
Keywords/Search Tags: streaming data, classification mining model, active learning, concept drift, clustering peak, ensemble learning