Research On Online Learning Methods For Complex Data Streams

Posted on: 2024-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: S D Zhuo
GTID: 2568307067473494
Subject: Computer technology
Abstract/Summary:
With the rapid development of science and technology, data volumes have grown from the petabyte (PB) scale toward the yottabyte (YB) era. Effectively organizing and processing massive data, extracting useful information and knowledge from it, and predicting future behavior are key issues in data analysis and processing. Daily life is now inseparable from data interaction on social platforms, in entertainment, and in financial transactions; these applications involve large numbers of participants and generate data in the form of mixed data streams. This thesis addresses these challenges in three steps. First, it uses Gaussian models to capture correlations among multiple variables and studies how to establish connections between different variables. Second, drawing on the geometric distribution structure of the data space, it investigates how to learn the node distribution space under mixed data. Finally, considering anomalies in the mixed data space, it studies non-deep structured models for finding perturbed sample points that affect model performance. The main research content and innovative results are as follows.

(1) Based on the Gaussian Copula (GC) model and latent feature space analysis, a missing-data processing model and a latent feature learning model for mixed data streams are constructed to achieve representation learning on mixed data with missing features. To fill in missing features effectively and learn from mixed data streams, the thesis uses a processing method based on the probabilistic correlation of multivariate feature distributions. First, a GC-based multivariate distribution learning model establishes the correlation between the observation space with missing entries and the latent continuous space, and the missing variables are filled in through a mapping between the variable spaces. Second, a self-adaptive weighting model integrates the data representations learned in the different variable spaces.

(2) Based on a self-training semi-supervised classification model, the thesis overcomes the problems of blind self-training, local optima, inability to handle non-convexly distributed data, and erroneous labeling, and tackles the shortage of labels in online mixed data streams. Because collecting complete labels in online data streams is slow and costly, the thesis studies a local density peak model that learns the geometric structure between nodes in mixed data streams, constructs clusters for the different categories by calculating the distance and density values of the nodes, and creates pseudo-labels for the missing labels.

(3) A perturbed sample point analysis model based on deep symbol trees is proposed to counter the impact of perturbation factors on the model and to explain that impact. First, a semi-parametric deep hierarchical tree model is constructed that has the learning ability of a deep network model together with a resistance to perturbation that deep network models lack. Second, the deep hierarchical structure has a clear reasoning path, which enables pixel-level explanation of perturbation factors and addresses the current lack of interpretability in online perturbation defense models.
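
The density-and-distance idea in contribution (2) can be illustrated with a small sketch. This is not the thesis's model: it assumes the classic density-peaks clustering recipe (Gaussian-kernel local density, distance to the nearest denser point) followed by a simple majority vote per cluster, and it works on a fixed batch rather than a stream. The function name `density_peak_pseudo_labels`, the cutoff `d_c`, the `n_clusters` parameter, and the convention that `-1` marks a missing label are all assumptions made for illustration.

```python
import numpy as np

def density_peak_pseudo_labels(X, y, n_clusters, d_c):
    """Assign pseudo-labels to unlabeled points (y == -1) via density peaks.

    Hypothetical helper: X is an (n, d) array, y holds integer labels with -1
    for missing ones, d_c is an assumed kernel cutoff distance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Pairwise Euclidean distances between all nodes.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: Gaussian kernel sum, excluding each point's own term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # densest point first
    # delta: distance to the nearest point of higher density (the "parent").
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        parent[i], delta[i] = j, dist[i, j]
    # Cluster centers combine high density with large delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    cluster = np.full(n, -1)
    cluster[centers] = np.arange(n_clusters)
    # Every other point inherits the cluster of its denser parent.
    for i in order:
        if cluster[i] == -1:
            cluster[i] = cluster[parent[i]]
    # Pseudo-label: majority of the known labels inside each cluster.
    pseudo = y.copy()
    for c in range(n_clusters):
        members = cluster == c
        known = y[members & (y != -1)]
        if known.size:
            values, counts = np.unique(known, return_counts=True)
            pseudo[members & (y == -1)] = values[np.argmax(counts)]
    return pseudo
```

Clusters that contain no labeled point keep `-1`, so a downstream self-training loop could defer them instead of guessing. Because points are attached to denser neighbors rather than to a centroid, the clusters are not forced into convex shapes, which is the property the abstract appeals to.
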
Keywords: Online Learning, Mixed Data Streams, Gaussian Copula, Semi-Supervised Classification, Deep Symbol Trees