Online Learning For Group Variable Selection

Posted on: 2022-09-30  Degree: Master  Type: Thesis
Country: China  Candidate: N J Zheng  Full Text: PDF
GTID: 2480306521467034  Subject: Statistics
Abstract/Summary:
With the rapid development of science and technology, the collection, storage, and use of high-dimensional and massive data play an increasingly important role in scientific research. High-dimensional data usually has three characteristics. First, it is sparse and has a group structure: although the dimension is high, only a small subset of the variables may be relevant, and these act in groups. Second, it is often generated dynamically, with new data continually flowing into the data set to be mined. Third, it may contain sensitive information, since large-scale data collected from people includes a great deal of personal private information. Traditional machine learning algorithms are inefficient on large-scale high-dimensional data, and online learning has in recent years become one of the methods for using such data efficiently. At the same time, how to protect individual privacy while using such data is also a concern. Accordingly, this thesis focuses on online group-structure learning and privacy protection, in two parts. First, the online group variable selection problem for high-dimensional streaming data is studied. An online estimation method for logistic regression with a group lasso penalty is proposed, and the GFTPRL algorithm is given. A regret bound is derived to show that the algorithm is effective in theory, and the accuracy of the algorithm is better than that of other mainstream sparse online algorithms. Second, online group lasso learning is studied under the concept of differential privacy. The DP-GFTPRL algorithm, a solver for the online group lasso model with differential privacy, is proposed for the binary classification problem of logistic regression. Using the properties of differential privacy and online learning theory, an expected regret bound for the algorithm is proved. Finally, experiments demonstrate the usability of the differentially private online group lasso algorithm.
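The abstract does not give the GFTPRL update itself, so the following is only a minimal sketch of the general idea it describes: logistic regression on streaming data with a group lasso penalty. It swaps in a plain online proximal-gradient step (a gradient step followed by group-wise soft-thresholding), which is a standard technique and not the thesis's algorithm. The function names, the group layout, the labels in {-1, +1}, and the sqrt(group size) weighting of the penalty are all assumptions made for illustration.

```python
import math


def group_soft_threshold(v, threshold):
    """Proximal operator of threshold * ||v||_2: shrinks a whole group, or zeroes it."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= threshold:
        return [0.0] * len(v)
    scale = 1.0 - threshold / norm
    return [scale * x for x in v]


def logistic_gradient(w, x, y):
    """Gradient of log(1 + exp(-y * <w, x>)) in w, for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Evaluate sigmoid(-margin) without overflowing exp for large |margin|.
    if margin > 0:
        e = math.exp(-margin)
        sigma = e / (1.0 + e)
    else:
        sigma = 1.0 / (1.0 + math.exp(margin))
    return [-y * sigma * xi for xi in x]


def online_group_lasso_step(w, x, y, groups, lam, step):
    """One online update (hypothetical helper): gradient step, then group shrinkage.

    groups is a list of index lists that partition the coordinates of w.
    """
    grad = logistic_gradient(w, x, y)
    z = [wi - step * gi for wi, gi in zip(w, grad)]
    new_w = list(z)
    for idx in groups:
        # Assumed convention: penalty weight scales with sqrt(group size).
        threshold = step * lam * math.sqrt(len(idx))
        shrunk = group_soft_threshold([z[i] for i in idx], threshold)
        for i, value in zip(idx, shrunk):
            new_w[i] = value
    return new_w


# Usage: one streaming example that only touches the first group.
w = online_group_lasso_step(
    [0.0, 0.0, 0.0, 0.0], x=[1.0, 0.0, 0.0, 0.0], y=1,
    groups=[[0, 1], [2, 3]], lam=0.1, step=1.0,
)
```

Because the shrinkage acts on the Euclidean norm of each group rather than on single coordinates, a group whose weights stay small is set to zero as a whole, which is what makes the penalty select variables in groups. A differentially private variant would additionally perturb the update with calibrated noise; that step is not sketched here because the abstract does not specify the mechanism.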
Keywords/Search Tags: Online learning, Group Lasso, Differential privacy, Sparsity, Logistic regression