
Methods And Applications For Robust Feature Representation Learning

Posted on: 2020-04-22    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y C Qian    Full Text: PDF
GTID: 1367330620452328    Subject: Statistics

Abstract/Summary:
In recent years, big data has received much attention. The value of big data lies not merely in its volume but in the information hidden within it, and mining the regularities it contains to provide people with valuable information is one of the central challenges of big data research. To make accurate inferences and predictions from a given data sample, one must find an appropriate feature representation that effectively models the underlying structure of the data. Such a representation should reflect a concise global structure, capture the essential behavior of the data, and be robust to noise. The premise of feature representation learning is that most real-world data possess their own rich and distinctive structure; if the data distribution were arbitrary, feature representation learning would not be feasible. At the same time, the data sampled in practice are always finite and usually contain noise, which raises the question of how to select and design appropriate models and regularization techniques.

In this thesis, by combining graph embedding, low-rank analysis, self-representation learning, and intra-class/inter-class relationship methods, with the description of sample relationships at the core, two unsupervised feature representation learning methods and one supervised feature representation learning method are proposed. They are applied to simulated data, image data, and biological data, and comparisons with state-of-the-art methods verify the effectiveness of the proposed feature representation methods. The main work of this thesis includes the following aspects:

1. A method named Low-Rank Graph Optimization for Multi-View Dimensionality Reduction (LGRO-MVDR) is proposed. Graph-based dimensionality reduction methods have received extensive attention and have been applied to tasks such as classification and clustering. However, most of these methods are applicable only to single-view data. Although researchers have proposed various dimensionality reduction algorithms based on
multi-view data, the graph construction strategies they use do not fully account for noise or for the differing importance of the multiple views, which can greatly degrade performance. The LGRO-MVDR method first constructs a similarity matrix from each view, and then learns a low-rank shared matrix from the multiple similarity matrices together with a sparse error matrix for each view that captures potential noise. Second, adaptive non-negative weight vectors are introduced to exploit the complementarity among the different views. Furthermore, an effective optimization strategy based on the Alternating Direction Method of Multipliers (ADMM) is developed. Finally, based on the low-rank shared matrix, the dimensionality of the data is reduced using graph embedding, yielding a new feature representation.

2. A method named Robust Inner Product Regularized Unsupervised Feature Selection (RIRUFS) is proposed. The model describes the similarity relationships between samples via self-representation learning, constructs a spectral clustering model based on the sample similarity relationships and a sample label indicator vector, and unifies self-representation learning, spectral clustering, and feature selection in a single framework. In this way, RIRUFS can reveal the underlying multi-subspace structure of the data and iteratively learn the optimal similarity matrix and label matrix. Second, by introducing an inner product regularization term into the objective function, the selected features are made independent and of low redundancy. In addition, an effective iterative updating algorithm is proposed to solve the RIRUFS model. The feature selection matrix obtained by the model reflects the importance of each feature; selecting features by importance discards features and noise that have little impact on clustering performance, making the method robust to noise.

3. A method named Class-Specific
Guided Local Feature Selection (CSGLFS) is proposed. The model is motivated by the observation that each class of high-dimensional data has its own region with a unique optimal subset of discriminative features, whereas existing methods simply select one common feature subset for all classes. In the CSGLFS method, the feature subsets adapt to local variations, so that in the projection space corresponding to each optimal feature subset, high-dimensional data describe the relationships between intra-class and inter-class samples more clearly. A weak classifier tailored to this method is also provided, which measures the similarity between a test sample and each class so as to classify the test data more accurately. In addition, the CSGLFS method can be expressed as a linear programming problem, which greatly simplifies the solution process. The over-fitting behavior of the model is examined by observing the number of selected features: features irrelevant to the classification problem are selected with very low probability, and the classification accuracy reaches a stable value as the dimensionality increases.
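The low-rank shared matrix plus per-view sparse errors described for LGRO-MVDR in item 1 can be illustrated with a minimal sketch. This is not the thesis's ADMM algorithm: it is a simplified alternating scheme in which each view's similarity matrix W_v is decomposed as a shared low-rank part S plus a view-specific sparse error E_v. The function names (`svt`, `soft`, `shared_lowrank_graph`) and all parameter values are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def shared_lowrank_graph(views, tau=1.0, lam=0.1, n_iter=50):
    """Alternate between a low-rank shared matrix S and per-view sparse
    errors E_v so that W_v ~ S + E_v for every view (illustrative only;
    the thesis additionally learns adaptive non-negative view weights)."""
    S = np.mean(views, axis=0)
    errors = [np.zeros_like(S) for _ in views]
    for _ in range(n_iter):
        # update each sparse error against the current shared matrix
        errors = [soft(W - S, lam) for W in views]
        # update the shared matrix against the de-noised views
        S = svt(np.mean([W - E for W, E in zip(views, errors)], axis=0), tau)
    return S, errors
```

The recovered S would then play the role of the shared graph fed into a graph-embedding dimensionality reduction step.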
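The role of the inner product regularization term in RIRUFS (item 2) — favoring informative features while penalizing redundancy among the features already chosen — can be conveyed by a toy greedy proxy. This is not the actual RIRUFS model, which solves a joint self-representation/spectral-clustering objective; the scoring rule, the function name, and the `alpha` trade-off parameter are assumptions made for illustration.

```python
import numpy as np

def select_low_redundancy_features(X, k, alpha=0.5):
    """Greedy proxy for inner-product-regularized selection: rank features
    by variance (a stand-in for importance) while penalizing correlation
    with features already chosen (redundancy). Illustrative only."""
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=0) + 1e-12
    Xn = Xc / norms                       # unit-norm centered columns
    importance = Xc.var(axis=0)
    selected = []
    for _ in range(k):
        redundancy = np.zeros(X.shape[1])
        if selected:
            # largest |inner product| with any already-selected feature
            redundancy = np.abs(Xn.T @ Xn[:, selected]).max(axis=1)
        score = importance - alpha * redundancy
        score[selected] = -np.inf         # never re-pick a chosen feature
        selected.append(int(np.argmax(score)))
    return selected
```

With a large enough `alpha`, a feature that duplicates an already-selected one is skipped in favor of a less correlated feature, which is the qualitative effect the inner product regularizer is described as achieving.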
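The core idea of CSGLFS (item 3) — each class keeps its own discriminative feature subset rather than sharing one subset across all classes — can be sketched with a simple per-class score. This proxy is not the thesis's linear-programming formulation; the Fisher-style score and the function name are assumptions chosen only to show class-specific subsets emerging.

```python
import numpy as np

def class_specific_features(X, y, k):
    """For each class, rank features by a Fisher-style score comparing the
    class mean against the rest of the data; each class keeps its own
    top-k subset. A simplified proxy for the CSGLFS idea."""
    subsets = {}
    for c in np.unique(y):
        in_c, rest = X[y == c], X[y != c]
        # between-class separation over within-class spread, per feature
        score = np.abs(in_c.mean(0) - rest.mean(0)) / (
            in_c.std(0) + rest.std(0) + 1e-12)
        subsets[c] = np.argsort(score)[::-1][:k].tolist()
    return subsets
```

On data where each class is separated along a different coordinate, the selected subsets differ per class, in contrast to a single common subset shared by all classes.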
Keywords/Search Tags: Feature Representation, Graph Embedding, Low-Rank Optimization, Inner Product Regularization, Class-Specific Local Features