
Research On Feature Selection Algorithm Based On Data Similarity

Posted on: 2019-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Tuo
Full Text: PDF
GTID: 2428330545485540
Subject: Applied Mathematics
Abstract/Summary:
In the internet era, data are growing rapidly and are characterized by large numbers of samples, high feature dimensionality, and complex class structure. Feature selection can extract useful information from massive, complex data and has become a hot topic in machine learning and data mining. In this dissertation, we propose three feature selection algorithms that exploit the connections within data from three perspectives: samples, features, and classes. The main contributions are as follows:

1. From the perspective of data samples, we propose a feature selection algorithm based on double sample similarity. First, the sample similarity matrix is built from the pairwise distances between samples and the reconstruction coefficients of the nearest-neighbor samples, and a low-dimensional space is constructed from it. Then, a norm is introduced in the low-dimensional space to obtain the feature weight matrix. Finally, we define an evaluation indicator that measures feature importance and use it to select the optimal feature subset.

2. From the perspective of data features, we propose a feature selection algorithm based on the similarity of reconstructed features. First, we use feature reconstruction to obtain the feature similarity matrix and transform the original sample space on that basis. Then, the transformed sample space is fitted to the label space by minimizing the empirical error. Finally, we optimize and update the feature weight matrix and use it to perform feature selection.

3. From the perspective of data classes, we propose a feature selection algorithm based on the similarity of nearest-neighbor classes. First, to obtain the class similarity matrix, we use the parent-child relationships between classes to model the hierarchical structure among nearest-neighbor classes. Then, we use the class similarity matrix to gather relevant information from the nearest-neighbor classes, which is used to update the parameters of the current class. Finally, the feature weight matrix is obtained and used to select the best feature subset.
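
All three algorithms end by ranking features with a learned feature weight matrix. The abstract does not give the objective functions, so the following is only a minimal sketch of that general pattern under stated assumptions: it fits the samples to a one-hot label space by least squares with an l2,1-norm penalty (one common sparsity-inducing choice, assumed here and solved by iterative reweighting), then scores each feature by the l2 norm of its row in the weight matrix. The function names, the penalty, the solver, and the toy data are illustrative and are not taken from the thesis.

```python
import numpy as np

def l21_feature_weights(X, Y, lam=1.0, n_iter=30, eps=1e-8):
    """Approximately minimise ||XW - Y||_F^2 + lam * ||W||_{2,1}
    by iteratively reweighted least squares (an assumed solver)."""
    d = X.shape[1]
    D = np.eye(d)                       # reweighting matrix, starts as identity
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        # Closed-form update of W for the current reweighting matrix D
        W = np.linalg.solve(XtX + lam * D, XtY)
        # Rows with small norm receive a larger penalty in the next round
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))
    return W

def select_features(W, k):
    """Score each feature by the l2 norm of its row in W; return the top-k indices."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]

# Toy data (hypothetical): only features 0 and 1 determine the class label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[labels]                   # one-hot label matrix

W = l21_feature_weights(X, Y, lam=1.0)  # feature weight matrix, shape (6, 2)
selected = select_features(W, k=2)      # indices of the two highest-scoring features
```

In the thesis, a similarity matrix over samples, features, or classes would enter the objective before this step; the sketch leaves that part out because the abstract does not specify it.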
Keywords/Search Tags: similarity matrix, l_{r,p}-norm, feature selection, clustering, classification