Dimensional Reduction And Its Application Based On Laplacian-principal Component Analysis

Posted on: 2019-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhao
Full Text: PDF
GTID: 2347330569989342
Subject: Applied statistics
Abstract/Summary:
Feature selection, as a means of dimensionality reduction, aims to remove uncorrelated features, redundant features, and noisy features, and to select a subset of relevant features that represents the structure of the original feature set. This thesis compares several well-known feature selection methods (maximum variance, Laplacian score, and SPEC) and finds that each of them evaluates every feature independently according to a single criterion.

This thesis proposes a new method, PCA-LAP, to explore the data. Principal component analysis is introduced to preprocess the original data set, and the bootstrap (self-help) sampling method is then used to divide the data into training and testing sets on the basis of the principal component analysis. The Laplacian score is then used to compute feature scores on the training data, and the distinguishing features are selected according to these scores. The number of selected features is determined by applying hierarchical clustering to the testing data and combining the Rand index and classError evaluation measures.

Bootstrap sampling is applied to six data sets (Madelon, winequality-Red, ISOLET1, ZOO, COIL20, and USPS) to partition each into training and testing data. The maximum variance, Laplacian score, and SPEC methods are applied to the training sets to identify the distinguishing features, and the resulting feature subsets are then used for hierarchical clustering on the corresponding testing sets. The Rand index and classError are used to compare the accuracy of these algorithms on the six data sets. The PCA-LAP method is applied in the same way for feature selection and hierarchical clustering. The comparison of Rand index and classError values illustrates the superiority of the proposed algorithm.
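The steps described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the thesis's implementation: the function names, the k-nearest-neighbour graph with heat-kernel weights, the defaults `k=5` and `t=1.0`, and the choice to score the principal-component columns are all assumptions made for the example. The Laplacian score follows the standard definition, in which a smaller score indicates a feature that better preserves local structure. The hierarchical clustering step and the classError measure are left out.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def bootstrap_split(n, rng):
    """Bootstrap split: n draws with replacement train; out-of-bag rows test."""
    train = rng.integers(0, n, size=n)
    test = np.setdiff1d(np.arange(n), train)
    return train, test

def laplacian_score(X, k=5, t=1.0):
    """Laplacian score of each column of X (smaller is better).

    Assumed graph: symmetrized k-nearest-neighbour graph with heat-kernel
    weights exp(-||xi - xj||^2 / t).
    """
    n = X.shape[0]
    k = min(k, n - 1)
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist2, np.inf)            # a point is not its own neighbour
    nn = np.argsort(dist2, axis=1)[:, :k]      # indices of the k nearest neighbours
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-dist2[rows, cols] / t)
    S = np.maximum(S, S.T)                     # make the graph undirected
    d = S.sum(axis=1)                          # degree of each node
    L = np.diag(d) - S                         # graph Laplacian
    F = X - (d @ X) / d.sum()                  # remove the degree-weighted mean
    num = np.einsum("ij,ij->j", F, L @ F)      # f^T L f for each feature
    den = np.einsum("ij,ij->j", F, d[:, None] * F)  # f^T D f for each feature
    # Constant features get an infinite (worst) score instead of dividing by zero.
    return np.divide(num, den, out=np.full_like(num, np.inf), where=den > 1e-12)

def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two labelings agree (same vs. different)."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return np.mean(same_a == same_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 8))               # placeholder data, not a thesis data set
    Z = pca_transform(X, n_components=5)
    train, test = bootstrap_split(len(Z), rng)
    scores = laplacian_score(Z[train])
    keep = np.argsort(scores)[:3]              # keep the three lowest-scoring columns
    Z_test_selected = Z[test][:, keep]         # input for the clustering step
```

In a fuller version, `Z_test_selected` would be passed to a hierarchical clustering routine (for instance `scipy.cluster.hierarchy.linkage` followed by `fcluster`, which is one possible choice rather than the one the thesis names), and the resulting cluster labels would be compared with the true classes using `rand_index` for each candidate number of selected features.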
Keywords/Search Tags: Feature selection, Laplacian score, Principal component analysis, Redundant feature, Hierarchical clustering