Research On Feature Selection Algorithm Based On Top-r Method

Posted on: 2015-04-15

Degree: Master

Type: Thesis

Country: China

Candidate: J Zhao

GTID: 2428330488499868

Subject: Computer Science and Technology

Abstract/Summary:

With the continuing development of information technology and the steady improvement in the ability to acquire information, people often need to analyze and process high-dimensional data of many kinds, such as massive web data, remote sensing images, and microarray data. Such high-dimensional data usually cause the computational cost of machine learning algorithms to grow exponentially, a problem known as “the curse of dimensionality”. Feature selection for high-dimensional data has therefore become an important subject in the field of data mining. Feature selection maps data from a high-dimensional space to a low-dimensional space that better reflects the essential meaning of the data objects, and at the same time improves the efficiency of data analysis and processing. Using microarray data as the experimental data, this thesis studies both the theory and the practical application of feature selection for high-dimensional data.

First, a new feature selection algorithm based on feature similarity is proposed. A normalized signal-to-noise ratio is used to remove irrelevant features. The remaining features are then clustered, and clusters that contain only a few features are removed as noise features. After this removal, k clusters are left, in which redundancy among the features within a cluster is high while redundancy between clusters is low. Finally, each feature in each cluster is evaluated in turn against the evaluation criterion proposed in this thesis to decide whether it should be removed. The features that remain are pooled and sorted by their individual classification ability. Experiments confirm that the algorithm is effective at removing irrelevant, noisy, and redundant features.

Second, a new algorithm is obtained by analyzing the strengths and weaknesses of the feature-similarity-based algorithm and the Top-r feature selection algorithm and combining them so that each compensates for the other's weaknesses. The combined algorithm takes full account of classification performance while guaranteeing high execution efficiency. First, the feature set is pruned to obtain a feature subset with few irrelevant and redundant features. Then features from the same cluster are assigned to different blocks, and features from different clusters are assigned to the same block. Finally, the feature blocks are processed with the Top-r algorithm to choose the optimal feature subset. Experiments confirm that the new algorithm selects a superior feature subset while maintaining high execution efficiency.
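
The abstract only outlines its steps, so the following is a rough Python sketch of three of them under stated assumptions, not the thesis's implementation. The signal-to-noise score uses the common two-class form |mean0 - mean1| / (sd0 + sd1); the thesis's exact normalization is not given. The function names and the parameters `min_size` and `n_blocks` are hypothetical, and the round-robin assignment is just one simple way to put features from the same cluster into different blocks. The clustering step, the evaluation criterion, and the Top-r algorithm itself are not shown because the abstract does not specify them.

```python
from statistics import mean, pstdev


def snr_scores(X, y):
    """Two-class signal-to-noise score per feature (assumed form):
    |mean_0 - mean_1| / (sd_0 + sd_1). X is a list of sample rows,
    y holds the class labels 0 or 1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        denom = pstdev(a) + pstdev(b)
        # A feature that is constant within both classes gets score 0.0
        # here to avoid dividing by zero (a simplification).
        scores.append(abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0)
    return scores


def drop_small_clusters(clusters, min_size):
    """Treat clusters with fewer than min_size features as noise."""
    return [c for c in clusters if len(c) >= min_size]


def partition_into_blocks(clusters, n_blocks):
    """Round-robin assignment: the i-th feature of every cluster goes to
    block i % n_blocks, so features of one cluster are spread over
    different blocks and each block mixes features from several clusters."""
    blocks = [[] for _ in range(n_blocks)]
    for cluster in clusters:
        for i, feature in enumerate(cluster):
            blocks[i % n_blocks].append(feature)
    return blocks


# Toy data: feature 0 separates the two classes well, feature 1 less so.
X = [[1.0, 5.0], [2.0, 5.1], [9.0, 4.9], [10.0, 5.0]]
y = [0, 0, 1, 1]
scores = snr_scores(X, y)

# Toy clusters of feature indices; the singleton cluster is dropped as noise.
kept = drop_small_clusters([[0, 3, 5], [1, 4], [7]], min_size=2)
blocks = partition_into_blocks(kept, n_blocks=3)
```

Features in one cluster land in different blocks only when `n_blocks` is at least the size of the largest cluster; with fewer blocks, some cluster members necessarily share a block.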

Keywords/Search Tags:

Feature Selection, Top-r Algorithm, Data Mining, Machine Learning

Related items

1	Research On Application Of Machine Learning And Data Mining In Bioinformatics
2	A Study On Feature Selection Algorithms Using Information Entropy
3	Research On Dynamic Feature Selection Algorithm For Flow Features
4	Studies Of Several Mathematical Models And Algorithms In Data Mining
5	Research On Machine Learning Algorithm With Environmental Data Prediction
6	Research And Application Of Integrated Feature Selection Algorithm Based On Extreme Learning Machine
7	Research On Model Selection For Machine Learning
8	Analysing Correctness Of Implementations Of Machine Learning Algorithms By Machine Learning
9	Mathematical programming approaches to machine learning and data mining
10	Data Mining And Feature Selection Of High Dimensional Biomedical Data Based On TCGA And Pubmed Databases