Unsupervised Feature Selection Algorithm And Its Application In Gene Data Analysis

Posted on: 2018-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ding
Full Text: PDF
GTID: 2354330542962927
Subject: Engineering
Abstract/Summary:
Cancer research has long lacked a substantive breakthrough, largely because the genes responsible for the disease have been difficult to identify. With the rapid development of computer science and technology, DNA microarray technology offers a way past this bottleneck. It also raises two challenges: first, most of the features (genes) are irrelevant or redundant, which makes data processing highly complex; second, the number of valid samples is small, which hinders data analysis. Given these characteristics, this thesis uses feature selection algorithms to process and analyze the data, to help ensure the reliability and accuracy of the results.

Feature selection is an important data preprocessing method for classification tasks and is widely used in medical, image, and text data processing. Feature selection algorithms fall into two categories, supervised and unsupervised: the former use label information, while the latter do not. Because a large amount of label information is missing in practice, this thesis studies unsupervised feature selection algorithms. The work is as follows:

(1) An unsupervised feature selection algorithm based on density peaks is applied to gene datasets. The dataset is first divided using 10-fold cross-validation; the correlation between each pair of features is then computed on the training set, and a gene subset is chosen with the algorithm. Finally, SVM and KNN classifiers are trained, and the quality of the selected gene subset is evaluated by their classification performance. The experimental results show that the algorithm is well suited to gene datasets.

(2) Because the quality of the selected gene subset depends directly on the distance measure, this chapter uses four different distance metrics to compute the representativeness and discriminability of genes as defined in the third chapter, and proposes a new feature significance measure that emphasizes the representativeness of a feature. A comparison of the average accuracy, MCC, sensitivity, and specificity of the selected gene subsets on three datasets indicates that the proposed gene significance measure is effective.
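The abstract does not give the thesis's formulas, so the following is only a minimal sketch of how density-peaks scoring could be applied to features. It makes several assumptions that are not from the thesis: the feature-to-feature distance is taken as 1 - |Pearson correlation|, the local density uses a Gaussian kernel with a hypothetical cutoff `dc`, and features are ranked by the product of density and separation, as in the standard density-peaks clustering heuristic. Reading density as "representativeness" and separation as "discriminability" is likewise an interpretation. The function names (`density_peak_scores`, `select_features`, `binary_metrics`) are illustrative. The last function computes the evaluation measures named in the abstract from binary confusion-matrix counts.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def density_peak_scores(features, dc=0.5):
    """features: list of feature columns, each a list of training-sample values.
    Returns (rho, delta, gamma), one value per feature."""
    m = len(features)
    # Assumed distance between features: 1 - |Pearson correlation|.
    dist = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d = 1.0 - abs(pearson(features[i], features[j]))
            dist[i][j] = dist[j][i] = d
    # Local density (Gaussian kernel, self excluded): high when a feature
    # has many close, i.e. strongly correlated, neighbours.
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(m) if j != i)
           for i in range(m)]
    # Separation: distance to the nearest feature of higher density.
    # Ties in density are broken by index; the densest feature gets its
    # largest distance to any other feature.
    order = sorted(range(m), key=lambda i: -rho[i])
    delta = [0.0] * m
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])
        else:
            delta[i] = min(dist[i][j] for j in order[:pos])
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma

def select_features(features, k, dc=0.5):
    """Indices of the k features with the largest rho * delta score."""
    _, _, gamma = density_peak_scores(features, dc)
    ranked = sorted(range(len(features)), key=lambda i: gamma[i], reverse=True)
    return ranked[:k]

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Toy example: features 0 and 1 are perfectly correlated, feature 2 is
# uncorrelated with both, so one of the redundant pair is ranked last.
toy = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]]
print(select_features(toy, 2))  # [0, 2]
```

In a 10-fold cross-validation setting like the one described, the scores would be computed on the training folds only and the selected indices then applied to the held-out fold, so that the test samples do not influence which genes are chosen.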
Keywords/Search Tags: density peaks, unsupervised feature selection, support vector machine