Unsupervised Feature Selection Algorithm And Its Application In Gene Data Analysis

Posted on:2018-05-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y Ding

Full Text:PDF

GTID:2354330542962927

Subject:Engineering

Abstract/Summary:

Cancer research has long lacked a substantive breakthrough, in part because the genes responsible for the disease have been difficult to identify. With the rapid development of computer science and technology, DNA microarray technology offers a way past this bottleneck. Microarray data pose two challenges, however. First, most of the features are irrelevant or redundant, which makes data processing highly complex. Second, the number of valid samples is small, which hinders data analysis. Given these characteristics, this thesis uses feature selection algorithms to process and analyze the data, with the aim of ensuring reliable and accurate results.

Feature selection is an important data preprocessing method for classification tasks and is widely used in medical, image, and text data processing. Feature selection algorithms fall into two categories, supervised and unsupervised: the former use label information, while the latter do not. Because a large amount of label information is missing in practice, this thesis studies unsupervised feature selection, as follows:

(1) An unsupervised feature selection algorithm based on density peaks is applied to gene datasets. Each dataset is first split using 10-fold cross-validation. The correlation between each pair of features is then computed on the training set, and the algorithm selects a gene subset. Finally, SVM and KNN classifiers are trained, and the quality of the selected gene subset is evaluated by their classification performance. The experimental results show that the algorithm is well suited to gene datasets.

(2) Because the quality of the selected gene subset depends directly on the distance measure, this chapter uses four different distance metrics to compute the representativeness and discriminability of genes as defined in Chapter 3, and proposes a new feature significance measure that emphasizes the representativeness of a feature. A comparison of the average accuracy, MCC, sensitivity, and specificity of the selected gene subsets on three datasets shows that the proposed gene significance measure is effective.
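
The density-peaks idea in item (1) can be illustrated with a minimal sketch. It is not the thesis's actual algorithm, whose exact definitions are not given in this abstract. The sketch assumes that the distance between two features is 1 minus the absolute Pearson correlation, that local density uses a Gaussian kernel with a hypothetical `cutoff` parameter, and that features are ranked by the product of density and distance to the nearest denser feature. All function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def feature_distances(X):
    """Pairwise feature distances, assumed here to be 1 - |Pearson correlation|.

    X is a list of samples (rows); each row is a list of feature values.
    """
    cols = list(zip(*X))
    m = len(cols)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d[i][j] = d[j][i] = 1.0 - abs(pearson(cols[i], cols[j]))
    return d


def density_peak_select(X, k, cutoff=0.5):
    """Return indices of the k features with the largest density * separation score."""
    d = feature_distances(X)
    m = len(d)
    # Local density: Gaussian-kernel count of nearby features (assumed kernel).
    rho = [
        sum(math.exp(-((d[i][j] / cutoff) ** 2)) for j in range(m) if j != i)
        for i in range(m)
    ]
    delta = []
    for i in range(m):
        # Features that are denser than i; ties are broken by lower index.
        higher = [
            d[i][j]
            for j in range(m)
            if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)
        ]
        # The densest feature has no denser neighbour, so use its largest distance.
        delta.append(min(higher) if higher else max(d[i]))
    score = [r * dl for r, dl in zip(rho, delta)]
    ranked = sorted(range(m), key=lambda i: score[i], reverse=True)
    return ranked[:k]
```

Following the protocol described above, such a selector would be run on the training portion of each cross-validation fold only, and the SVM and KNN classifiers would then be trained and evaluated on the selected columns.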

Keywords/Search Tags:

density peaks, unsupervised feature selection, support vector machine

Related items

1	Research And Implementation Of Asthma Diagnosis Model Based On Improved Fuzzy Support Vector Machine
2	Application Of Support Vector Machine In Prediction Of Diabetes Genetic Risk
3	Research On The Image Classification Of Brain Glioma Based On Improved Support Vector Machine
4	Cancer Diagnosis By Using Support Vector Machine
5	Identification Of Alzheimer's Disease Associated Genetic Biomarker Candidates By MKL-SVM With Feature Selection
6	Support Vector Data Description-based Feature Selection Method And Its Application
7	Research On Encephalic Tissue Recognition For MR Image Based On Support Vector Machine
8	Analysis Of Cancer Gene Data Base On Random Forest And Support Vector Machine
9	Gene Feature Selection And Classification Of Cancer Based On Genetic Algorithm And Support Vector Machine
10	Breast Cancer Diagnosis Based On Feature Selection And Support Vector Machine