
Research On Feature Selection Of Tumor Genes

Posted on: 2019-05-20  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Mu  Full Text: PDF
GTID: 2404330548470112  Subject: Computer Science and Technology
Abstract/Summary:
Analyzing and mining the discriminative information in gene expression profile samples is of great biological significance for revealing how tumors develop and for the molecular diagnosis of tumors. Gene expression profile samples record the expression levels of all genes in the tissue cells, but in fact only a few genes are related to the class labels of the samples, and these genes carry most of the class information. Therefore, a feature selection model that generalizes well and describes high-dimensional, small-sample gene expression data more accurately is needed. Selecting feature genes from gene expression profiles is of great research significance and application value for tumor typing and molecular medicine. To address the long training time and low classification accuracy of current feature gene selection algorithms, this thesis designs efficient feature gene selection and classification algorithms tailored to the characteristics of gene expression profiles, selecting a small subset of feature genes while maintaining or even improving classification accuracy. The main innovations of the thesis are as follows:
(1) To address the high time complexity and the imprecise description of gene expression profiles that arise when approximations are computed with a global neighborhood, an effective and efficient model, PNRS, is proposed based on principal component analysis (PCA) and neighborhood rough sets. First, a low-dimensional feature space is obtained with the PCA algorithm; then the multiple neighborhood rough set algorithm is adopted for feature gene selection, that is, the neighborhood attribute values are calculated, followed by the approximations of the neighborhood decision system; finally, the feature gene set is obtained with a heuristic search method. The experimental results show that the PNRS model removes highly redundant genes and achieves higher classification accuracy.
(2) To address the problems with the within-class scatter matrix in the Fisher discriminant analysis algorithm and its poor performance in classifying gene expression profiles, a novel feature selection method, called the Fisher transformation (FT) model, is proposed to classify tumor microarray datasets. First, the genes with a large influence on classification are selected with the multiple neighborhood rough set algorithm; then, based on the maximum margin criterion, the original data are mapped into a low-dimensional space so that the ratio of between-class scatter to within-class scatter is maximized after the samples are projected. The experiments show that the FT method selects a small set of feature genes and obtains higher classification accuracy.
(3) The supervised locally linear embedding method takes class label information into account, but considerable redundant information between genes remains, which may lead to poor classification performance. Motivated by this, a new feature selection method based on supervised locally linear embedding and Spearman's rank correlation coefficient is proposed. First, the supervised locally linear embedding method is used to obtain the primary feature subset; second, Spearman's rank correlation coefficient is used to delete highly correlated, redundant genes; finally, the resulting feature gene subset is classified with different classifiers. Experiments show that this method overcomes the lower accuracy of traditional classification methods.
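The neighborhood rough set selection step named in (1) and (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes Euclidean distance, a single fixed neighborhood radius `delta`, features already scaled to a comparable range, and a greedy forward search as the heuristic. The function names `neighborhood_dependency` and `greedy_select` are hypothetical.

```python
import numpy as np


def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood contains a single class.

    X: (n_samples, n_genes) array, y: (n_samples,) class labels,
    features: list of column indices, delta: neighborhood radius (assumed).
    """
    sub = X[:, features]
    # Pairwise Euclidean distances restricted to the chosen features.
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta
    same_label = y[:, None] == y[None, :]
    # A sample is in the positive region if every neighbor shares its label.
    consistent = np.all(~neighbors | same_label, axis=1)
    return float(consistent.mean())


def greedy_select(X, y, delta=0.2, tol=1e-12):
    """Greedy forward search: add the feature that most raises the dependency."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [
            neighborhood_dependency(X, y, selected + [f], delta)
            for f in remaining
        ]
        idx = int(np.argmax(scores))
        if scores[idx] - best <= tol:
            break  # no remaining feature improves the dependency
        best = scores[idx]
        selected.append(remaining.pop(idx))
    return selected, best


if __name__ == "__main__":
    # Toy data: column 0 separates the two classes, column 1 is constant.
    X = np.array([[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]])
    y = np.array([0, 0, 1, 1])
    print(greedy_select(X, y, delta=0.2))
```

On the toy data, only column 0 is selected, because the constant column places every sample in every other sample's neighborhood and so contributes no dependency. In a pipeline like the one described in (1), the input `X` would be the PCA-reduced data rather than the raw expression matrix.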
Keywords/Search Tags: Gene expression profile, Feature gene selection, Neighborhood rough set, Fisher discriminant analysis, Local linear embedding