
Research On The Classification For Tumor Genomics Data

Posted on: 2021-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
GTID: 2404330611453490
Subject: Control engineering
Abstract/Summary:
Tumors are among the major diseases that seriously threaten human health. Their occurrence and development is a multi-stage process in which multiple genes change gradually. Early diagnosis can prevent further deterioration and improve patient survival rates. Gene chip technology can measure the expression levels of a large number of genes, and correctly classifying tumor gene expression data aids the early diagnosis and treatment of tumors. Tumor gene expression data are typically high-dimensional, have few samples, and are class-imbalanced. To improve tumor classification accuracy, it is therefore important to extract informative features and to build a classification model that accounts for class imbalance. The main work of this thesis is as follows:

(1) To address the high dimensionality and small sample size, different manifold learning algorithms are employed to extract the local and global features of the high-dimensional data and to recover the underlying low-dimensional manifold, thereby removing redundancy and reducing dimensionality. A Gaussian process classifier is then constructed to classify the low-dimensional features. The experimental results show that the isometric feature mapping (Isomap) algorithm and the supervised locally linear embedding (SLLE) algorithm preserve the structural features of the data more completely. Combined with manifold learning, the Gaussian process classifier effectively improves the classification accuracy of tumor gene expression data.

(2) To address class imbalance, the importance of the different classes is balanced by assigning different weights to the likelihood function, which increases the influence of the minority class on the classification decision. The experimental results show that the proposed method preserves the original distribution of the data and partly solves the problems caused by imbalanced data. The algorithm outperforms the traditional algorithm in overall classification performance and runs faster than oversampling techniques.

Finally, the SRBCT, ALL-AML-3 and Brain tumor gene expression datasets are used to verify the effectiveness of the multi-class classification method based on the weighted Gaussian process classifier. Performance is evaluated by the overall classification accuracy and by the lowest classification accuracy of any single class. The experimental results show that the proposed method performs better than other multi-class classification methods and can effectively address the class imbalance of tumor data.
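The dimensionality-reduction step in (1) can be illustrated with Isomap, one of the two algorithms the abstract singles out. The following is a minimal NumPy sketch, not the thesis's implementation: it builds a symmetric k-nearest-neighbour graph, approximates geodesic distances with Floyd-Warshall shortest paths, and embeds them with classical MDS. The function name `isomap`, the neighbour count and the toy semicircle data are illustrative assumptions; in the pipeline described above, the low-dimensional output would then be passed to a Gaussian process classifier.

```python
import numpy as np


def isomap(X, n_neighbors=5, n_components=2):
    """Hypothetical helper: k-NN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between samples (rows of X)
    D = np.sqrt(np.maximum(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1), 0.0))
    # Symmetric k-nearest-neighbour graph; np.inf marks "no edge"
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]
    # Floyd-Warshall all-pairs shortest paths approximate geodesic distances
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Classical MDS: double-centre the squared geodesic distances, keep top eigenvectors
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


# Toy usage: points on a semicircle unroll to a 1-D line ordered by angle
theta = np.linspace(0.0, np.pi, 30)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Z = isomap(X, n_neighbors=4, n_components=1)
```

The O(n³) shortest-path loop is tolerable for the small sample sizes typical of microarray studies; for larger data a library routine such as scikit-learn's `sklearn.manifold.Isomap` would be the practical choice.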
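Point (2) weights the likelihood so that minority-class samples carry more influence. The abstract does not give the weighting formula, so the sketch below assumes inverse class frequencies, w_c = n / (K n_c), and applies them to a binary logistic likelihood inside a standard Laplace approximation for GP classification (the Newton iteration of Rasmussen and Williams, Algorithm 3.1, with both the gradient and the Hessian scaled by the per-sample weights). The RBF kernel, its fixed hyperparameters and all function names are hypothetical; the multi-class setting evaluated in the thesis would additionally need, for example, a one-vs-rest wrapper or a softmax likelihood.

```python
import numpy as np
from collections import Counter


def inverse_frequency_weights(labels):
    # Hypothetical weighting scheme: w_c = n / (K * n_c), so rarer classes weigh more.
    counts = Counter(int(c) for c in labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # Squared-exponential covariance between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _newton_terms(f, w, t):
    # Gradient and negative Hessian (diagonal) of sum_i w_i * log sigmoid(y_i * f_i).
    p = sigmoid(f)
    return w * (t - p), w * p * (1.0 - p)


def fit_weighted_gpc(X, y, class_weight, length_scale=1.0, signal_var=1.0, iters=100, tol=1e-8):
    """Laplace mode finding for binary GP classification (labels in {-1, +1}),
    with each sample's log-likelihood term multiplied by its class weight."""
    n = len(y)
    K = rbf_kernel(X, X, length_scale, signal_var)
    w = np.array([class_weight[int(c)] for c in y], dtype=float)
    t = (np.asarray(y, dtype=float) + 1.0) / 2.0  # targets in {0, 1}
    f = np.zeros(n)
    for _ in range(iters):
        grad, W = _newton_terms(f, w, t)
        sW = np.sqrt(W)
        L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
        b = W * f + grad
        # Newton step f <- (K^-1 + W)^-1 b, via the matrix inversion lemma
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f_new = K @ a
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return dict(X=X, f=f, w=w, t=t, length_scale=length_scale, signal_var=signal_var)


def predict_proba(model, Xs):
    # Approximate P(y = +1 | x*) from the Laplace posterior at the mode.
    X, f = model["X"], model["f"]
    ls, sv = model["length_scale"], model["signal_var"]
    grad, W = _newton_terms(f, model["w"], model["t"])
    sW = np.sqrt(W)
    L = np.linalg.cholesky(np.eye(len(f)) + sW[:, None] * rbf_kernel(X, X, ls, sv) * sW[None, :])
    Ks = rbf_kernel(X, Xs, ls, sv)
    mean = Ks.T @ grad
    v = np.linalg.solve(L, sW[:, None] * Ks)
    var = np.maximum(sv - np.sum(v**2, axis=0), 0.0)
    # MacKay's approximation of the logistic-Gaussian integral
    return sigmoid(mean / np.sqrt(1.0 + np.pi * var / 8.0))


# Toy imbalanced data: six majority (-1) samples, two minority (+1) samples
X = np.array([[0.0], [0.2], [0.4], [0.6], [0.8], [1.0], [3.0], [3.2]])
y = np.array([-1, -1, -1, -1, -1, -1, 1, 1])
weights = inverse_frequency_weights(y)
model = fit_weighted_gpc(X, y, weights)
probs = predict_proba(model, np.array([[0.5], [3.1]]))
```

Because the weights only rescale each sample's log-likelihood term, the training data themselves are left unchanged, which fits the abstract's remark that the method retains the original distribution, whereas oversampling enlarges the training set with replicated or synthetic minority samples.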
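The evaluation in the final paragraph pairs overall accuracy with the lowest single-class accuracy, which exposes a classifier that scores well overall while failing a minority class. A small pure-Python sketch follows; the function name and the toy labels are made up for illustration and are not results from the thesis.

```python
from collections import defaultdict


def overall_and_min_class_accuracy(y_true, y_pred):
    """Hypothetical helper: return (overall accuracy, lowest per-class accuracy, per-class dict)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1               # class sizes come from the true labels
        correct[t] += int(t == p)   # a hit counts toward the true class
    per_class = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / len(y_true)
    return overall, min(per_class.values()), per_class


# Toy labels: class D (a single sample) is always misclassified
y_true = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"]
y_pred = ["A"] * 4 + ["B", "B", "A"] + ["C", "A"] + ["A"]
overall, worst, per_class = overall_and_min_class_accuracy(y_true, y_pred)
```

Here the overall accuracy is 0.7 while the lowest per-class accuracy is 0.0, which is exactly the failure mode the second criterion is meant to reveal on imbalanced tumor data.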
Keywords/Search Tags: Gene expression data, manifold learning, imbalanced data, multi-classification, Gaussian process