Methods For Classification Of Tumor Gene Expression Data Research

Posted on: 2013-02-23  Degree: Master  Type: Thesis
Country: China  Candidate: A X Ye  Full Text: PDF
GTID: 2234330371499906  Subject: Signal and Information Processing
Abstract/Summary:
Classification of tumor gene expression data builds on DNA microarray technology, which measures gene expression across different samples in order to identify the genes that are differentially expressed between samples and to uncover the underlying relationship between those genes and the diseased tissue. Although classification algorithms have developed significantly in the field of pattern recognition, many problems remain unsolved. Because gene expression data combine high dimensionality with a small number of samples, traditional machine learning methods suffer from high computational complexity and low efficiency, and they fail to achieve good classification results. This thesis proposes three algorithms for classifying tumor gene expression data. The main research contents are as follows:

1. This thesis proposes an algorithm that classifies tumor gene expression data using histogram theory. First, the entropy of each gene is calculated to eliminate redundant genes. Then, based on histogram theory, the genes with the highest difference and ratio between peak and valley are selected as feature genes. Finally, classification experiments are performed with Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers.

2. Nonnegative matrix factorization and Normal_Matrix spectral decomposition theory are applied to the classification of gene expression data. First, noise genes are removed using the fdr_test scoring criterion, which gives a preliminary reduction of the dimensionality of the gene expression data. Then, the comprehensive properties between genes are extracted using nonnegative matrix factorization, and the Normal_Matrix between samples is constructed from these properties. Finally, tumor types are classified using the spectral components obtained by singular value decomposition, which describe the class attributes of the samples.

3. This thesis proposes an algorithm that classifies tumor gene expression data based on Principal Component Analysis (PCA) and Minimum Spanning Tree theory. First, PCA gives a preliminary reduction of the dimensionality of the tumor gene expression data. Then, the samples are mapped to points in a high-dimensional space and the adjacency matrix is constructed. Next, the undirected complete graph of the tumor samples is built from the adjacency matrix, its Minimum Spanning Tree is generated, and the longest edge of the tree is deleted. Finally, this divides the Minimum Spanning Tree into two subtrees: the normal samples correspond to one subtree, and the tumor samples correspond to the other.
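The graph step of the third algorithm can be illustrated with a short Python sketch. This is not the thesis's implementation: it assumes the samples have already been reduced to low-dimensional points (for example by PCA, which is omitted here), it uses Euclidean distance as the edge weight, and the function name `mst_two_way_split` and the toy data are hypothetical.

```python
import math


def mst_two_way_split(points):
    """Label each point 0 or 1 by cutting the longest edge of the MST.

    Assumption: `points` are samples already reduced in dimension
    (e.g. by PCA); Euclidean distance is an illustrative choice.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")

    # Adjacency matrix of the undirected complete graph.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Prim's algorithm: grow the tree from node 0.
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight into the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []              # MST edges as (weight, a, b)
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u

    # Delete the longest edge, leaving two subtrees.
    longest = max(edges)
    neighbors = {i: [] for i in range(n)}
    for edge in edges:
        if edge is longest:
            continue
        _, a, b = edge
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Collect the subtree containing node 0; everything else is the other subtree.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in neighbors[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return [0 if i in seen else 1 for i in range(n)]


# Toy data: two well-separated groups of three points each.
samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = mst_two_way_split(samples)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The labels only say which subtree a sample falls in. Deciding which subtree holds the normal samples and which the tumor samples needs extra information, such as a few samples of known class, and the sketch leaves that step out.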
Keywords/Search Tags: Classification, Histogram, Non-Negative Matrix Factorization, PCA, Minimum Spanning Tree, Gene Expression Data