Research On Methods For Classification Of Tumor Gene Expression Data

Posted on:2013-02-23

Degree:Master

Type:Thesis

Country:China

Candidate:A X Ye

GTID:2234330371499906

Subject:Signal and Information Processing

Abstract/Summary:

Classification of tumor gene expression data builds on DNA microarray technology, which measures gene expression across different samples in order to identify the genes that are differentially expressed between samples and to reveal the essential relationship between those genes and the diseased tissue. Although classification algorithms have developed significantly in the field of pattern recognition, many problems remain to be solved. Because gene expression data have two characteristics, high dimensionality and small sample size, traditional machine learning methods suffer from high computational complexity and low efficiency and fail to achieve good classification results. This thesis proposes three algorithms to classify tumor gene expression data. The main research contents are as follows:

1. This thesis proposes an algorithm to classify tumor gene expression data using histogram theory. First, the entropy of each gene is calculated to eliminate redundant genes. Then, based on histogram theory, the genes with the highest difference and ratio between peak and valley are selected as feature genes. Finally, classification experiments are performed with Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers.

2. Non-negative matrix factorization and Normal_Matrix spectral decomposition theory are applied to the classification of gene expression data. First, noise genes are removed using the fdr_test scoring criterion to preliminarily reduce the dimensionality of the gene expression data. Then, the comprehensive properties between genes are extracted using non-negative matrix factorization, and the Normal_Matrix between samples is constructed from these properties. Finally, tumor types are classified using the spectral components obtained by singular value decomposition, which describe the class attributes of the samples.

3. This thesis proposes an algorithm to classify tumor gene expression data based on Principal Component Analysis (PCA) and Minimum Spanning Tree theory. First, the dimensionality of the tumor gene expression data is preliminarily reduced using PCA. Then, the samples are mapped to points in a high-dimensional space and the adjacency matrix is constructed. Next, the undirected complete graph of the tumor samples is built from the adjacency matrix, its minimum spanning tree is generated, and the longest edge of the tree is deleted. This divides the minimum spanning tree into two subtrees: the normal samples correspond to one subtree, and the tumor samples correspond to the other.
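The third algorithm (PCA followed by a minimum spanning tree split) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes Euclidean distance as the edge weight of the complete graph, a hypothetical choice of two principal components, and Prim's algorithm for the spanning tree. All function names (`pca_reduce`, `minimum_spanning_tree`, `split_by_longest_edge`) are illustrative, and only NumPy is used.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center each gene (column)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def minimum_spanning_tree(D):
    """Prim's algorithm on a dense distance matrix D.

    Returns a list of (i, j, weight) edges of the spanning tree.
    """
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = D[0].copy()             # cheapest known link into the tree
    best_from = np.zeros(n, dtype=int)  # tree vertex providing that link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))
        edges.append((int(best_from[j]), j, float(best_dist[j])))
        in_tree[j] = True
        closer = (D[j] < best_dist) & ~in_tree
        best_dist[closer] = D[j][closer]
        best_from[closer] = j
    return edges

def split_by_longest_edge(X, n_components=2):
    """Return a 0/1 label per sample after cutting the longest tree edge."""
    Z = pca_reduce(X, n_components)
    # Adjacency (distance) matrix of the complete graph on the samples.
    # Euclidean distance is an assumption of this sketch.
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    edges = minimum_spanning_tree(D)
    longest = max(range(len(edges)), key=lambda k: edges[k][2])
    kept = [e for k, e in enumerate(edges) if k != longest]
    # Build the remaining forest and flood-fill from sample 0.
    n = X.shape[0]
    neighbors = [[] for _ in range(n)]
    for i, j, _ in kept:
        neighbors[i].append(j)
        neighbors[j].append(i)
    labels = np.ones(n, dtype=int)
    labels[0] = 0
    stack = [0]
    while stack:
        u = stack.pop()
        for v in neighbors[u]:
            if labels[v] != 0:
                labels[v] = 0
                stack.append(v)
    return labels

# Synthetic demo data (not real expression data): two well-separated groups.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 0.1, size=(5, 20)),
                    rng.normal(5.0, 0.1, size=(6, 20))])
demo_labels = split_by_longest_edge(X_demo)
```

The split only yields two unlabeled groups. Deciding which subtree holds the normal samples and which holds the tumor samples needs additional information, such as a few samples of known class.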

Keywords/Search Tags:

Classification, Histogram, Non-Negative Matrix Factorization, PCA, Minimum Spanning Tree, Gene Expression Data

Related items

1	Feature Extraction Of Cancer Gene Expression Data Based On Non-negative Matrix Factorization
2	Tumor DNA Microarray Data Classification Based On Non-negative Matrix Factorization
3	Non-negative Matrix Factorization Algorithm To Deal With The Cancer Gene Expression Data
4	Brain Network Construction Analysis And Classification Based On Minimum Spanning Tree And Group Lasso
5	Brain Network Classification Method With Multiple Features Fusion Based On Minimum Spanning Tree Analysis
6	Non-negative Matrix Factorization Based Clustering Research For Cancer Gene Expression Data
7	Lung Data Processing Based On Non-negative Matrix Factorization
8	Fusion Methods Of MRI & MRSI Based On Matrix Factorization
9	Application Of Minimum Spanning Tree In Schizophrenia EEG Brain Network
10	Research On Prediction Methods Of Disease-related MiRNAs Based On Non-negative Matrix Factorization