Research On Feature Extraction And Classification In Gene Expression Profiling

Posted on: 2015-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: H J Kan
Full Text: PDF
GTID: 2254330428964782
Subject: Signal and Information Processing
Abstract/Summary:
The rapid development of DNA microarray technology makes it possible to measure the expression levels of tens of thousands of genes simultaneously on a single small chip. This not only helps identify tumor tissue but also provides reliable evidence for research in tumor molecular biology. As the technology and experimental apparatus advance, the volume of gene expression profile data keeps growing. Studying how tumors arise and develop at the molecular level helps locate tumor-related genes, and revealing the regulatory relationships among tumor genes is another promising direction. Together, these are expected to lead to new approaches to tumor diagnosis and treatment.
This thesis studies feature extraction and classification methods for cancer gene expression profile data and analyzes the experimental results in detail. The main contents are as follows:
1. Because outliers strongly affect the mean and variance of gene expression levels across samples, this thesis presents a novel method that integrates neighborhood uncertainty with scoring criteria to identify genes associated with tumor type. First, for each sample, neighborhood uncertainty is used to produce a reliable expression level for that sample relative to all the samples. Then, taking each expression value together with its neighboring points as the object of analysis, informative genes are selected by applying scoring criteria to the reliable expression levels obtained. Finally, classification experiments are carried out with the k-nearest neighbor (KNN) classifier. Experiments on two real DNA microarray data sets show that the informative genes selected by the proposed method are more reliable than those selected by scoring criteria alone.
2. The random walk algorithm is applied to the classification of tumor gene expression profile data. First, the samples are mapped to points in a high-dimensional space, a graph is constructed on the point set, and its adjacency matrix and Laplacian matrix are obtained. Then, the combinatorial Dirichlet problem is solved to obtain the transition probabilities of the random walk. Finally, each sample is assigned to the class with the maximum transition probability. The experimental results show the effectiveness of the proposed method.
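The random-walk classification steps in item 2 can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how the graph is built or how labeled samples enter the problem, so the sketch assumes a fully connected graph with Gaussian edge weights, a hypothetical bandwidth parameter `sigma`, and labeled training samples acting as seeds (unlabeled samples are marked `-1`). The function name `random_walk_classify` is also hypothetical. With the Laplacian partitioned into seeded and unseeded blocks, the combinatorial Dirichlet problem reduces to the linear system L_U P = -B^T M, where M is the one-hot label matrix of the seeds.

```python
import numpy as np

def random_walk_classify(X, labels, sigma=1.0):
    """Label unlabeled samples by solving the combinatorial Dirichlet problem.

    X      : (n_samples, n_genes) expression matrix.
    labels : int array; class id (>= 0) for seed samples, -1 for unlabeled.
    sigma  : Gaussian weight bandwidth (an assumption of this sketch).
    Returns (predicted labels, probability matrix for the unlabeled samples).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Adjacency matrix: Gaussian weights on a fully connected graph (assumed).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Combinatorial graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    seeded = np.flatnonzero(labels >= 0)
    free = np.flatnonzero(labels < 0)
    classes = np.unique(labels[seeded])

    # One-hot indicator of the seed labels, shape (n_seeded, n_classes).
    M = (labels[seeded][:, None] == classes[None, :]).astype(float)

    # Block partition of L: L_U couples unlabeled nodes, B couples seeds to them.
    L_U = L[np.ix_(free, free)]
    B = L[np.ix_(seeded, free)]

    # Column k of P is the probability that a walker started at each
    # unlabeled node first reaches a seed of class k.
    P = np.linalg.solve(L_U, -B.T @ M)

    predicted = labels.copy()
    predicted[free] = classes[np.argmax(P, axis=1)]
    return predicted, P
```

Each row of `P` sums to one, so the values can be read as first-arrival probabilities, and the arg-max over a row corresponds to the "maximum transition probability" rule described above. For real microarray data, a sparse k-nearest-neighbor graph and a data-driven choice of `sigma` would likely be needed; both are left out of this sketch.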
Keywords/Search Tags: Tumor, Gene expression profiling, Neighborhood uncertainty, Scoring criteria, Random walks, Combinatorial Dirichlet problem