Research On Relevant Problems Of Tumor DNA Microarray Expression Data Analysis

Posted on:2010-08-28

Degree:Doctor

Type:Dissertation

Country:China

Candidate:G Y Wang

GTID:1114360305473646

Subject:Control Science and Engineering

Abstract/Summary:

With the development of the Tumor Genomic Project, DNA microarrays are widely used in tumor research. Tumor DNA microarrays provide large volumes of gene expression data for tumor genomic research, reflecting how gene expression levels fluctuate across developmental stages and physiological states of different tissue cells. Because it can uncover the nature of tumors at the genomic level and offers a new systematic method, the analysis of tumor gene expression data has attracted great attention. Researchers have so far confirmed some tumor genes and accumulated some knowledge of oncogenesis and of the regulation mechanisms of tumor genes, but these achievements are not yet sufficient to understand and cure tumors. How to analyze tumor gene expression data effectively has therefore become a problem that must be solved as soon as possible. Taking tumor DNA microarray expression data analysis as its research topic, this dissertation studies the related preprocessing techniques, cluster analysis algorithms, and gene regulatory network modeling methods. The main contents and contributions of the dissertation are summarized as follows:
(1) Methods for missing value estimation and normalization of gene expression data. For the missing value estimation problem, we found that the similarity between gene expression profiles influences estimation precision, and that the dimensional distribution of the gene expression data without missing values is a useful reference for estimating missing values. This dissertation therefore presents a new missing value estimation method based on K-Nearest Neighbor and Support Vector Regression (KNN-SVR). The algorithm takes genes that have no missing values and are most similar to the genes whose missing values are to be estimated as the training set, and builds regression models through SVR to estimate the missing values. The algorithm achieves better accuracy and stability.
In the classification and class discovery of tumor gene expression data, current normalization methods are likely to cause samples to be classified incorrectly. This dissertation therefore reworks the normalization methods and uses class information to normalize gene expression data, which makes the data more suitable for classification and class discovery.
(2) Methods for gene cluster analysis of tumor time series microarray data. To identify asynchronous or local correlation in expression profiles, this dissertation introduces the concept of the Local Maximum Correlative Coefficient (LMCC) and defines the correlative relationship between genes. The rules for setting the maximum time delay and the minimum local time segment are then studied. Finally, this dissertation presents a new clustering method that uses LMCC as the similarity measure in the K-means method, with some corresponding improvements. This method identifies asynchronous or local correlation better, and LMCC provides a more effective similarity measure.
(3) Methods for gene cluster analysis of tumor non-time series microarray data. To eliminate noise and identify genes whose differential expression is not obvious in microarray data, this dissertation presents a Constrained Independent Component Analysis (CICA) model with noise reduction (deCICA) and uses it to cluster tumor non-time series microarray data. The clustering method based on the deCICA model has two parts. Firstly, it extracts a Gaussian white noise component to eliminate the noise in the gene expression data, using the Ljung-Box Q statistic as the constraint on the 'white' character and Gaussianity maximization as the objective.
Secondly, it uses the CICA model to cluster the de-noised gene expression data, using the expression data of target genes as the constraint on the relevant biological processes or functional clusters and non-Gaussianity maximization as the objective. Because it partly eliminates noise while retaining the specific information in the expression data, this method can effectively identify genes whose differential expression is not obvious.
(4) Methods for constructing gene regulatory networks. This dissertation first builds the N-order Dynamic Bayesian Network (N-DBN) to model multiple time delays in gene regulation, and then presents a new method for constructing multi-time-delay gene regulatory networks using N-DBN by combining expression data with multiple independent sources of prior knowledge (N-DBN-MP). To combine the prior knowledge with time series microarray data, this method transforms the multiple independent sources of prior knowledge into different prior probability distributions according to their characteristics, and uses a Markov Chain Monte Carlo (MCMC) algorithm to learn the network structure of the N-DBN. During MCMC learning, the acceptance probability of a network structure is decomposed under the assumption that the microarray data are independent of the prior knowledge, which achieves the fusion of microarray data and prior knowledge. N-DBN-MP can not only identify the regulatory relationships between genes effectively, but also reduce the effect of noise in the microarray data.
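
The abstract does not give the formula for LMCC. As a minimal sketch, assuming LMCC is the largest Pearson correlation found over every time delay up to a maximum and every local segment of at least a minimum length, the similarity measure from contribution (2) might look like the following Python. The function names `pearson` and `lmcc`, the parameters `max_delay` and `min_segment`, and the toy series are hypothetical illustrations and are not taken from the dissertation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lmcc(x, y, max_delay, min_segment):
    """Assumed reading of LMCC, not the dissertation's definition.

    Compares every segment of x that has at least `min_segment` points with
    the segment of y shifted by a delay in [-max_delay, max_delay], and keeps
    the highest Pearson correlation.
    Returns (best_correlation, delay, start, length).
    """
    n = len(x)
    best = (-1.0, 0, 0, n)
    for delay in range(-max_delay, max_delay + 1):
        # The two series overlap on n - |delay| points at this delay.
        for length in range(min_segment, n - abs(delay) + 1):
            first = max(0, -delay)
            last = min(n, n - delay) - length
            for start in range(first, last + 1):
                seg_x = x[start:start + length]
                seg_y = y[start + delay:start + delay + length]
                r = pearson(seg_x, seg_y)
                if r > best[0]:
                    best = (r, delay, start, length)
    return best


# Toy expression profiles: y repeats x one time point later.
x = [0, 1, 0, -1, 0, 1, 0, -1]
y = [-1, 0, 1, 0, -1, 0, 1, 0]

global_r = pearson(x, y)
best_r, best_delay, best_start, best_length = lmcc(x, y, max_delay=2, min_segment=4)
```

On this toy pair the ordinary correlation `global_r` is zero, while `lmcc` finds a correlation close to 1 at a delay of 1. That is the kind of asynchronous relationship a delay-aware measure is meant to expose. The exhaustive search is only practical for short series. A clustering routine would call such a function in place of a Euclidean or global-correlation distance.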

Keywords/Search Tags:

Tumor, DNA Microarray, Missing Value Estimation, Cluster Analysis, Gene Regulatory Network, LMCC, deCICA, N-DBN

Related items

1	Application Study Of Gene Expression Data On Diagnosis Of Tumor And Prediction Of Gene Function
2	Research On The Construction Method Of Gene Regulatory Network For Tumor Immune Escape Analysis
3	Construction And Analysis Of Active Pulmonary Tuberculosis Immune Response Gene Regulatory Networks
4	Pathway Analysis Of Gene Expression Profiling And Gene Regulatory Network Construction In Stanford Type A Aortic Dissection
5	Regulatory Network Construction And Analysis Of Enterovirus 71 Infection Associated Genes
6	Research Of Gene Regulatory Network Construction Based On Mutual Information And Its Application In Thyroid Cancer Gene Analysis
7	Gene Regulatory Networks And Their Dynamics Model For Tumor Treatment
8	Construction And Analysis Of Gene Regulatory Network For The Development Of Oligodendrocytes
9	Estimation Of Missing CT Projection Data Based On Multiple Deep Networks
10	Identification Of Gene Regulatory Networks In Various Human Tumor Cells