Computing Analysis On Gene Expression And Its Transcriptional Regulatory Mechanisms

Posted on: 2011-09-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Shi
Full Text: PDF
GTID: 1220330332987037
Subject: Computer Science and Technology
Abstract/Summary:
The wide application of high-throughput experimental techniques has produced large amounts of gene expression data, which reflect genes' dynamic expression levels under different temporal and spatial conditions. Identifying the intrinsic expression patterns within these data has become an important way to understand gene function and related mechanisms, and computational analysis of gene expression data is one of the most important and active topics in bioinformatics research. Focusing on gene expression and its transcriptional regulatory mechanisms, the dissertation presents in-depth, systematic studies of missing value imputation for microarray data, class discovery of tumor samples, clustering of gene expression data, prediction of transcription factor activities, and analysis of transcriptional regulation. The main contents and contributions of the dissertation are summarized as follows:
1) A novel algorithm is designed for missing value imputation of temporal gene expression data. Missing value imputation is an important preprocessing step in the analysis of gene expression data. By studying the temporal specificity of co-regulation/co-expression and extracting the associations of expression level and expression trend, the dissertation presents a novel imputation algorithm. It uses a temporal window to capture the temporal specificity of co-regulation/co-expression, and combines the associations of expression level and expression trend to score the similarity between gene expression profiles. Using the similarity scores as weights, missing values are imputed from the neighboring gene expression profiles. Numerical experiments compare its performance with several existing algorithms; the results validate the algorithm and show that it achieves higher accuracy.
2) The effectiveness of nonlinear dimensionality reduction is validated for class discovery of tumor samples. Identifying intrinsic features in tumor microarray expression data improves the class discovery of tumor samples, which can potentially enhance the understanding of the mechanisms of tumor occurrence and evolution. However, tumor microarray expression data are characterized by 'high dimensions and small samples', which makes traditional clustering algorithms designed for low-dimensional data less effective. To solve this problem, the dissertation uses nonlinear dimensionality reduction to reduce the effect of data noise and various interferences. Class discovery algorithms are then applied in the reduced, lower-dimensional data space, with emphasis on parameter selection for nonlinear dimensionality reduction and on comparing the performance of tumor class discovery algorithms based on linear and nonlinear methods. Experimental results show that nonlinear dimensionality reduction methods better capture the intrinsic structure of tumor microarray expression data and improve the accuracy of tumor class discovery algorithms.
3) A novel fuzzy clustering algorithm is designed that combines gene expression and function annotations. Clustering analysis helps to extract the functional patterns hidden in gene expression data and is an important way to understand expression data and identify gene function. The dissertation combines expression and annotation data to assess gene similarity, and designs a novel fuzzy clustering algorithm whose initialization is derived from gene annotation data. Experimental results show that the algorithm can accurately predict gene function and identify a gene's multiple function categories, and thus produces more biologically meaningful clusters.
4) A novel algorithm is developed to predict transcription factor activity. Transcription factors are important regulators in the process of gene transcription, and predicting their activity helps to understand the transcriptional regulatory mechanisms of gene expression. Based on a nonlinear partial least-squares regression model, the dissertation presents a novel method that predicts transcription factor activity by combining gene expression and ChIP-chip data. Experimental results show the validity of the method. The dissertation applies the method to predict the activities of the 11 known transcription factors involved in S. cerevisiae cell cycle regulation. Based on the predicted results, it further studies the periodicity of the transcription factor activities and the correlations among them, and finally constructs the transcriptional regulatory network for S. cerevisiae cell cycle regulation.
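The imputation idea in contribution 1) can be illustrated with a minimal sketch. The abstract does not give the actual scoring formulas, so everything concrete here is an assumption made for illustration: the function names, the window half-width, the level score (inverse of root-mean-square distance), the trend score (share of steps moving in the same direction), and the blending weight `alpha`. It shows only the general shape of a windowed, similarity-weighted neighbor imputation, not the dissertation's algorithm.

```python
from math import sqrt

def window_indices(t, n_times, half_width):
    """Time points inside the window around t, excluding t itself."""
    lo, hi = max(0, t - half_width), min(n_times - 1, t + half_width)
    return [i for i in range(lo, hi + 1) if i != t]

def similarity(target, other, idx, alpha=0.5):
    """Blend a level score and a trend score (both hypothetical choices)."""
    shared = [i for i in idx if target[i] is not None and other[i] is not None]
    if len(shared) < 2:
        return 0.0
    # Level: how close the expression values are inside the window.
    rms = sqrt(sum((target[i] - other[i]) ** 2 for i in shared) / len(shared))
    level = 1.0 / (1.0 + rms)
    # Trend: share of steps between successive shared points that move the same way.
    pairs = list(zip(shared, shared[1:]))
    agree = sum(
        1 for a, b in pairs
        if (target[b] - target[a]) * (other[b] - other[a]) > 0
    )
    trend = agree / len(pairs)
    return alpha * level + (1.0 - alpha) * trend

def impute(profiles, gene, t, half_width=2, k=2, alpha=0.5):
    """Estimate profiles[gene][t] from the k most similar genes observed at t."""
    target = profiles[gene]
    idx = window_indices(t, len(target), half_width)
    scored = [
        (similarity(target, other, idx, alpha), other[t])
        for name, other in profiles.items()
        if name != gene and other[t] is not None
    ]
    # Keep the k best-scoring neighbors with a positive similarity.
    scored = sorted((s for s in scored if s[0] > 0), reverse=True)[:k]
    total = sum(w for w, _ in scored)
    if total == 0:
        return None  # no usable neighbor
    # Similarity scores act as weights in the average of neighbor values.
    return sum(w * v for w, v in scored) / total

profiles = {
    "g1": [1.0, 2.0, None, 4.0, 5.0],   # value at t=2 is missing
    "g2": [1.1, 2.1, 3.1, 4.1, 5.1],    # same level and trend as g1
    "g3": [5.0, 4.0, 3.0, 2.0, 1.0],    # opposite trend
}
estimate = impute(profiles, "g1", t=2, half_width=2, k=2)
```

In this toy input, `g2` tracks `g1` closely in both level and trend, so it receives a much larger weight than `g3` and the estimate lands close to `g2`'s value at the missing time point.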
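Contribution 3) describes a fuzzy clustering whose initialization comes from gene annotation data. The sketch below is not the dissertation's algorithm: it substitutes standard fuzzy c-means updates with plain Euclidean distance on expression values, and only the starting memberships are seeded from annotations. The function names, the `eps` smoothing constant, the fuzzifier `m`, and the toy data are all hypothetical.

```python
from math import dist

def init_memberships(genes, annotations, categories, eps=0.05):
    """Seed memberships: annotated categories share most of the mass (assumed scheme)."""
    n = len(categories)
    u = {}
    for g in genes:
        hits = [c for c in categories if c in annotations.get(g, ())]
        if not hits:
            u[g] = [1.0 / n] * n  # unannotated gene: uniform start
            continue
        base = eps / n
        u[g] = [base + (1.0 - eps) / len(hits) if c in hits else base
                for c in categories]
    return u

def update_centers(expr, u, n_clusters, m=2.0):
    """Standard fuzzy c-means center update: membership^m weighted mean."""
    dim = len(next(iter(expr.values())))
    centers = []
    for j in range(n_clusters):
        w = {g: u[g][j] ** m for g in expr}
        total = sum(w.values())
        centers.append([sum(w[g] * expr[g][d] for g in expr) / total
                        for d in range(dim)])
    return centers

def update_memberships(expr, centers, m=2.0):
    """Standard fuzzy c-means membership update from distances to centers."""
    power = 2.0 / (m - 1.0)
    u = {}
    for g, x in expr.items():
        d = [dist(x, c) for c in centers]
        if 0.0 in d:
            # Gene sits exactly on one or more centers: split membership among them.
            zeros = d.count(0.0)
            u[g] = [1.0 / zeros if dj == 0.0 else 0.0 for dj in d]
        else:
            u[g] = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
    return u

def fuzzy_cluster(expr, annotations, categories, m=2.0, n_iter=20):
    """Run a fixed number of fuzzy c-means iterations from annotation-seeded memberships."""
    u = init_memberships(list(expr), annotations, categories)
    for _ in range(n_iter):
        centers = update_centers(expr, u, len(categories), m)
        u = update_memberships(expr, centers, m)
    return u

expr = {
    "a": [0.0, 0.1], "b": [0.1, 0.0],   # one expression group
    "c": [5.0, 5.1], "d": [5.1, 5.0],   # another expression group
    "e": [2.5, 2.6],                    # in between
}
annotations = {"a": {"cat1"}, "c": {"cat2"}}  # only two genes are annotated
memberships = fuzzy_cluster(expr, annotations, ["cat1", "cat2"])
```

Because each cluster starts out tied to an annotation category, the memberships that unannotated genes such as `b` and `d` end up with can be read as graded function predictions, and a gene between the groups, such as `e`, can keep non-trivial membership in more than one category.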
Keywords/Search Tags: bioinformatics, gene expression, transcriptional regulation, microarray, missing value imputation, dimensional reduction, clustering, transcription factor activity