Data Mining For Mammalian Transcription Factors And Downstream Targets

Posted on: 2010-03-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Y Zheng
Full Text: PDF
GTID: 1100360278454401
Subject: Bioinformatics
Abstract/Summary:
Transcription factors (TFs) are core functional proteins of transcriptional regulation. They control the expression level of downstream genes (TF targets) by interacting with cis-regulatory elements (CREs), and this regulation plays a significant role in vital biological processes of an organism. Because of this importance to transcription, the investigation of TFs and their targets has become an active research area in the post-genome era.

Traditionally, biologists have investigated TFs and their targets experimentally. Experimental approaches yield accurate information about transcriptional regulation, but they are time-consuming and cannot provide abundant information in a short time. Biologists have therefore recently begun to explore transcriptional regulation with computational methods, with most work focusing on TF identification and CRE modeling. For TF identification, analysis tools are generally built with machine learning algorithms. Identification methods based on BLAST and the nearest neighbour algorithm (NNA) already exist, but their performance is unsatisfactory when applied to mammals. For CRE modeling, researchers try to describe the binding preference between TF and CRE by constructing models from various features. CRE modeling nevertheless remains an open problem because of the complicated interaction mechanism between TF and CRE.

In our work, the support vector machine (SVM) algorithm was used to construct an automatic detector for TF identification, with protein domains and functional sites serving as the feature vectors. A TF classifier was then built by combining the error-correcting output coding (ECOC) algorithm with the SVM methodology. Datasets validated by biological experiments were used to test the performance of the detector and the classifier.
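
As a rough illustration of the ECOC idea mentioned above, the sketch below shows only the encoding and decoding steps: each class is given a binary codeword, one binary classifier (an SVM in the thesis) is trained per codeword column, and a new protein is assigned to the class whose codeword is closest to the predicted bits. The class names, the code matrix, and the function names are hypothetical; the abstract does not state which TF classes or which code matrix were actually used, and the binary classifiers themselves are not implemented here.

```python
# Hypothetical ECOC code matrix for four made-up TF classes.
# Each row is a class codeword; each column defines one binary problem
# that a single binary classifier (an SVM in the thesis) would be trained on.
CODE_MATRIX = {
    "class_A": (1, 1, 1, 1, 1, 1, 1),
    "class_B": (0, 0, 0, 0, 1, 1, 1),
    "class_C": (0, 0, 1, 1, 0, 0, 1),
    "class_D": (0, 1, 0, 1, 0, 1, 0),
}


def binary_targets(labels, column):
    """Relabel multi-class labels as 0/1 targets for one binary classifier."""
    return [CODE_MATRIX[label][column] for label in labels]


def hamming(a, b):
    """Count the positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is nearest (in Hamming distance) to bits."""
    return min(CODE_MATRIX, key=lambda name: hamming(CODE_MATRIX[name], bits))


# The seven bits stand in for the outputs of seven trained binary classifiers.
# The sixth bit is wrong for class_C, yet decoding still recovers class_C.
print(decode((0, 0, 1, 1, 0, 1, 1)))
```

Every pair of codewords in this assumed matrix differs in four positions, so a single wrong binary prediction still decodes to the correct class. That tolerance is the "error-correcting" part of the scheme.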
Test results showed that both tools performed well for TF analysis: the overall success rates of TF identification and classification reached 88.22% and 97.83%, respectively. To evaluate the tools further, we compared them with tools built on BLAST and NNA; the comparison showed that our tools were superior for TF analysis. The detector and classifier were then applied to the protein sequences of Human, Mouse, and Rat, yielding a large number of putative TFs.

Subsequently, a mining tool for TF-target pairs was developed based on reverse engineering theory in order to obtain the genes regulated by TFs. The tool was used to analyse microarray data of Human, Mouse, and Rat, and many TF-target pairs were obtained. Fisher's exact test was carried out to assess the reliability of these TF-target pairs. The test results indicated that the approach used here to predict TF-target pairs was valid, and that the inferred downstream-gene information for TFs was credible to some extent.

To further explore the regulatory relationship between TFs and their targets, we investigated the interaction mechanism between TF and CRE. A combinational model of CRE was constructed on the basis of a decision tree by assembling several biological features. Many TF-CRE interaction pairs from Human, Mouse, and Rat were then used to evaluate the combinational model. The evaluation showed that the model had good power to depict the binding preference and interaction mechanism between TF and CRE.

Finally, an integrated TF platform was built so that biologists can conveniently use the information on TFs and their targets acquired in our work. In brief, the platform contains abundant transcriptional regulation data and also provides a prediction tool for TFs.
We believe that the platform will serve as an important resource for the community of transcription researchers and provide strong support for the exploration of transcriptional regulation.

Currently, data on transcriptional regulation in mammals are far from sufficient. To address this problem, we mined and presented a great deal of information about TFs and their targets in Human, Mouse, and Rat. Moreover, we investigated the binding characteristics between TF and CRE, which will increase knowledge of the mechanism of transcriptional regulation. In summary, we think this comprehensive study of TFs will help people interpret genome information at the systems level.
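
The abstract names Fisher's exact test as the check on predicted TF-target pairs but does not say how the contingency table was set up. The sketch below assumes one plausible layout, a 2x2 table that cross-counts genes as predicted target or not against membership in some reference set, and computes the one-sided (enrichment) p-value from the hypergeometric distribution. The function name and the example counts are illustrative only and do not come from the thesis.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing a top-left count of at least `a`
    when the row and column totals are held fixed.
    Assumed layout (not from the thesis):
        a = predicted targets in the reference set
        b = predicted targets outside the reference set
        c = other genes in the reference set
        d = other genes outside the reference set
    """
    row1 = a + b          # all predicted targets
    col1 = a + c          # all genes in the reference set
    n = a + b + c + d     # all genes considered
    total = comb(n, row1)
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / total


# Made-up counts: 3 of 4 predicted targets fall in a reference set of 4 genes,
# out of 8 genes in total.
print(fisher_exact_greater(3, 1, 1, 3))
```

A small p-value would suggest that the predicted targets overlap the reference set more than chance alone would explain. The one-sided form is used here because the question is about enrichment; a two-sided variant would also sum equally unlikely tables in the other tail.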
Keywords/Search Tags: mammalian, transcription factor, downstream target genes, data mining