Literature Mining And Analysis Of The Signaling Pathway

Posted on: 2008-01-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L R Wang
GTID: 1114360212998577
Subject: Biomedical engineering
Abstract/Summary:
Bioinformatics is an interdisciplinary science that uses modern computers to store, search, and analyze biological data. With the explosive growth of the biomedical literature, the scientific community has become increasingly interested in how to capture information from this literature in a form suitable for computer analysis. A central problem in bioinformatics is therefore the design of literature-mining tools that extract nuggets of information from the literature. Compared with the traditional keyword retrieval method, the advantages of these tools are apparent: they are fast, automatic, and economical in time and labor, especially for large-scale article analysis.
Signaling pathways are the basis of a cell's response to its environment. They play crucial regulatory roles in a variety of cellular processes, including metabolism, the cell cycle, differentiation, proliferation, and apoptosis, and they have been one of the main concerns of molecular biology in recent years. However, much valuable information is dispersed across a large volume of literature. It is time to collect this information in order to build a comprehensive understanding of signaling pathways.
In this dissertation, the author's original research can be summarized as follows:
1. Gene expression is one of the results of signaling pathways, and transcription factors play pivotal roles in this process. A Bayesian literature-mining method is proposed to retrieve articles describing transcription factor binding sites. Words that discriminate relevant abstracts from other abstracts are first identified statistically; each new abstract can then be assigned a log-likelihood score for discussing transcription factor binding sites. We also show that this method is similar to the classical information retrieval method based on TF/IDF theory.
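The scoring idea in item 1 can be illustrated with a minimal sketch. This is not the dissertation's implementation: the tokenizer, the add-alpha smoothing, the use of document frequencies, and the toy abstracts are all assumptions made for illustration, and the sketch only sums evidence from words that are present in an abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return the set of distinct lowercase alphabetic words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train_log_odds(relevant, background, alpha=1.0):
    """Estimate a log-likelihood ratio per word from document frequencies.

    alpha is an assumed add-alpha smoothing constant; it keeps words seen
    in only one class from producing infinite scores.
    """
    rel_df = Counter(w for doc in relevant for w in tokenize(doc))
    bg_df = Counter(w for doc in background for w in tokenize(doc))
    weights = {}
    for word in set(rel_df) | set(bg_df):
        p_rel = (rel_df[word] + alpha) / (len(relevant) + 2 * alpha)
        p_bg = (bg_df[word] + alpha) / (len(background) + 2 * alpha)
        weights[word] = math.log(p_rel / p_bg)
    return weights


def score(abstract, weights):
    """Sum the log-likelihood ratios of the known words in an abstract."""
    return sum(weights.get(w, 0.0) for w in tokenize(abstract))


# Hypothetical toy training data, invented for this example only.
relevant = [
    "The transcription factor binds a consensus binding site in the promoter.",
    "Footprinting revealed a binding site for the factor upstream of the gene.",
]
background = [
    "Patients received the drug in a randomized clinical trial.",
    "The clinical outcome of the trial was measured in patients.",
]
weights = train_log_odds(relevant, background)
```

With these toy abstracts, a new abstract that mentions "binding site" and "promoter" receives a positive score and one about a clinical trial receives a negative score; ranking by this score and choosing a cutoff is what trades recall against precision.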
The efficiency of this method is greatly improved by combining it with the Related Articles feature of PubMed. The recall and precision of our method are 91% and 45%, respectively, which outperforms the traditional keyword method (recall <=83% and precision <=26%). Although the recall of our method is about 2 percentage points lower than that of PubMed Related Articles alone (recall 93%, precision 27%), its precision is about 18 percentage points higher. We found about 63,000 articles of interest with this method.
2. Protein kinases (PKs) play an important role in transmitting information in signaling pathways. They phosphorylate their substrates (proteins) at specific sites (phosphorylation sites) flanked by canonical motifs. We again used the Bayesian method to mine literature describing phosphorylation sites, and we built an auxiliary tool that adds color tags to sentences for rapid processing. Using the search results together with data in Phospho.ELM, we propose PPSP, a method based on Bayesian decision theory, to predict potential phosphorylation sites of PKs. Prediction results on about 70 PK groups show that, in general, it outperforms the state-of-the-art methods Scansite, KinasePhos, NetPhosK, and GPS, which suggests that PPSP is another competitive computational approach in this branch of bioinformatics. The method also has the advantages of simplicity, efficiency, and robustness. A web service is available for online prediction at http://bioinformatics.lcd-ustc.org/PPSP.
3. A novel method called "Transcription Factor-Mediated Pathway Analysis" is presented to infer abnormal transcription factors and pathways from cancer microarray chips. The activity of a transcription factor is inferred by evaluating the net percentage of its activated (or repressed) target genes in a chip, and the abnormal transcription factor is then mapped to pathways deposited in KEGG. This algorithm integrates experimental data on gene regulation and pathways.
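The activity inference in item 3 might look roughly like the sketch below. It is an assumption-laden illustration, not the published algorithm: the log-ratio cutoff, the abnormality threshold, the placeholder gene, factor, and pathway names, and the hand-written pathway table standing in for KEGG are all hypothetical, and every transcription factor is assumed to be an activator of its targets.

```python
def net_activity(targets, log_ratios, cutoff=1.0):
    """Fraction of measured targets up-regulated minus fraction down-regulated.

    cutoff is an assumed log-ratio threshold for calling a gene
    activated (>= cutoff) or repressed (<= -cutoff) on one chip.
    """
    measured = [log_ratios[g] for g in targets if g in log_ratios]
    if not measured:
        return 0.0
    up = sum(1 for v in measured if v >= cutoff)
    down = sum(1 for v in measured if v <= -cutoff)
    return (up - down) / len(measured)


def abnormal_pathways(tf_targets, tf_pathways, log_ratios, threshold=0.5):
    """Map factors whose absolute net activity reaches threshold to pathways."""
    hits = {}
    for tf, targets in tf_targets.items():
        activity = net_activity(targets, log_ratios)
        if abs(activity) >= threshold:
            for pathway in tf_pathways.get(tf, []):
                hits.setdefault(pathway, []).append((tf, activity))
    return hits


# Hypothetical inputs: placeholder names, not real genes or KEGG entries.
tf_targets = {"TF_A": ["g1", "g2", "g3", "g4"], "TF_B": ["g5", "g6"]}
tf_pathways = {"TF_A": ["pathway_X"], "TF_B": ["pathway_Y"]}
chip = {"g1": 2.0, "g2": 1.5, "g3": 1.2, "g4": -0.2, "g5": 0.1, "g6": -0.3}

hits = abnormal_pathways(tf_targets, tf_pathways, chip)
```

Here three of the four measured targets of TF_A exceed the cutoff, so its net activity is 0.75 and pathway_X is flagged, while TF_B shows no net change. For a factor known to repress its targets, the sign of the score would have to be flipped.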
Using this method, we analyzed human gastric cancer, breast cancer, and 11 different types of cancer stored in the Stanford Microarray Database (SMD), and found that the TGF-β, JAK-STAT, NF-κB, and Notch pathways are over-activated in many of these chips. These abnormal pathways will be of great help in understanding the progression of cancer and in rational drug design.
Keywords/Search Tags:Literature