Towards computational drug discovery

Posted on: 2011-02-16
Degree: Ph.D.
Type: Dissertation
University: Columbia University
Candidate: Yao, Lixia
Full Text: PDF
GTID: 1448390002956091
Subject: Biology
Abstract/Summary:
Biomedicine is becoming a "data-rich" domain thanks to (1) unprecedented advances in experimental techniques, such as automated DNA sequencing, global gene expression measurement, and proteomic techniques, and (2) the fast-growing application of information technologies in healthcare systems. Vast volumes of "omics" data are being generated, together with patients' clinical data and scientific publishing data, that hold great promise of allowing a better understanding of disease processes and the discovery of new medicines. To fully realize this opportunity, computational scientists must consider how to integrate the heterogeneous data, develop new models in light of them, and interpret the results in a biomedical context.

In this context, my PhD work pursued novel drug discovery using data-driven computational approaches. First, I identified several quantitative systems-level determinants of drug targets from the statistical analysis of various biological data, including expression profiles across tissues, genetic variations by single nucleotide polymorphism, and protein functional annotations. I then verified the validity of these descriptors by using them to build machine-learning models and predict novel drug targets. Second, with the goal of bringing more clinical relevance to early drug discovery, I worked in the broad field of text mining and biomedical knowledge modeling. I introduced a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. I then tested these metrics using seven of the most popular English thesauri with respect to three corpora that sample written language from the fields of medicine, the news, and novels. The quantitative measure of semantic similarity of synonyms can also be applied to analyze the most difficult part of medical language, namely, diseases, symptoms, and adverse drug effects.
Keywords/Search Tags:Drug, Data, Computational, Discovery