
Application Of Web Services And XML In Bioinformatics Data Distribution And Integration

Posted on: 2005-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2120360152955371
Subject: Genetics
Abstract/Summary:
In the 1990s, progress in the Human Genome Project signaled that biology had entered the genomic era. The most remarkable characteristic of this era is that the volume of biological data is growing at an exponential rate. More and more genomes are being sequenced and annotated, and data on proteins and genes continue to accumulate. With the rapid development of the World Wide Web (WWW), biological data are mostly digital and are stored in a wide variety of formats on heterogeneous systems. These data are spread across web sites all over the world, which provide biologists with much useful information. However, the complexity of biological data and the variety of data formats make it difficult to retrieve and integrate the data of interest. Compared with traditional structured data, biological data on the Web are semi-structured or unstructured and come in heterogeneous formats. Retrieving and integrating biological data is therefore a very important task. It is now widely recognized that the exchange, distribution, and integration of biological data are key to advancing bioinformatics and genomics in the post-genomic era. The Extensible Markup Language (XML) is rapidly spreading as an emerging standard for structuring documents in order to exchange and integrate data on the WWW. Web services are the next generation of the WWW and are founded on the open standards of the W3C (World Wide Web Consortium) and the IETF (Internet Engineering Task Force). This thesis presents XML and Web services technologies and their use as an appropriate solution to the problem of bioinformatics data exchange and integration.

A number of differentially expressed cDNA fragments were obtained from Phanerochaete chrysosporium by using Suppression Subtractive Hybridization (SSH) and microarray techniques, and 433 of them were sequenced. To manage and analyze these EST data, a platform was constructed on the Linux operating system using the Phrap, EMBOSS, BLAST, GENSCAN, and MZEF software. The platform covers constructing EST and genome databases, removing vector sequences, sorting and assembling sequences, locating sequences on the genome, identifying exons and introns, and predicting genes. Moreover, scripts written in Perl with BioPerl modules allow the analysis to run automatically. The results demonstrated that this robust platform could accelerate data analysis for large-scale EST sequences and offer useful information for cloning related genes and studying the functional genomics of Phanerochaete chrysosporium.
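As a rough illustration of the XML-based data exchange described in the abstract, the sketch below serializes a single EST record to XML and parses it back. It is not taken from the thesis: the element names (`est`, `organism`, `sequence`), the record ID, and the sequence are hypothetical placeholders, and Python's standard library is used here purely for illustration, whereas the thesis scripts were written in Perl with BioPerl.

```python
import xml.etree.ElementTree as ET


def est_to_xml(record):
    """Serialize an EST record (a plain dict) to an XML string.

    The element and attribute names are hypothetical, chosen only
    for this example; they do not follow any published schema.
    """
    root = ET.Element("est")
    root.set("id", record["id"])
    ET.SubElement(root, "organism").text = record["organism"]
    ET.SubElement(root, "sequence").text = record["sequence"]
    return ET.tostring(root, encoding="unicode")


def xml_to_est(text):
    """Parse the XML produced by est_to_xml back into a dict."""
    root = ET.fromstring(text)
    return {
        "id": root.get("id"),
        "organism": root.findtext("organism"),
        "sequence": root.findtext("sequence"),
    }


# Made-up record used only to demonstrate the round trip.
record = {
    "id": "EST0001",
    "organism": "Phanerochaete chrysosporium",
    "sequence": "ATGGCCTTAGGC",
}

xml_text = est_to_xml(record)
restored = xml_to_est(xml_text)
print(xml_text)
print(restored == record)
```

Because the record travels as self-describing text, a receiving system only needs an XML parser and agreement on the element names, which is the property that makes XML attractive for exchanging data between heterogeneous sources.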
Keywords/Search Tags: biological data integration, biological data distribution, Extensible Markup Language (XML), web services, bioinformatics, Phanerochaete chrysosporium, EST