Studies Of Grid-based Bioinformatics Computational Pipeline System

Posted on: 2006-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L A Qiao
Full Text: PDF
GTID: 1100360155974104
Subject: Biology
Abstract/Summary:
Integrating cross-platform, distributed, and heterogeneous resources is one of the major concerns of the current bioinformatics community. This dissertation proposes and establishes a bioinformatics Grid application system, BOD (Bioinformatics On-Demand). The system uses Grid computing technology to set up a virtual workbench on a web platform that helps researchers perform customized, comprehensive bioinformatics computational jobs. Using the BOD end-user web interface, a user can customize and submit complete bioinformatics search queries and computation requests from his or her own office, e.g. from DNA sequence assembly to gene prediction and finally protein folding prediction. The BOD web portal parses the user's job request into sequential steps, each of which may contain multiple parallel tasks that can be executed concurrently and independently. After checking the resources of the computation nodes, the BOD task scheduler either dispatches an entire task or splits it proportionally into multiple subtasks, depending on the nature of the task and the status of the available node resources, and sends the task or subtasks to the computation node(s) associated with the BOD portal server. A computation node may further split an assigned task and distribute it to its sub-nodes using a similar strategy. Each node communicates with its parent node and child nodes within the Grid technology framework. After each node has carried out its assigned subtask independently, the BOD portal server receives and packages each subtask's results and performs the remaining steps in the same way; finally, it returns all the results to the user who submitted the job. The dissertation uses a computational pipeline model to describe users' submitted jobs and stores the job requests, status, and results in a relational database. A universal XML standard is established to capture the details of each task, which is associated with a specific computation program.
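The proportional splitting described above can be illustrated with a minimal sketch. It is not BOD's actual scheduler: the function name `split_proportionally`, the node names, and the use of free CPU slots as the capacity measure are all assumptions made for illustration. The sketch divides a number of work units (for example, input sequences) across nodes in proportion to capacity and hands any leftover units to the nodes with the largest remainders.

```python
def split_proportionally(total_units, node_capacities):
    """Split total_units of work across nodes in proportion to capacity.

    node_capacities maps a (hypothetical) node name to an integer capacity,
    e.g. the number of free CPU slots reported by that node.
    """
    total_capacity = sum(node_capacities.values())
    if total_capacity <= 0:
        raise ValueError("no available capacity")

    shares = {}
    remainders = []
    assigned = 0
    for node, capacity in node_capacities.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        base, remainder = divmod(total_units * capacity, total_capacity)
        shares[node] = base
        assigned += base
        remainders.append((remainder, node))

    # Give the unassigned units to the nodes with the largest remainders;
    # ties are broken by node name so the result is deterministic.
    leftover = total_units - assigned
    for _, node in sorted(remainders, key=lambda r: (-r[0], r[1]))[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    print(split_proportionally(100, {"node1": 4, "node2": 2, "node3": 2}))
```

A node that receives a share could apply the same function to its own sub-nodes, which mirrors the recursive strategy the abstract describes.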
In addition, incorporating a workflow management system enables the development, management, scheduling, and monitoring of computational pipelines. The core technology established during the development of the BOD system could be extended and applied to other fields of scientific research. At present, BOD supports complex computational pipelines with multiple steps and multiple parallel tasks. Users can submit single or multiple input files, customize and intervene in the pipeline, and check multiple output results at one time in a specified manner. The BOD system expands the computational capacity of current bioinformatics software. The BOD system is freely available at http://e-science.tsinghua.edu.cn/bod/.
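The pipeline model of sequential steps, each containing parallel tasks, can be sketched as follows. This is an illustrative assumption, not the BOD implementation: the name `run_pipeline`, the representation of a step as a list of callables, and the use of a local thread pool in place of Grid nodes are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(steps, initial_input):
    """Run steps in order; tasks within one step run concurrently.

    steps is a list of steps, and each step is a list of callables.
    Every task in a step receives the collected output of the previous
    step (or initial_input for the first step).
    """
    data = initial_input
    with ThreadPoolExecutor() as pool:
        for tasks in steps:
            # Submit all tasks of this step at once so they can overlap.
            futures = [pool.submit(task, data) for task in tasks]
            # Wait for every task before moving on to the next step.
            data = [future.result() for future in futures]
    return data


if __name__ == "__main__":
    steps = [
        [lambda x: x + 1, lambda x: x * 2],  # step 1: two parallel tasks
        [lambda results: sum(results)],      # step 2: combine the results
    ]
    print(run_pipeline(steps, 3))
```

Collecting every result before starting the next step reflects the abstract's description of the portal gathering subtask results and then performing the remaining steps.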
Keywords/Search Tags:bioinformatics, Grid, computational pipeline, workflow, task description