P2P-based Biological Information Retrieval

Posted on: 2009-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: Q L Jiang
GTID: 2190360272459839  Subject: Computer software and theory
Abstract/Summary:
As biotechnology has developed rapidly in recent years, and especially as the Human Genome Project has progressed, a large number of biological documents have appeared, more and more biological sequences (including gene sequences and protein sequences) have been determined, and a large amount of gene expression data has been produced by microarray technology. How to make effective use of these resources has become a critical issue. At the same time, peer-to-peer (P2P) technology, which has many advantages and can be applied to many kinds of systems, has attracted increasing attention in both academia and industry.

First, after a comprehensive study of text sharing and retrieval technology in P2P environments, this thesis proposes a biological text sharing and retrieval system in which every autonomous peer has powerful indexing and searching capabilities. Through an appropriate peer join policy and effective indexing and routing mechanisms, the system efficiently supports various kinds of queries, such as metadata-based queries, content-based queries, and Boolean combinations of the two, and it also scales well. The system provides a user-friendly GUI for merging and displaying query results. In addition, the system has been deployed in many universities and institutions, which demonstrates its efficiency and its ability to exchange data across different platforms.

Second, to solve the problems of data updating, sequence privacy, and interaction between different data sources in web-based BLAST and local BLAST, we introduce P2P technology into sequence alignment for the first time. P2P-BLAST is a peer-to-peer system that implements an improved BLAST algorithm and supports searching DNA and protein sequences, unlike the well-known web-based BLAST, which relies on a central genomic database.
In this new kind of sequence searching system, queries for given genomic sequences, submitted by peers in the network, are forwarded to the peers that hold local genomic databases and are executed there. P2P-BLAST also visualizes the integration and ranking of the results received from different peers, and reports the similarity between the given sequence and each local database.

In addition, this thesis presents a novel algorithm for measuring the similarity between gene expression data: a weighted edit distance algorithm based on the Gene Ontology, intended for gene search based on expression similarity. The algorithm maps two sets of gene expression data measured by microarray onto separate structures in the Gene Ontology, uses pre-defined edit operations and weight functions to compute the distance between the two structures, and then derives the similarity between the two sets of gene expression data from this edit distance.
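The abstract does not specify the edit operations, the weight functions, or the shape of the Gene Ontology structures, so the following is only a minimal sketch of the general idea of a weighted edit distance, not the thesis's algorithm. It assumes, purely for illustration, that each structure has been flattened into an ordered list of term identifiers; the identifiers (`t1`, `t2`, `t3`), the cost functions, and the `1 / (1 + d)` conversion from distance to similarity are all hypothetical.

```python
from typing import Callable, Sequence


def weighted_edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    ins_cost: Callable[[str], float],
    del_cost: Callable[[str], float],
    sub_cost: Callable[[str, str], float],
) -> float:
    """Edit distance between two term lists with per-operation weights
    (standard dynamic programming over prefixes of a and b)."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # Turning a prefix of `a` into the empty list: delete every term.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost(a[i - 1])
    # Turning the empty list into a prefix of `b`: insert every term.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost(b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + del_cost(a[i - 1]),
                dist[i][j - 1] + ins_cost(b[j - 1]),
                dist[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),
            )
    return dist[-1][-1]


def similarity(distance: float) -> float:
    """Map a distance in [0, inf) to a similarity in (0, 1] (assumed form)."""
    return 1.0 / (1.0 + distance)


# Hypothetical weights: unit cost to insert or delete a term,
# 1.5 to replace one term with a different one, 0 for a match.
def unit_cost(term: str) -> float:
    return 1.0


def replace_cost(x: str, y: str) -> float:
    return 0.0 if x == y else 1.5


# Placeholder term lists standing in for two genes' ontology structures.
gene_a = ["t1", "t2", "t3"]
gene_b = ["t1", "t3"]

d = weighted_edit_distance(gene_a, gene_b, unit_cost, unit_cost, replace_cost)
print(d, similarity(d))  # 1.0 0.5 (one deletion of "t2")
```

Passing the costs in as functions keeps the distance computation independent of any particular weighting scheme, so ontology-aware weights (for example, costs that depend on how closely two terms are related) could be substituted without changing the dynamic programming itself.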
Keywords/Search Tags: P2P, information retrieval, sequence alignment, BLAST, gene ontology