| One of the key goals of postgenomic biomedical research is to systematically study and confirm the interactions of all molecules within a living cell. A crucial step towards understanding the systems-level properties of the cell is to map the networks of DNA-, RNA- and protein-protein interactions of an organism, its 'interactome network', as completely and accurately as possible. A growing volume of interactome data is being generated by high-throughput experiments, computational prediction methods, and literature mining. Researchers have built a series of databases to store and manage the various types of interactome data. However, the existing interactome databases are isolated from one another, which makes efficient sharing and use of interactome data difficult. It is therefore necessary to integrate these independent databases so that existing interactome data can be managed and used more effectively. Data integration is fundamental to increasing overall knowledge and understanding of the interactome field, and it has become one of the most important directions of interactome research. In this study, an interactome data warehouse, InteractomeDW, was created. InteractomeDW was composed of 4 parts: an interactome database collection, a bio-entity mapping database, a biological ontology and controlled vocabulary database collection, and a biological annotation database. InteractomeDW contained 62 779 056 interaction records of 5 types: complex, domain-domain interaction (DDI), molecular interaction (MI), pathway, and protein-protein interaction (PPI). InteractomeDW involved 2 426 organisms, 170 interaction identification methods, 44 interaction types, and 85 212 publications. To our knowledge, InteractomeDW was larger than any previous interactome-related data warehouse. In this study, a new heterogeneous data integration method, WM, was proposed. 
WM adopted a data warehouse to manage the data, so as to ensure the availability of data sources and to improve query efficiency and data quality. Because all data were physically stored on a local server with backup and recovery support, the availability of data sources and query efficiency could be ensured to the greatest possible extent. The data cleansing component of the data warehouse could detect, correct, or delete data that were damaged, incomplete, or "dirty", which ensured the quality of the integrated data. WM used a mediation-based solution to carry out the actual data integration, in order to improve system scalability and maintainability. WM could be extended by registering a new data source with the data integration mediator and building the corresponding wrapper, with almost no effect on other parts of the system. This approach was loosely coupled and flexible, so the maintainability of the system and its associated software was greatly enhanced. In summary, WM combined the advantages of the data warehouse-based and mediation-based methods to balance the efficiency and flexibility of data integration, and provided the infrastructure and solution for heterogeneous interactome data integration. In this study, a web-based heterogeneous interactome data integration system, IMbase, was created using the WM method. IMbase was a computational platform for sharing and using a wide range of interactome data. IMbase provided several services, including an interactome data integration service, an interaction network analysis and inference service, and a bio-entity mapping service. IMbase could help researchers generate hypotheses by exploring potential or unknown interactions, and thereby support knowledge discovery. IMbase was a computational platform for vertical integration of interactome data. 
IMbase aimed to summarize and collate existing interactome research data, to enable comprehensive data sharing in this field, and to increase overall knowledge and understanding of the interactome. Furthermore, horizontal integration and knowledge discovery across interdisciplinary data could be built on top of this vertical integration of interactome data. To our knowledge, IMbase was larger than any previous interactome-related data integration system. IMbase was freely accessible to non-commercial users at http://122.70.220.98/imbase/index.gr. In this study, IMbase was applied to the study of mouse neural tube defects (NTDs). We used the differentially expressed genes identified by gene expression profile microarray as baits to retrieve the genes that interacted with them, and then constructed the related interaction network. We constructed a database, MouseNTDs, which stored all known candidate genes of mouse NTDs. With the help of MouseNTDs, we identified potential candidate genes of mouse NTDs from this interaction network. Finally, we proposed a hypothesis about candidate genes of mouse NTDs by studying the biological annotations and pathway information related to the potential candidate genes identified by IMbase. The major innovations in this paper are summarized below: 1. A new heterogeneous data integration method, WM, was proposed. WM combined the advantages of the data warehouse-based and mediation-based methods to balance the efficiency and flexibility of data integration, and provided the infrastructure and solution for heterogeneous interactome data integration. 2. An interactome data warehouse, InteractomeDW, was created. InteractomeDW was composed of 51 interactome data sources and 9 secondary data sources. 
InteractomeDW contained 62 779 056 interaction records of 5 types: complex, domain-domain interaction, molecular interaction, pathway, and protein-protein interaction. InteractomeDW involved 2 426 organisms, 170 interaction identification methods, 44 interaction types, and 85 212 publications. 3. A bio-entity mapping database, BEM, was created. BEM integrated 5 related data sources and contained 180 328 282 non-redundant mapping records of 4 types, including genes, proteins, small molecules or chemicals. BEM could map bio-entities between 90 common biomedical databases. 4. A web-based heterogeneous interactome data integration system, IMbase, was created using the WM method. IMbase was a computational platform for sharing and using a wide range of interactome data. IMbase provided several services, including an interactome data integration service, an interaction network analysis and inference service, and a bio-entity mapping service. IMbase could help researchers generate hypotheses by exploring potential or unknown interactions, and thereby support knowledge discovery. 5. IMbase provided not only a web application but also a series of web services with a range of features that enabled other software to search and retrieve the integrated interactome data programmatically. These web services supported software reuse and interoperability. 6. We applied IMbase to the study of mouse neural tube defects (NTDs). By constructing and analyzing the interaction network of the potential mouse NTD candidate genes, we proposed a hypothesis about mouse NTD candidate genes. |
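
The mediator-and-wrapper extension mechanism described for WM can be illustrated with a minimal sketch. This is not the IMbase implementation: the names `Interaction`, `Mediator`, `register`, `query`, `wrap_a`, and `wrap_b`, the record schema, and the two toy in-memory sources are all assumptions made for illustration. The sketch only shows the general pattern in which a new source is added by registering one wrapper, without touching the mediator or the other wrappers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Interaction:
    """Common record shape returned by the mediator (hypothetical schema)."""
    a: str
    b: str
    kind: str    # e.g. "PPI", "DDI", "pathway"
    source: str  # name of the registered data source


# A wrapper translates one source's native rows into Interaction records.
Wrapper = Callable[[str], Iterable[Interaction]]


class Mediator:
    """Fans a query out to every registered wrapper and merges the results."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, Wrapper] = {}

    def register(self, name: str, wrapper: Wrapper) -> None:
        # Adding a source only touches this registry, not the other wrappers.
        self._wrappers[name] = wrapper

    def query(self, entity: str) -> List[Interaction]:
        seen = set()
        results: List[Interaction] = []
        for name in sorted(self._wrappers):
            for rec in self._wrappers[name](entity):
                # Treat A-B and B-A from the same source as one record.
                key = (frozenset((rec.a, rec.b)), rec.kind, rec.source)
                if key not in seen:
                    seen.add(key)
                    results.append(rec)
        return results


# Two toy sources with different native layouts (made-up data).
SOURCE_A = [("P1", "P2"), ("P2", "P3")]
SOURCE_B = [{"from": "P1", "to": "P4", "type": "pathway"}]


def wrap_a(entity: str) -> Iterable[Interaction]:
    for x, y in SOURCE_A:
        if entity in (x, y):
            yield Interaction(x, y, "PPI", "source_a")


def wrap_b(entity: str) -> Iterable[Interaction]:
    for row in SOURCE_B:
        if entity in (row["from"], row["to"]):
            yield Interaction(row["from"], row["to"], row["type"], "source_b")


mediator = Mediator()
mediator.register("source_a", wrap_a)
mediator.register("source_b", wrap_b)
hits = mediator.query("P1")
```

In this sketch, supporting a third source would mean writing one more wrapper function and one more `register` call, which is the loose coupling the abstract attributes to the mediation-based part of WM.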