The Research On The Integration Of Protein Interaction Data And Network Modules

Posted on: 2016-05-08; Degree: Doctor; Type: Dissertation
Country: China; Candidate: H L Tang; Full Text: PDF
GTID: 1220330509961055; Subject: Control Science and Engineering
Abstract/Summary:
At the molecular level, biological functions are carried out mainly through thousands of protein interactions, and large volumes of protein interaction data have been produced as a result. Managing and mining these data is challenging. Guided by systems biology, this dissertation explores four issues in the management and mining of protein interaction data.

(1) Integration of protein interaction data and development of a data platform. Protein interactions are currently dispersed across diverse heterogeneous databases. These databases are highly complementary in interaction type and coverage, so integrating them is necessary. We develop an approach to integrate biological molecular pathways and protein-protein interactions (PPIs). First, a unified data model, which represents protein interactions in binary form, is developed; it classifies interactions into biological PPIs (BioPPIs) and technological PPIs (TechPPIs) and retains more information by adding biological-effect and functional-effect items. Then, a series of transformation rules is developed to convert protein interactions from the pathway model to the new model. Finally, 7 human pathway databases (PID, BioCarta, Reactome, NetPath, INOH, KEGG and SPIKE) and 5 PPI databases (HPRD, IntAct, BioGRID, MINT and DIP) are integrated into PathPPI, which contains 23,041 BioPPIs and 72,473 TechPPIs among 13,411 human proteins.

(2) Global collection and mining of protein property data. Annotation data on genes and their products (RNAs, proteins) have grown rapidly in recent years. They include molecular sequence, higher-order structure, physicochemical properties, sequence modification sites, chromosome location, subcellular location, molecular function, biological process, evolution rate, phenotype, tissue expression, pathway, stability and so on, and together depict the characteristics of genes and their products from multiple aspects.
Collecting and mining these data is basic groundwork. First, we globally survey the available protein property data, which come from public databases, the literature or our own computation. We then explore how to classify the properties and standardize the storage format: four principles are proposed for the classification and two formats for data storage. Second, we explore the value distribution of each protein property, which helps in understanding protein properties globally; examples include the unimodal distribution of molecular weight and the multimodal distributions of pI and hydrophobicity. Third, we explore the physical characteristics of proteins in different biological function categories. We mainly discuss the categories with extreme physical property values, such as those with the most recent origin time or the highest pI. This analysis helps clarify the relationships between physical properties and biological function properties. Fourth, we explore the physicochemical, origin-time and biological function characteristics of human genes and proteins at different chromosome locations. We focus on the chromosomes, or their arms and bands, that are enriched in origin-time and biological function groups, such as CHR1-2, CHR4, CHR6 and CHR9, which have more enriched pathways, and CHRX, which has more enriched disease classes.

(3) Study of protein interaction network modules. Separating a network into subnetworks (modules) is an effective way to explore its structure and function. Unlike traditional approaches based on network topology, we define network modules from a biological perspective. Two new types of modules are studied. First, we explore the regulation modules of metabolic pathways.
Traditional biochemistry and molecular biology focus on the composition, structure, function, biosynthesis and regulation of molecules, and seldom pay attention to the co-regulation of multiple enzymes in the same pathway. However, in a biological system, regulation and metabolic pathways must be closely related. We therefore construct the relationships between regulatory proteins and metabolic pathways. Second, we define a new type of module, named the equal-expression module, whose proteins have similar expression values. In our previous work, we found that regulation and metabolic pathways contain a large number of subnetworks in which the proteins have the same expression values. Are the proteins in these subnetworks more closely related in biological function? Are these subnetworks basic components of the protein network? To answer these questions, we identified equal-expression modules in regulation and metabolic networks for five data sets and verified that the proteins in equal-expression modules have higher co-expression coherency, GO similarity and co-regulation coherency.

(4) Mining of HCC metastasis-related network modules. Metastasis is one of the key factors in HCC lethality, but its molecular mechanism is far from clear. Recently, diverse omics technologies have opened new ways to address the identification of HCC metastasis genes, molecular diagnosis and other problems at the systems level. This dissertation focuses on identifying HCC metastasis-related modules by analyzing expression data with a differential module method. First, we improve the differential expression module identification method by integrating GO information. The results show that the improved method has higher module accuracy and higher precision in disease gene identification. Then, the improved method is applied to two sets of gene expression data to identify HCC metastasis-related modules.
Finally, we obtain 6 modules associated with cell migration.
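
The equal-expression module introduced under (3) can be illustrated with a minimal sketch. The abstract does not give the identification procedure, so everything below is an assumption made for illustration: the function name `equal_expression_modules`, the tolerance `tol`, the minimum size `min_size`, and the rule that two interacting proteins are "equal" when their expression values differ by at most `tol`. Under those assumptions, a module is a connected component of the network after removing edges between dissimilar proteins.

```python
from collections import defaultdict


def equal_expression_modules(edges, expression, tol=0.1, min_size=2):
    """Return connected subnetworks whose linked proteins have similar expression.

    Hypothetical criterion (not taken from the dissertation): an edge is kept
    when the absolute expression difference of its endpoints is <= tol.
    """
    # Keep only edges whose two endpoints have near-equal expression.
    adj = defaultdict(set)
    for a, b in edges:
        if a in expression and b in expression:
            if abs(expression[a] - expression[b]) <= tol:
                adj[a].add(b)
                adj[b].add(a)

    # Connected components of the filtered graph are the candidate modules.
    seen, modules = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) >= min_size:
            modules.append(sorted(component))
    return modules


# Toy network with made-up protein names and expression values.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
expression = {"A": 1.00, "B": 1.05, "C": 2.00, "D": 2.02, "E": 3.50}
modules = equal_expression_modules(edges, expression, tol=0.1)
print(modules)
```

Because similarity is checked edge by edge, expression values can drift across a long chain of proteins within one component; a stricter variant would bound the spread of values within each module.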
Keywords/Search Tags:Protein interaction, Pathway, Systems biology, Data integration, Attribution analysis, Metabolic pathway regulation, Equal-expression network module, Differentiate network module, Liver cancer metastasis, Molecular biomarker identification