| With massive amounts of clinical and phenotype data becoming available from electronic medical record (EMR) systems, many studies have revealed clinical comorbidity associations between diseases by taking advantage of patient records. Comorbidity resources open new opportunities for studying disease etiology, pathogenesis, prevention and treatment. However, few studies have combined population-based disease associations with molecular and genetic data to explore the molecular mechanisms behind these associations and uncover the molecular origins of diseases. To fill this gap, we first obtained a large number of clinical comorbidity relations between common diseases (including cancers) from three main sources, each of which calculated comorbidity strength from hospital EMRs. Because these sources lack rare diseases, we further developed a novel text mining method for discovering disease comorbidity (particularly for rare diseases) from PubMed abstracts. We also integrated multi-level resources, including clinical data, OMIM records, GWAS data, GDA data, drug-target information, protein-protein interaction data, biological pathways, gene ontology and other data from large-scale text mining. We standardized the data mining processes and manually inspected the data to ensure high quality. Then, we verified the hypothesis "Comorbid diseases share potential common molecular mechanisms" using shared susceptibility genes, shared biological pathways, co-expressed genes, and interactions between the proteins encoded by those genes. 
Based on this hypothesis, we developed a simulated annealing algorithm that predicts candidate disease genes in large-scale disease comorbidity networks through global Guilt by Association optimization, and then combined protein-protein interactions, biological pathways and gene expression profiling to further prioritize these candidate genes. In addition, for each comorbid disease pair, considering that different biological processes in the two diseases may share the same susceptibility genes, we conducted pathway enrichment analysis on the susceptibility genes related to the two diseases and identified the pathways common to both diseases together with the genes participating in those pathways. Through this pathway analysis, we sought to link the pathogenesis of the two diseases at the molecular level. Based on protein-protein interaction analysis, we also discovered new candidate risk factors for both diseases that exert pleiotropic effects during disease development. We used a rare disease (mitral valve prolapse) and a common disease (polycystic ovary syndrome) as examples to highlight the reliability of our method. Moreover, we created hDIG (human disease Interaction & Gene Network), a high-performance web server for analyzing the molecular mechanisms behind disease comorbidity and visualizing gene interaction networks. To facilitate drug repurposing (particularly for orphan diseases), we also added drug information to our reconstructed protein-protein interaction network. Finally, our text mining results indicate that such a combination of population-level data and cellular network information could help build novel hypotheses about disease mechanisms. Connecting diseases with similar pathological mechanisms could inspire novel strategies for the effective repositioning of existing drugs and therapies. |
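
The shared-pathway step for a comorbid disease pair can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test applied to each disease's susceptibility genes separately, keeping pathways that pass a significance threshold in both. The abstract does not specify the test, the threshold, or any multiple-testing correction, so the function names (`hypergeom_sf`, `shared_pathways`), the `alpha` parameter and the per-disease design are illustrative assumptions rather than the method used in this work.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k pathway genes when drawing
    n genes from a background of N genes, K of which are in the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def shared_pathways(genes_a, genes_b, pathways, background_size, alpha=0.05):
    """Return (pathway, participating genes, p-value) for pathways enriched
    in BOTH diseases' gene sets. Assumption: a pathway counts as shared
    when the larger of the two per-disease p-values is below alpha."""
    results = []
    for name, members in pathways.items():
        p_values = []
        for genes in (genes_a, genes_b):
            overlap = len(genes & members)
            p_values.append(
                hypergeom_sf(overlap, background_size, len(members), len(genes))
            )
        worst_p = max(p_values)
        if worst_p < alpha:
            # Genes from either disease that participate in the pathway.
            participating = sorted((genes_a | genes_b) & members)
            results.append((name, participating, worst_p))
    # Most significant pathways first.
    return sorted(results, key=lambda r: r[2])
```

Here `genes_a` and `genes_b` are sets of susceptibility gene symbols for the two diseases, `pathways` maps a pathway name to its set of member genes, and `background_size` is the number of genes in the annotation universe. A real analysis would normally add a multiple-testing correction across pathways.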