
Study On Pathogenic Gene Detection Based On Complex Network

Posted on: 2015-06-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Y Wu
GTID: 1104330467973453
Subject: System theory
Abstract/Summary:
With the rapid development of high-throughput technologies and the widespread use of bioinformatics techniques, more and more omics interaction data have become available. These data make it possible to understand the complex functions of organisms from a systems perspective by constructing biological networks. In recent years, biological network analysis has explained some biological phenomena and revealed several disease mechanisms. It is therefore of great importance for disease gene prediction, molecular diagnosis, protein complex mining, and related tasks.

Disease gene prediction is an important task in bioinformatics. It uses known gene-disease associations and omics data to discover potential disease genes, drawing on complex network theory, machine learning, and other methods, so that biomedical experts can more easily choose candidates for experimental validation. In general, disease gene prediction looks for candidate genes whose functions are most similar to those of known disease genes, and the topological associations between macromolecules in biological networks reflect their functional similarity. The classical hypothesis that "the neighbors of disease genes are likely to cause the same or similar diseases" has therefore been widely accepted and applied in network-based disease gene prediction.

However, most existing methods face a significant bottleneck. They can only search for candidate genes in small local regions around known disease genes, and they may mistake hub proteins with high betweenness for potential disease genes. Breaking through this bottleneck is expected to require discovering new mechanisms or integrating various kinds of omics data more effectively.

Therefore, this thesis considers the differences among the various kinds of macromolecules in biological networks and proposes the concept of the multi-factor network motif.
By analyzing multi-factor network motifs in the protein interaction network, this thesis attempts to discover new mechanisms that link different classes of macromolecules. It then designs a new disease gene prediction strategy based on these mechanisms, so as to extend the search scope and increase prediction accuracy. The new findings help deepen the understanding of the protein interaction network and provide new insights for disease gene prediction. The main achievements of the thesis are as follows.

(1) Protein interaction network modeling and multi-factor network motifs. A Multi-subnet Composited Network model was adopted to describe the protein interaction network, laying the foundation for the subsequent work. The concept of the "multi-factor network motif" was proposed to study the topological associations between different kinds of macromolecules.

(2) Analysis of the topological properties of the protein interaction network model. The study begins with a much-debated question in protein interaction network research: "Are disease proteins topologically important?" To address shortcomings of previous studies, this thesis selected housekeeping genes as essential genes and used the essential proteins as a reference group. Several metrics, including degree, k-shell decomposition, average shortest path, 1NX and 2NX, were used to analyze the differences between disease proteins and other proteins. Empirical results show that, compared with other proteins, disease proteins are topologically more important and closer to the network center. A gene classification strategy that integrates these new topological features was then used to detect disease genes, which effectively improved prediction performance.

(3) Analysis of associations between disease genes and essential genes.
Some studies have argued that if disease proteins were topologically important, mutations in disease genes would severely damage the living system. To test whether the claim "disease proteins are topologically more important" is reasonable, this thesis defined one kind of multi-factor network motif, rxn. It then compared the proportion of essential proteins among the neighbors of disease proteins with the same proportion for other proteins. Empirical results show that this proportion is statistically smaller for disease proteins, which indicates that disease proteins are not well connected with essential proteins. Gene expression microarray data were then used to analyze the functional associations between disease genes and essential genes. The results show that, compared with other genes, disease genes are less correlated with essential genes. In addition, this thesis found that some proteins far from known disease proteins can also cause diseases. A gene classification strategy that integrates these new topological features was then used to detect disease genes, which noticeably improved the recall of disease gene prediction.

(4) Disease gene prediction based on the protein interaction network model. Based on the new finding that "disease genes are not well connected with essential genes", this thesis designed a new global distance measure. The measure aims to find candidate disease genes that have more interactions with known disease genes but fewer interactions with essential genes. Given a disease and its known disease genes, the method assigns a positive resource to disease proteins and a negative resource to essential proteins. It then applies a network propagation method to detect potential disease genes.
Experimental results on 110 diseases demonstrate the effectiveness of the proposed method, particularly for monogenic and complex diseases.
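The group comparison in item (2) can be illustrated with a small, dependency-free Python sketch. It is not the thesis's own code. It computes three metrics for each protein:

- degree;
- the k-shell index, by iteratively peeling low-degree nodes;
- the mean shortest-path distance to all reachable nodes, by breadth-first search.

It then averages these metrics over two protein groups. The toy graph, the group labels, and all function names are hypothetical. The 1NX and 2NX metrics are omitted because the abstract does not define them.

```python
from collections import deque
from statistics import mean


def build_graph(edges):
    """Undirected adjacency sets from an edge list (hypothetical toy input)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def k_shell(adj):
    """k-shell index of every node: repeatedly peel nodes whose residual degree <= k."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Raise k to the smallest residual degree still present.
        k = max(k, min(degree[v] for v in remaining))
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled after being queued twice
            remaining.remove(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)
    return shell


def mean_distance(adj, source):
    """Mean BFS hop distance from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    others = [d for node, d in dist.items() if node != source]
    return mean(others) if others else float("inf")


def summarize(adj, group, shell):
    """Average topological metrics over a group of proteins."""
    return {
        "degree": mean(len(adj[v]) for v in group),
        "k_shell": mean(shell[v] for v in group),
        "avg_shortest_path": mean(mean_distance(adj, v) for v in group),
    }


adj = build_graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")])
shell = k_shell(adj)
print(summarize(adj, ["C"], shell))  # hypothetical "disease protein" group
print(summarize(adj, ["E"], shell))  # hypothetical "other protein" group
```

A larger k-shell index and a smaller average shortest path both point to a position nearer the network core, which is the sense in which the abstract calls disease proteins "closer to the network center".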
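The comparison in item (3), the proportion of essential proteins among a protein's neighbors, can be sketched as below. The abstract does not spell out the structure of the motif rxn or name the statistical test. This sketch therefore simply counts essential partners, and it uses a one-sided label-permutation test as an assumed stand-in for whatever test the thesis applied. The network, the protein names, and the function names are all hypothetical.

```python
import random
from statistics import mean


def essential_neighbor_fraction(adj, node, essential):
    """Share of a protein's interaction partners that are essential proteins."""
    nbrs = adj[node]
    if not nbrs:
        return 0.0
    return sum(1 for u in nbrs if u in essential) / len(nbrs)


def permutation_pvalue(xs, ys, n_perm=5000, seed=0):
    """One-sided p-value for mean(xs) < mean(ys) by shuffling group labels."""
    rng = random.Random(seed)
    observed = mean(ys) - mean(xs)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(xs)], pooled[len(xs):]
        if mean(py) - mean(px) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy network: adjacency sets keyed by protein name.
adj = {
    "D1": {"E1", "X", "Y"},
    "O1": {"E1", "E2"},
    "E1": {"D1", "O1"},
    "E2": {"O1"},
    "X": {"D1"},
    "Y": {"D1"},
}
essential = {"E1", "E2"}
print(essential_neighbor_fraction(adj, "D1", essential))
print(essential_neighbor_fraction(adj, "O1", essential))

# With real data, xs / ys would hold the fractions for all disease / other proteins.
print(permutation_pvalue([0.1, 0.2, 0.1], [0.9, 0.8, 0.9]))
```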
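Item (3) also compares how strongly disease genes and other genes co-express with essential genes in microarray data. A minimal sketch follows, assuming that co-expression is measured as the mean absolute Pearson correlation with the essential genes. The abstract does not state the exact statistic. The expression profiles and gene names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0


def mean_abs_corr(expr, gene, essential):
    """Average |r| between one gene and the essential genes (needs at least one other gene)."""
    return mean(abs(pearson(expr[gene], expr[e])) for e in essential if e != gene)


# Hypothetical expression profiles across four samples.
expr = {
    "E1": [1.0, 2.0, 3.0, 4.0],
    "G_other": [2.0, 4.0, 6.0, 8.0],
    "G_disease": [1.0, -1.0, 1.0, -1.0],
}
print(mean_abs_corr(expr, "G_other", ["E1"]))
print(mean_abs_corr(expr, "G_disease", ["E1"]))
```

Averaging this score over all disease genes and over all other genes gives the two group values that the abstract reports as "less correlated" for disease genes.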
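Item (4) assigns a positive resource to known disease proteins and a negative resource to essential proteins, then propagates it through the network. The sketch below uses random walk with restart on a signed seed vector. This is an assumption: the abstract names only "network propagation", and the restart weight `alpha` and the negative-resource weight `beta` are hypothetical parameters. The toy graph is also hypothetical.

```python
def propagate(adj, disease, essential, alpha=0.3, beta=1.0, tol=1e-10, max_iter=1000):
    """Random walk with restart from a signed seed vector.

    Disease proteins share +1 of resource and essential proteins share -beta.
    p <- (1 - alpha) * W p + alpha * p0 is iterated with W column-normalized.
    """
    nodes = sorted(adj)
    p0 = {v: 0.0 for v in nodes}
    for v in disease:
        p0[v] += 1.0 / len(disease)
    for v in essential:
        p0[v] -= beta / len(essential)
    p = dict(p0)
    for _ in range(max_iter):
        # Each neighbor u passes an equal share p[u] / deg(u) to every partner.
        new = {
            v: (1 - alpha) * sum(p[u] / len(adj[u]) for u in adj[v]) + alpha * p0[v]
            for v in nodes
        }
        converged = max(abs(new[v] - p[v]) for v in nodes) < tol
        p = new
        if converged:
            break
    return p


def rank_candidates(scores, known):
    """Non-seed proteins ordered from highest to lowest propagated score."""
    return sorted((v for v in scores if v not in known), key=scores.get, reverse=True)


# Hypothetical path X - C1 - D1 - C2 - E1, where C2 touches an essential protein.
edges = [("X", "C1"), ("C1", "D1"), ("D1", "C2"), ("C2", "E1")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

scores = propagate(adj, disease={"D1"}, essential={"E1"})
print(rank_candidates(scores, known={"D1"}))
```

C1 and C2 are equally close to the disease seed D1. However, the negative resource spreading from the essential protein E1 pulls C2 below C1, and even below the more distant X. This matches the stated goal of favoring candidates that interact with known disease genes but not with essential genes. Because propagation is global, it can also score proteins beyond the immediate neighborhood of the seeds.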
Keywords/Search Tags: Multi-factor network motif, Biological network analysis, Disease gene prediction, Protein interaction network