Font Size: a A A

Models And Algorithms For Community Structure Detection In Complex Networks And Applications In Biological Networks

Posted on:2015-05-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:X K MaFull Text:PDF
GTID:1220330431459584Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Many complex systems in nature and society, such as biology and molecular systems, coupled systems, can be effectively modeled as networks. The rapid development of information technology makes the collection and research of vast amounts of network data possible. With more and more real network data being available, studying the function and property of real systems through the analysis of statistic property of network data has attracted research interests from different disciplines. The complexity of networks forces the researchers to study the local sub-network instead of the global one and to derive the overall structure and behavior of networks via exploring the properties of local structure. Similar to the small-world model and the scale-free model, the community structure has been acknowledged as one of the most important statistical properties. The studying of community structure contributes greatly to understand the functions of genes in organism, design the marketing strategy, and maintain national security, etc.Hence, proposing symmetric models for community structure, developing effective and efficient algorithms as well as applying community structure into real application such as protein complex prediction, social behavior analysis, are major topics of the thesis. Specifically, we investigated following problems and made following contributions:1. How to quantify the partitioning of the network involved is the first step in com-munity structure detection, and the modularity density is one of the excellent indexes. We first investigate the relations between the objective functions of these classic algorithms and the modularity density. First, a generalized modularity density is proposed, which contains all these quantifying functions except the modularity function. The analysis demonstrates that the proposed index can tolerate the resolution limit at large extent. We finally proved then the equivalence with the objective functions of the weighted kernel K-means, nonnegative matrix factorization (NMF), spectral clustering algorithms and the generalized modularity density, although they are different algorithms with various work-ing mechanism. Such a result explains why the spectral, NMF, and kernel optimization algorithms can be applied to the community detection.2. One similarity is insufficient to characterize the topological structure of communi-ties because of the complexity of community structure. To overcome this issue, based on the equivalence relation between the generalized modularity density and the NMF proved, a semi-supervised nonnegative matrix factorization algorithm (SS-NMF) for the commu-nity structure detection by integrating the partial supervision into the NMF algorithm. Previous NMF-based algorithms often suffer from the restriction of measuring network topology from only one perspective, but our algorithm uses a semi-supervised mechanism to get rid of the restriction.3. There are various overlapping and hierarchical community structure in real world networks due to the multi-functions and multi-properties of vertices. How to discover these complicated overlapping and hierarchical communities is critical to uncover structure and understand the dynamic behavior of networks. In a traditional spectral algorithm, each vertex of a network is embedded into a spectral space by making use of the eigenvectors of the adjacency matrix or Laplacian matrix of the graph. However, in many cases, the spectrum is insufficient to characterize the topology. To attack this two issues, in this paper, a novel eigenspace based spectral approach for revealing the overlapping and hierarchical community structure of complex networks is proposed by not only using the eigenvalues and eigenvectors but also the properties of eigenspaces of the networks involved. This gives us a better characterization of community. We first show that the communicability between a pair of vertices can be rewritten in term of eigenspaces of a network, lying the theoretic foundation for the presented method. An agglomerative clustering algorithm is then presented to discover the hierarchical communities using the communicability matrix. Finally, these overlapping vertices are discovered with the corresponding eigenspaces, based on the fact that the vertices more densely connected amongst one another are more likely to be linked through short cycles.4. The spectral clustering algorithms has been extensively studied largely due to the high accuracy and strong theoretic foundation. The spectral clustering, however, is criticized by its cubic complexity of spectrum algorithm that means it can only be applied to small and meddle size data. How to accelerate the spectral clustering while maintain its performance is critical. To overcome this issue, a fast semi-supervised spectral clustering is proposed, which combines the spectral clustering and semi-supervised strategy with an immediate purpose to accelerate the spectral algorithm via the semi-supervised priori knowledge. Furthermore, the speed of the proposed algorithm is determined by the gap between the largest eigenvalue and the second largest one. A comprehensive comparison among non-traditional spectral has been performed.5. Communities in the biological networks correspond to protein complexes or path-ways with specific biological processes. Two complex prediction algorithms are pro-posed:We first transform the problem of detecting protein-complex core in a biological networks into an classic all-clique problem in its corresponding virtual one and, a novel core-attachment based algorithm is proposed; In the second algorithm, to investigate the roles of edges in protein interaction (PPI) networks, we show that the edges connecting less similar vertices in topology are more significant in maintaining the global connec-tivity, indicating the weak ties phenomenon in PPI networks. By using the concept of bridgeness, a reliable virtual network is constructed. Then, an algorithm is developed to discover the protein complexes.6. Integrative analysis multiple networks sheds lights on under-standing disease-stage-specific molecular events is critical for understanding disease etiology and devel-opment of therapeutic interventions. We constructed gene co-expression networks for various stages of breast cancer, and proposed an integrative analysis method for discover-ing modules in multiple networks by transforming the module identification problem into an minimum optimization problem. And, a complete dynamic analysis was performed to discover the dynamic and static modules. The results show that:1) genes in dynamic modules obtained by integrated analysis are more enriched by signal domains than genes in static modules;2) integrative analysis modules can be served as biomarkers that sig-nificantly enhances the accuracy of cancer stage prediction than those obtained by single network based methods;3) connectivity dynamic is not significantly correlated with gene expression activity.
Keywords/Search Tags:Complex network, Community structure, Clustering algorithms, Protein complex, Disease prediction
PDF Full Text Request
Related items