Font Size: a A A

Scalable and robust algorithms for mining clusters in graphs

Posted on:2014-08-25Degree:Ph.DType:Thesis
University:The Pennsylvania State UniversityCandidate:Mandala, Supreet ReddyFull Text:PDF
GTID:2458390008450324Subject:Operations Research
Abstract/Summary:PDF Full Text Request
Several social and technological systems around us can be modeled as a network or a graph. The topology of such networks is known to play a major role in understanding and predicting the system behavior. A topological feature which has been observed across several domains is the formation of clusters. A cluster or module or community is a subset of vertices which have high density of connections between cluster members and few connections leaving the cluster. The knowledge of clusters is extremely useful in a variety of domains - product recommendations in social networks, characterizing functions on unknown proteins in protein interaction networks, product placement and promotions in retails networks. In this thesis, we focus four problems addressing different aspects of graph clustering. Firstly, we propose an algorithm which can detect disjoint clusters in graphs. We optimize an existing quality function, called modularity, which quantifies the quality of a given partition. Therefore, detecting clusters is equivalent to finding a partition which maximizes modularity. Due to the combinatorial nature of the modularity maximization it is hard to find optimal solutions. We propose an ant colony optimization based heuristic which can detect near optimal partitions in a reasonable amount of time. We validate the effectiveness of our algorithm on various social network data sets. Our second problem addresses the issue of detecting robust graph partitions. The motivation behind this problem is to detect partitions which not only maximize modularity but are also immune perturbations in network topology. In several real world setting, perturbations in network topology can be expected due to noisy experimental data. We develop a robust optimization based max-min optimization model in order to define a robust graph partition. Furthermore, we propose an algorithm which can detect a robust partition while maintaining scalability. The usefulness of robust partitions in illustrated on both real and artificial networks. The third problem deals with detecting multiple diverse alternative partitions with high modularity scores. It has already been reported that several dissimilar partitions may exist with quite similar modularity scores. We propose an approach to detect alternative graph partitions by modifying the network topology forcing clustering algorithm to detect diverse partitions. We confirm the existence of diverse partitions in several real world networks. A case study is presented for a retail graph (book co-purchase network) at amazon.com illustrating the utility of our approach. As a part of the fourth problem, we propose a first principles definition of a cluster which does not have any limitations of modularity based approaches. Moreover, our approach is able to detect overlapping clusters. We show that our definition of a cluster has a direct correspondence with the nash equilibrium of a non-cooperative game. We define a clustering game and show the existence of a nash equilibrium. Both parallel and non-parallel algorithms are proposed which have a near linear time running time. We compare our approach to existing overlapping cluster detection techniques when tested on artificial networks. Also, potential applications are shown in natural language processing (word sense disambiguation) and WWW segmentation problems.
Keywords/Search Tags:Graph, Network, Robust, Cluster, Algorithm, Partitions, Topology, Several
PDF Full Text Request
Related items