Algorithms for large graphs

Posted on:2011-08-16

Degree:Ph.D

Type:Thesis

University:Georgia Institute of Technology

Candidate:Das Sarma, Atish

Full Text:PDF

GTID:2440390002952172

Subject:Computer Science

Abstract/Summary:

PDF Full Text Request

This thesis explores algorithms for massive graphs. We explore efficient graph algorithms under standard models of processing such large data sets.;The main graph problem considered in this thesis is performing random walks efficiently. Random walks are fundamental across all of computer science ranging from theory, mathematics, distributed computing and web algorithms. Several applications include search, data mining, ranking (such as PageRank), measuring similarity between web pages, maintaining connectivity in P2P networks etc. The work in this thesis has developed the fastest algorithms for performing random walks in the streaming model, and in distributed networks, breaking past the linear-time barrier presented by the sequential nature of random walks. This work improves upon techniques that have been used for decades in both theory as well as practice.;In follow up work on the streaming model presented in this thesis, we consider the problem of graph partitioning. Several offline and few streaming/online algorithms have been suggested for partitioning an input graph into two parts, to approximate the conductance of the graph. While many of these techniques are impractical at the scale of the massive web graphs, some may be implementable. However, when dealing with several hundred million nodes and tens of billions of edges, visualizing one global cut becomes very difficult. What can one say from a partition that separates the graph into two humongous sets of nodes? In this work, we consider the problem of finding what we call cut projections. Given a (possibly small) subset of nodes from the graph, the objective is to partition the set of nodes such that this projects onto an induced cut of small conductance on the entire graph. The hope is that the bounds now strongly depend on the size of the specified set of nodes and only weakly on the size of the entire graph. We show how such cut projections can be obtained on regular graphs when the subset of nodes is chosen at random. While our theorems do not hold when these restrictions are omitted, the technique itself is likely to work well in practice.;In the distributed computing model, we again focus on the problem of performing random walks efficiently. Our results on this model work only on undirected networks (which is usually the case, since the communication paths are bidirectional).;In follow up work under the same congest model of distributed networks, we further improve the algorithm for performing several random walks. Further, we show a lower bound that suggests that a single random walk requires O( ℓ/logℓ ) rounds. Therefore, barring the diameter term, our approach for performing a single random walk is near-optimal. Our algorithm also introduces several new techniques and random walk properties that may be of independent interest. Further, we show how our algorithms for performing random walks efficiently can be used as subroutines for efficient distributed algorithms for two key properties: random spanning trees, and estimating mixing times. We show the fastest known distributed algorithms for both sampling a random spanning tree (which is useful in applications such as routing and generating sparsifiers), and estimating mixing times (which is crucial to understanding the connectivity of the network, a primary concern in peer to peer systems).;Finally, in the online framework of sketch based algorithms, we study the problem of estimating graph distances efficiently. Our objective is to be able to answer distance queries between a pair of nodes in real time. Since the standard shortest path algorithms are expensive, our approach moves the time-consuming shortest-path computation offline, and at query time only looks up precomputed values and performs simple and fast computations on these precomputed values. More specifically, during the offline phase we compute and store a small sketch for each node in the graph, and at query-time we look up the sketches of the source and destination nodes and perform a simple computation using these two sketches to estimate the distance. (Abstract shortened by UMI.)...

Keywords/Search Tags:

Algorithms, Graph, Nodes, Random walks, Model, Thesis

PDF Full Text Request

Related items

1	Exploring Complex Networks By Random Graph Evolution And Random Walks
2	Research On The Algorithms Of Identifying The Influential Links In Social Networks
3	Application of statistical physics to random graph models of networks
4	Research On Influential Nodes Identification In Complex Networks
5	Research On Random Walks With Trap On Two Kinds Of Self-similar Complex Networks
6	Random walks on the dihedral group and abelian groups
7	Some Properties Of Random Walks In Random Environments
8	Random Walks On Cubic Graph Class
9	Asymptotics Theory Of Random Walks And Applications In Risk Theory
10	Research On Relatively Important Nodes Mining And Its Application In Criminal Networks