
Modeling and thematic analysis of neighborhood structures in the Web and hierarchical identification of Web communities

Posted on: 2008-11-07
Degree: M.Sc
Type: Thesis
University: Memorial University of Newfoundland (Canada)
Candidate: Nargis, Isheeta
Full Text: PDF
GTID: 2445390005957131
Subject: Computer Science
Abstract/Summary:
The web graph represents the structure of the World Wide Web by denoting each web page as a vertex and each hyperlink as an arc. The motivating goal behind the research constituting this thesis is twofold: first, to model the local structure of the web graph, and second, to discover communities of related web pages.

We devise an algorithm to extract communities based solely on the topology of the Web. Central to our approach is the innovative idea of Iterative Cycle Contraction to discover Web communities composed of related web pages. The intuition behind this algorithm is that if two pages link to each other then they are thematically related. Successive iterations yield a hierarchical structure of communities and allow us to define a similarity measure between two web pages in the same community by noting at which iteration their corresponding vertices are first grouped into a single vertex. We apply the algorithm to some focused subgraphs of the web graph and evaluate its effectiveness by investigating the theme(s) of the putative communities that it finds. We find that the algorithm is successful at identifying communities and at distinguishing communities with varying thematic content in neighborhood graphs. An examination of the distribution of community sizes in a particular iteration of the algorithm reveals that a power law is observed for sufficiently large neighborhood graphs.

We study the concept of the neighborhood graph to model the local structure surrounding a particular web page. We analyze some structural and statistical properties of the neighborhood graphs and compare them with the corresponding properties of the whole web graph. In several respects these neighborhood graphs show characteristics similar to those of the entire web graph. Both the indegree and outdegree distributions of a sufficiently large neighborhood graph follow a power law.
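
The Iterative Cycle Contraction idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the algorithm, so the code below rests on a stated assumption: each iteration contracts only 2-cycles (pairs of vertices that link to each other) in the current contracted graph, and iterations repeat until no such pair remains. The function names `iterative_cycle_contraction` and `first_merge_iteration` are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def iterative_cycle_contraction(arcs):
    """Contract mutually linked vertices repeatedly (assumption: 2-cycles only).

    arcs: iterable of (source_page, target_page) pairs.
    Returns a list with one entry per iteration; each entry is the list of
    communities (frozensets of pages) that exist after that iteration.
    """
    arcs = list(arcs)
    pages = {p for arc in arcs for p in arc}
    group = {p: p for p in pages}  # page -> representative of its supervertex
    history = []
    while True:
        # Project the arcs onto the current supervertices, dropping self-loops.
        current = {(group[u], group[v]) for u, v in arcs if group[u] != group[v]}
        mutual = [(a, b) for a, b in current if (b, a) in current]
        if not mutual:
            break
        # Union-find over supervertices: merge every mutually linked pair.
        parent = {g: g for g in set(group.values())}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for a, b in mutual:
            parent[find(a)] = find(b)
        group = {p: find(g) for p, g in group.items()}
        members = defaultdict(set)
        for p, rep in group.items():
            members[rep].add(p)
        history.append([frozenset(m) for m in members.values()])
    return history


def first_merge_iteration(history, p, q):
    """Return the 1-based iteration at which p and q first share a community."""
    for i, communities in enumerate(history, start=1):
        if any(p in c and q in c for c in communities):
            return i
    return None


# Toy graph: a and b link to each other; c only closes a longer cycle.
example_arcs = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("d", "a")]
example_history = iterative_cycle_contraction(example_arcs)
```

In this toy graph, `a` and `b` merge in the first iteration; the merged vertex and `c` then link to each other, so `c` joins in the second iteration, while `d` never joins. The iteration returned by `first_merge_iteration` plays the role of the similarity measure mentioned above: an earlier merge suggests a closer relation.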
We perform a thematic analysis of the local structure of the Web by discovering authorities and hubs in neighborhood graphs, where the set of authorities and hubs gives a representative flavor of the theme of the page in question. We also analyze the temporal evolution of Hyperlinked communities and identify Core communities in neighborhood graphs.
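
The abstract does not say which procedure is used to find authorities and hubs. As an illustration only, the sketch below assumes the standard HITS iteration (Kleinberg): a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with normalization after each step. The function name `hits` and the `iterations` parameter are hypothetical.

```python
from math import sqrt


def hits(arcs, iterations=50):
    """Compute authority and hub scores with the standard HITS iteration.

    arcs: iterable of (source_page, target_page) pairs.
    Returns (authority, hub), two dicts mapping each page to its score.
    """
    arcs = list(arcs)
    pages = sorted({p for arc in arcs for p in arc})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for u, v in arcs:
            auth[v] += hub[u]
        norm = sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {p: x / norm for p, x in auth.items()}
        # Hub update: sum the authority scores of pages linked to.
        hub = {p: 0.0 for p in pages}
        for u, v in arcs:
            hub[u] += auth[v]
        norm = sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {p: x / norm for p, x in hub.items()}
    return auth, hub


# Toy neighborhood graph: two hub-like pages pointing at two targets.
example_auth, example_hub = hits([("h1", "x"), ("h2", "x"), ("h1", "y")])
```

Here `x` receives links from both `h1` and `h2` and so ends up with the highest authority score, while `h1`, which links to both targets, ends up with the highest hub score. Reading the top-ranked pages of each kind is one way to get the "representative flavor" of a neighborhood graph's theme.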
Keywords/Search Tags:Web, Communities, Neighborhood, Structure, Graph, Page, Thematic