Efficient techniques for wide-area information dissemination

Posted on: 1996-04-22
Degree: Ph.D
Type: Dissertation
University: Stanford University
Candidate: Yan, Tak Woon
Full Text: PDF
GTID: 1462390014985619
Subject: Computer Science
Abstract/Summary:
Information dissemination is a powerful information finding mechanism in wide-area environments. An information dissemination server accepts long-term queries that represent user interests, collects new documents from underlying sources, matches the documents against the queries, and continuously updates the users with relevant information. Previous research in information dissemination focused on the effectiveness of interest matching. However, in wide-area environments, efficiency is important as well.

This dissertation studies efficient techniques for wide-area information dissemination. We first focus on the matching process done at a server. Instead of using the traditional approach of evaluating user queries against an inverted index of documents, we construct an inverted index of queries, against which documents are run. We study several alternative indexing techniques for information-retrieval-style (boolean and vector-space) queries. We present performance results obtained by analysis, simulation, and prototyping. The results show that the proposed techniques reduce the average document processing time by orders of magnitude.

Besides content matching, the removal of redundant or duplicate information is another important task performed by a dissemination server. In this dissertation, we characterize duplicates found in digital documents. We present the desired functionalities of a duplicate removal module in an information dissemination service, and devise efficient implementation techniques for such a module.

Next, we consider a distributed system made up of multiple servers. We address the key problem of how to replicate and distribute queries and documents among servers. We draw a parallel between distributed information dissemination and the well-studied replica control problem, adapt quorum-based protocols for use in information dissemination, and analytically compare their performance.

Finally, we investigate efficient document delivery mechanisms. We present and evaluate a number of schemes that exploit the geographical locality and/or interest locality of users to cut down the network traffic generated by document delivery.

We have implemented the Stanford Information Filtering Tool (SIFT) system, which uses the proposed indexing techniques. A sample SIFT server (URL http://sift.stanford.edu) has been set up to disseminate USENET Netnews articles. The server currently has more than 9,000 users worldwide and processes over 80,000 articles against some 23,000 queries daily.
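
The idea of indexing queries rather than documents can be illustrated with a minimal sketch. It is not the SIFT implementation: it assumes that each query is a simple conjunction of keywords, that documents are tokenized by whitespace, and that a query matches when every one of its terms occurs in the document. The names `QueryIndex`, `add_query`, and `match` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class QueryIndex:
    """Inverted index over queries: each term maps to the queries containing it."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> list of query ids
        self.sizes = {}                    # query id -> number of distinct terms

    def add_query(self, query_id, terms):
        # Assumption: a query is a conjunction (AND) of its keywords.
        distinct = {t.lower() for t in terms}
        if not distinct:
            raise ValueError("a query needs at least one term")
        self.sizes[query_id] = len(distinct)
        for term in distinct:
            self.postings[term].append(query_id)

    def match(self, text):
        # Assumption: whitespace tokenization, case-insensitive comparison.
        words = set(text.lower().split())
        hits = defaultdict(int)  # query id -> number of its terms seen so far
        for word in words:
            for query_id in self.postings.get(word, ()):
                hits[query_id] += 1
        # A conjunctive query matches only if all of its terms were seen.
        return sorted(q for q, n in hits.items() if n == self.sizes[q])


index = QueryIndex()
index.add_query("q1", ["information", "dissemination"])
index.add_query("q2", ["quorum", "protocols"])
index.add_query("q3", ["dissemination"])
matched = index.match("efficient information dissemination servers")
```

In this sketch an incoming document only touches the posting lists of the words it actually contains, so queries that share no term with the document are never examined. That is the intuition behind running documents against an index of queries instead of running every stored query against each new document.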
Keywords/Search Tags:Information dissemination, Queries, Wide-area, Server, Techniques, Efficient