The web provides abundant information sources, but the huge volume of data, its complexity, its extreme dynamism, and the diversity of its users make web resource mining fairly difficult. Owing to the self-organized and semi-structured nature of the web, classical information retrieval and database technologies cannot be applied effectively. Hyperlinks are the distinctive components that bind web data sources together, and link analysis is an important approach to improving the quality of web resource mining.

This paper analyzes web data mining methods, the structure of web search engines, the characteristics of web link structure, and the characteristics and open problems of the main link analysis algorithms in detail. PageRank and HITS are two classic link analysis ranking algorithms. However, PageRank treats all links equally, ignoring the update time of old web pages and the influence of navigation pages, while HITS suffers from mutual reinforcement relationships, topic drift, unreasonable results, and failure to meet users' information needs at site granularity. From the link structure of the web and the browsing behavior of a web surfer, it can be observed that a random walker, influenced by the content of the current page, tends to jump to similar pages rather than following out-links with equal probability. Based on this observation, this paper introduces the SimRank algorithm, which computes the similarity between web pages, and then proposes a new score assignment scheme based on page similarity that assigns larger jumping probabilities to the more relevant links. On this basis, we propose a new ranking algorithm based on a distributed factor (RAD).

This paper also designs an experimental platform for the RAD algorithm, covering the experiment data sets, the database, pruning, and the SimRank and RAD implementations. Experimental results show that the RAD algorithm performs better than the standard PageRank algorithm in terms of relative precision and mean average precision.
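
As a rough illustration of the similarity-based score assignment described above, the following Python sketch distributes a page's score to its out-links in proportion to page-to-page similarity values (such as those a SimRank computation would produce), rather than splitting it equally as standard PageRank does. This is a minimal sketch under assumed inputs, not the paper's actual RAD implementation; the graph, similarity values, damping factor, and function name are illustrative assumptions.

```python
def similarity_weighted_rank(out_links, sim, damping=0.85, iters=50):
    """Sketch of a PageRank-style iteration with similarity-weighted jumps.

    out_links: dict mapping each page to the list of pages it links to
               (every link target must also appear as a key).
    sim: dict mapping (page, page) pairs to a similarity score,
         e.g. as produced by a SimRank computation (assumed input).
    Returns a dict mapping each page to its rank score.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iters):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            links = out_links[p]
            if not links:
                # Dangling page: spread its score uniformly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
                continue
            # Weight each out-link by its similarity to the current page
            # instead of the uniform 1/len(links) used by standard PageRank.
            weights = [sim.get((p, q), 0.0) for q in links]
            total = sum(weights)
            for q, w in zip(links, weights):
                share = w / total if total > 0 else 1.0 / len(links)
                new_rank[q] += damping * rank[p] * share
        rank = new_rank
    return rank


# Toy example: page "a" links to "b" and "c"; "b" is assumed to be more
# similar to "a", so it receives a larger share of "a"'s score than it
# would under the equal split of standard PageRank.
out_links = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
sim = {("a", "b"): 0.8, ("a", "c"): 0.2, ("b", "a"): 1.0, ("c", "a"): 1.0}
print(similarity_weighted_rank(out_links, sim))
```

In this toy graph, the more relevant out-link receives the larger jumping probability, which is the behavior the proposed score assignment scheme aims for.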