An Improvement Strategy On The HITS Algorithm Of Web Mining

Posted on: 2014-02-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Z Wu | Full Text: PDF
GTID: 2248330395994652 | Subject: Software engineering
Abstract/Summary:
The 21st century is marked by the growing informatization of society and the rapid development of network technology. An enormous amount of information is integrated, stored, and transmitted through the Internet. In a society where people can do little without computers and the Internet, retrieving information efficiently from the Internet has become an indispensable part of everyday life.

The emergence and development of search engines has made it possible to capture and retrieve information on the Internet. However, no technology is perfect. Because search engines are designed for general-purpose use, they cannot provide preference-aware, accurate, and well-founded information retrieval. Because web pages contain varied and complex information and data structures, it is difficult to analyze the information on a page accurately, and the same difficulty arises when capturing and retrieving that information. Web Mining technology builds on traditional data mining techniques. It makes it possible to perform correlation analysis on the link structure of the web, on text, and on other content in web pages. The subject of this paper is how to provide an effective and accurate information retrieval scheme.

In this paper, we first introduce the classic Web Mining link analysis algorithms, the HITS algorithm and the PageRank algorithm, and analyze their advantages and disadvantages. We choose the HITS algorithm as the base algorithm for our research. In our experiments, we find that the HITS algorithm is insensitive to the timeliness of information. Moreover, the HITS algorithm has difficulty recognizing invalid and redundant links. To address these issues, we design a new algorithm, which we call the TM-HITS algorithm.
The TM-HITS algorithm improves on the traditional HITS algorithm by introducing a time-based attenuation (decay) parameter. The paper analyzes both simulated data and real data collected with web page crawling techniques. The experimental results show that the algorithm effectively retrieves pages with higher timeliness and does a good job of avoiding interference from unwanted links, whether malicious or not, such as advertising links and invalid pages.

In addition, the paper discusses future trends in web mining technology and proposes a feasible approach to an information retrieval model that integrates two types of link analysis algorithm. The approach may build a link analysis module that integrates different algorithms on the server and the client. Moreover, it can offer searches of different precision according to the needs of different users. Machine learning methods can be introduced to continually refine the model, with the goal of intelligent retrieval that lets different users deeply customize retrieval services according to their preferences.
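
The abstract does not give the TM-HITS formula, so the following is only a minimal sketch of how a time-based attenuation parameter could be combined with the standard HITS hub/authority iteration. The exponential decay form, the `decay` rate, the `age_days` input, and the function name `time_decayed_hits` are all assumptions made for illustration, not the author's actual method.

```python
import math


def time_decayed_hits(links, age_days, decay=0.01, iterations=50):
    """Hypothetical time-decayed HITS (not the thesis's exact TM-HITS).

    links:    dict mapping a page to the list of pages it links to
    age_days: dict mapping a page to days since its last update (assumed input)
    decay:    assumed attenuation rate; 0 gives plain HITS
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Assumed attenuation: freshness falls off exponentially with page age.
    fresh = {p: math.exp(-decay * age_days.get(p, 0.0)) for p in pages}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalize(scores):
        # Scale to unit L2 norm; leave all-zero vectors unchanged.
        norm = math.sqrt(sum(v * v for v in scores.values()))
        return {p: v / norm for p, v in scores.items()} if norm else scores

    for _ in range(iterations):
        # Authority update: sum of hub scores of in-linking pages,
        # attenuated by the freshness of the page being scored.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        auth = normalize({p: fresh[p] * s for p, s in new_auth.items()})

        # Hub update: sum of (already attenuated) authority scores
        # of the pages this page links to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += auth[dst]
        hub = normalize(new_hub)

    return hub, auth


if __name__ == "__main__":
    links = {"portal": ["fresh_page", "stale_page"]}
    ages = {"portal": 0, "fresh_page": 1, "stale_page": 1000}
    hub, auth = time_decayed_hits(links, ages)
    print(sorted(auth, key=auth.get, reverse=True))
```

In this sketch a page that has not been updated for a long time, such as a dead or abandoned page, receives a lower authority score than a recently updated page with the same in-links. Setting `decay=0` recovers ordinary HITS, which treats both pages identically.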
Keywords/Search Tags: Web Mining, HITS Algorithm, Timed-Decay Parameter