Research Of The PageRank Algorithm In Web Structure Mining

Posted on: 2012-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: B J Gao
Full Text: PDF
GTID: 2178330335470298
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer network technology in China, information on the internet is being generated at an increasing pace. Search engines have become the most widely used internet application, with a utilization rate exceeding that of music, video, and other services. As the volume of online information grows, traditional web portals are declining in importance, and search engines have become the "new portal". Users want more accurate and comprehensive information, so they place higher demands on web search engines; retrieving the required information and ranking the results well is therefore critical.

Motivated by this, this thesis studies Web data mining and related algorithms, in particular the PageRank algorithm. We find that PageRank suffers from a serious "topic drift" problem, which makes it difficult to meet users' requirements. Latent Semantic Analysis (LSA) was originally developed for information retrieval to address synonymy and polysemy in Chinese, and can thereby improve search accuracy. We propose a new algorithm that combines Latent Semantic Analysis with PageRank. Because PageRank also favors older web pages, we additionally introduce a time factor into the improved algorithm. Experiments show that the new algorithm is more effective than the traditional algorithm and suppresses topic drift better.
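The abstract does not give the formula of the proposed algorithm, so the following is only an illustrative sketch and not the thesis's method. It shows standard PageRank computed by power iteration, with an optional biased teleport vector as one plausible place where a topic-relevance score (such as an LSA similarity computed elsewhere) and a freshness score could enter. The function name `pagerank`, the `weights` parameter, the half-life decay, and all numeric values are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, weights=None, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    weights: optional {page: non-negative score} used as the teleport
    distribution. This is a hypothetical stand-in for a combined
    relevance-and-freshness score; None gives plain uniform PageRank.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    if weights is None:
        teleport = {p: 1.0 / n for p in pages}
    else:
        total = sum(weights.get(p, 0.0) for p in pages)
        if total <= 0:
            raise ValueError("weights must contain a positive score")
        teleport = {p: weights.get(p, 0.0) / total for p in pages}

    rank = dict(teleport)
    for _ in range(max_iter):
        # Rank held by pages without outlinks is redistributed via teleport.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping + damping * dangling) * teleport[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Toy link graph and made-up scores, for illustration only.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
plain = pagerank(links)

topic_sim = {"a": 0.9, "b": 0.2, "c": 0.6, "d": 0.8}  # assumed similarity to a query topic
age_days = {"a": 30, "b": 900, "c": 400, "d": 10}     # assumed page ages
half_life = 365.0                                      # assumed decay half-life in days
weights = {p: topic_sim[p] * 0.5 ** (age_days[p] / half_life) for p in links}
biased = pagerank(links, weights=weights)
```

In this toy graph, page `d` has no inbound links, so its score depends only on the teleport term; giving it a high relevance and freshness weight raises its rank relative to plain PageRank. Other designs, such as weighting individual links rather than the teleport vector, are equally compatible with the abstract's description.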
Keywords/Search Tags: Search Engine, Web Structure Mining, PageRank, Latent Semantic Analysis, Time Factor