| With the rapid development of computer network technology in our country, information on the internet is being generated at an increasingly fast pace. Search engines have become the most widely used internet application, with a utilization rate exceeding that of music, video, and other services. As internet information expands rapidly, the status of traditional web portals has declined, while search engines have developed into the "new portal". Users want more accurate and comprehensive information, so they place higher demands on web search engines; retrieving the required information and ranking the results for the user is therefore critical. Motivated by this, this paper studies Web data mining and related algorithms, in particular the PageRank algorithm. We find that PageRank suffers from a serious "topic drift" problem, which makes it difficult to meet users' requirements. Latent Semantic Analysis (LSA) was originally applied to information retrieval to address synonymy and polysemy, problems that also arise in Chinese text, and it may improve search accuracy. We therefore propose a new algorithm that combines Latent Semantic Analysis with PageRank. Because PageRank also tends to favor older web pages, we add a time factor to the improved algorithm. Experiments show that the new algorithm is more effective than the traditional algorithm and suppresses topic drift better. |
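
The abstract does not state the formula that combines LSA, PageRank, and the time factor, so the following is only a minimal sketch of one way such a combination could look, not the paper's algorithm. It assumes that each page already has a topic-relevance score in [0, 1] (for example, a cosine similarity computed in an LSA space) and an age in days, and that a page's share of incoming rank is scaled by relevance times an exponential freshness term. The function name `topic_time_pagerank`, the parameters `similarity`, `age_days`, and `decay`, and the exponential form of the time factor are all hypothetical choices made for illustration.

```python
import math


def topic_time_pagerank(links, similarity, age_days,
                        damping=0.85, decay=0.01, iters=100):
    """Power-iteration PageRank with topic- and time-weighted transitions.

    links:      dict mapping each page to the list of pages it links to
                (every linked page must also be a key of `links`).
    similarity: dict page -> topic relevance in [0, 1] (assumed precomputed,
                e.g. from LSA; this is an illustrative assumption).
    age_days:   dict page -> age of the page in days.
    decay:      hypothetical rate of the exponential freshness factor.
    """
    pages = sorted(links)
    n = len(pages)

    # Assumed weighting: relevant and recent pages attract more rank.
    weight = {p: similarity[p] * math.exp(-decay * age_days[p]) for p in pages}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Teleportation term, identical for every page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p]
            total = sum(weight[q] for q in targets)
            if total == 0.0:
                # Dangling page (or all targets have zero weight):
                # spread its rank uniformly so the ranks still sum to 1.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
            else:
                # Split rank among targets in proportion to their weights.
                for q in targets:
                    new_rank[q] += damping * rank[p] * weight[q] / total
        rank = new_rank
    return rank


# Toy graph: A links to B and C; B is on-topic and recent, C is neither.
example_links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
example_similarity = {"A": 1.0, "B": 0.9, "C": 0.2}
example_age_days = {"A": 10, "B": 5, "C": 400}
example_ranks = topic_time_pagerank(
    example_links, example_similarity, example_age_days
)
```

In plain PageRank, B and C would receive equal shares of A's rank. Under the assumed weighting, the off-topic and much older page C receives a far smaller share than B, which illustrates the kind of effect the abstract attributes to suppressing topic drift and adding a time factor.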