Research And Implementation Of Web Crawlers For Microblogging

Posted on: 2013-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: J J Liu
Full Text: PDF
GTID: 2208330434972719
Subject: Computer technology
Abstract/Summary:
With the advent of the Web 2.0 era, social media has developed rapidly and become an important platform for sharing and spreading information. Social media sites such as Sina micro-blog and Tencent micro-blog use Ajax technology, which enables asynchronous data transfer between server and browser and improves the user experience. At the same time, it makes extracting web information more difficult. A traditional web crawler obtains URL links by parsing the <a> tags in static HTML pages, so it cannot extract dynamic content that is generated by executing JavaScript. This thesis first introduces the principles and implementation process of the traditional web crawler and examines its key technologies, including web search strategies, URL de-duplication algorithms, web page parsing, and update strategies. Then, based on the characteristics of Sina micro-blog, it points out the deficiencies of the traditional web crawler and the difficulties of crawling micro-blogs. It also explains Ajax-based crawling technology. Finally, it discusses the implementation of a micro-blog web crawler system that takes Sina micro-blog as its target site. By simulating user actions, interpreting JavaScript, and building the DOM tree, the system successfully extracts web information that is delivered through Ajax.
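
The traditional crawler described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the breadth-first search order, the in-memory `seen` set used for URL de-duplication, the `PAGES` dictionary standing in for a real site, and the `loadMore` JavaScript call are all assumptions made for illustration. The sketch shows why a crawler that only reads static <a> tags never discovers a page that is referenced solely from JavaScript.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from static <a> tags only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs found in <a href=...> tags."""
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative links and drop #fragments so duplicates compare equal.
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl. `fetch` maps a URL to HTML text, or None."""
    seen = {start_url}          # URL de-duplication (assumed: a plain set)
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory site; /hidden is referenced only from JavaScript.
PAGES = {
    "http://example.test/": (
        '<a href="/a">A</a><a href="/b#top">B</a>'
        '<script>loadMore("/hidden")</script>'
    ),
    "http://example.test/a": '<a href="/">home</a><a href="/b">B</a>',
    "http://example.test/b": '<a href="a">A again</a>',
    "http://example.test/hidden": "<p>Loaded via Ajax</p>",
}

visited = crawl("http://example.test/", PAGES.get)
print(visited)
```

In this sketch the crawl visits the root page, /a, and /b, but never /hidden, because that URL appears only inside a script. Reaching it would require executing the JavaScript and inspecting the resulting DOM, which is the approach the abstract attributes to the Ajax-based crawler.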
Keywords/Search Tags: Web Crawler, Micro-blog, Ajax