| With the advent of the Web 2.0 era, social media has developed rapidly and become an important platform for sharing and spreading information. Social media sites such as Sina Micro-blog and Tencent Micro-blog use Ajax technology, which enables asynchronous transfer between server and browser and improves the user experience. At the same time, it makes web information harder to extract. A traditional web crawler obtains URL links by analysing the <a> tags in static HTML pages, so it cannot extract dynamic information that is generated by executing JavaScript. This paper first introduces the principle and implementation process of the traditional web crawler and examines its key technologies, which include the web search strategy, the URL duplication removal algorithm, web page analysis technology and the update strategy. Then, based on the characteristics of Sina Micro-blog, it points out the deficiencies of the traditional web crawler and the difficulties of crawling micro-blogs. The paper also explains Ajax-based crawling technology. Finally, it discusses the implementation of a micro-blog-based web crawler system that takes Sina Micro-blog as its target site. By simulating user actions, interpreting JavaScript and building the DOM tree, the system successfully extracts web information generated with Ajax technology. |
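
The limitation of the traditional approach described above can be illustrated with a minimal sketch. It is not the system presented in the paper: the names `LinkExtractor`, `extract_new_links` and `PAGE`, and the `example.com` URLs, are hypothetical, and a plain in-memory set stands in for whatever URL duplication removal algorithm a real crawler would use. The sketch collects `href` values from `<a>` tags in static HTML and skips URLs it has already seen. The link that the embedded script would create at run time is never found, because the script is not executed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of static <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_new_links(html, base_url, seen):
    """Return links not yet in `seen`, recording them as visited.

    `seen` is a plain set used here as a stand-in for URL duplication removal.
    """
    parser = LinkExtractor(base_url)
    parser.feed(html)
    new_links = []
    for url in parser.links:
        if url not in seen:
            seen.add(url)
            new_links.append(url)
    return new_links


# Hypothetical page: two static links to the same URL, plus one link that
# only exists after the browser runs the script.
PAGE = """
<a href="/home">Home</a>
<a href="/home">Home again</a>
<script>
  var a = document.createElement("a");
  a.href = "/timeline?page=2";
  document.body.appendChild(a);
</script>
"""

seen = set()
links = extract_new_links(PAGE, "http://example.com/", seen)
print(links)
```

Only the static link is returned, and the duplicate is dropped. Reaching the script-generated link requires interpreting the JavaScript and inspecting the resulting DOM tree, which is the approach the abstract attributes to Ajax-based crawling.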