Design And Implementation Of Focused Crawler For Blogs

Posted on: 2016-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Liu
GTID: 2308330464973794
Subject: Education Technology
Abstract/Summary:
With the development of the Internet, the number of web users has increased sharply, and the spirit of sharing that the Internet embodies now influences millions of people. As a form of social media, the blog has become an important platform for sharing and exchange. People are accustomed to obtaining useful information through blogs; they pay increasing attention to speed and efficiency, and they want to find valuable information among massive numbers of blogs quickly. Effective access to blog resources is the foundation for using them effectively, but a traditional general-purpose crawler cannot satisfy applications that target a specific subject. The blog-oriented crawler described here takes educational technology as its subject. It focuses on recognizing and obtaining blog resources related to educational technology from a large number of blogs, and it updates those resources in a timely manner. It downloads blogs and extracts the post title, text content, images, and other information. This research focuses on the following aspects:

(1) Research on the core techniques of a focused crawler for blogs. By analyzing the basic characteristics of blogs and pointing out the differences between general web pages and blog pages, we define the dimensions of blog information to be extracted. Because blogs are updated frequently, and in order to improve the crawler's timeliness, we select a link-type-based acquisition strategy. We combine website structure analysis, link type analysis, and content evaluation to judge topic similarity. We define topic categories for blogs related to educational technology and analyze the SVM classification algorithm.

(2) Design of the focused crawler for blogs. We discuss the operating principle and basic structure of a spider, especially the differences between a general spider and a focused crawler. We select a suitable framework and extend its functions to meet the needs of a focused crawler. The database tables are designed according to the needs of data interaction.

(3) Implementation of the focused crawler for blogs. The system implements key functions such as a timer, incremental crawling, topic similarity judgment, and automatic classification of blogs. Finally, we evaluate the effectiveness of collection and classification.

(4) Application of the focused crawler in blog analysis. Expert blogs are an important academic resource on the Internet: they facilitate the rapid dissemination and exchange of scientific information within a field and reveal the blogger's potential topic preferences. Taking a single expert blog as an example, we use the collected information to analyze topic preferences and how they change over time.

The crawler system fully integrates blog features, uses multiple methods to analyze topic similarity, and achieves good results. It can provide high-quality blog data for applied research on educational technology. Analyzing topic preferences and their trends can give blog readers a reference for finding valuable information.
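The abstract says topic similarity is judged by combining link type analysis with content evaluation, but it does not give the formula. The following is only a minimal sketch of one way such a combination could look, not the thesis's actual method: it scores content by cosine similarity against a topic vocabulary and scales the result by a link-type weight. The vocabulary `TOPIC_TERMS`, the weights in `LINK_TYPE_WEIGHT`, the link-type labels, and the threshold are all hypothetical values chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical topic vocabulary with term weights; the thesis does not list its terms.
TOPIC_TERMS = Counter(
    {"educational": 2, "technology": 2, "learning": 1, "teaching": 1, "course": 1}
)

# Hypothetical weights per link type (a post page is trusted more than a list page).
LINK_TYPE_WEIGHT = {"post": 1.0, "list": 0.5, "other": 0.1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_score(text, link_type):
    """Content similarity to the topic, scaled by the link-type weight."""
    content = cosine_similarity(Counter(tokenize(text)), TOPIC_TERMS)
    return LINK_TYPE_WEIGHT.get(link_type, LINK_TYPE_WEIGHT["other"]) * content


def is_relevant(text, link_type, threshold=0.3):
    """Keep a page only when its combined score reaches the (assumed) threshold."""
    return topic_score(text, link_type) >= threshold
```

A crawler built this way would call `is_relevant(page_text, link_type)` before storing a page. The SVM classifier mentioned in the abstract would then assign accepted posts to topic categories; that step is not shown here.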
Keywords/Search Tags: Blog, Focused Crawler, Topic Correlation, Automatic Text Classification, Topic Preference