Research on Methods for the Automated Acquisition and Management of Internet Spatial Vector Data

Posted on: 2016-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Cai
GTID: 2180330470971754
Subject: Cartography and Geographic Information Engineering
Abstract/Summary:
Research on methods for acquiring and managing multi-source vector spatial data from the Internet makes it possible to acquire, parse, and manage this latent data efficiently. It also provides richer and more up-to-date data sources for GIS spatial analysis and spatial data mining, supporting geographic information research in the era of big data. This thesis studies the acquisition of vector spatial data from the Internet. General focused crawlers rely on a single crawling strategy and crawl inefficiently, so the thesis presents a method based on multi-threaded parallelism and an asynchronous I/O model that improves the efficiency of Web vector spatial data acquisition. For multi-source spatial data with heterogeneous structures, it proposes a data parsing method based on template mapping, which substantially improves accuracy and performance compared with conventional Web data parsing based on regular expressions. Because Web vector spatial data has a complex and variable structure, the thesis also investigates object storage of vector spatial data in MongoDB, which effectively reduces the complexity of spatial data management.

The main contents of this thesis are as follows:

1) An efficient multi-strategy parallel method for Web vector spatial data acquisition. Building on focused crawler technology and a study of several open-source crawler frameworks, multi-threading and asynchronous I/O strategies are used to improve the efficiency of vector spatial data acquisition.
2) An automatic parsing technique for multi-source vector spatial data based on template mapping. Structured and semi-structured text is converted into tree-structured objects, and heterogeneous vector spatial data from the Internet is then parsed against a given template. Compared with traditional regular-expression parsing, this template-mapping approach makes parsing more stable while keeping accuracy high.
3) An object-oriented storage method for multi-source vector spatial data based on the MongoDB database, used to manage the vector spatial data collected by the Web crawler. A set of REST data management APIs is also provided for managing vector spatial data in a cloud environment.
4) Based on the methods above, the NetCrawler crawler system was built to rapidly acquire, parse, and manage multi-source heterogeneous vector spatial data from the Internet. Test results confirm the effectiveness of these methods.
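The combination of multi-threading and asynchronous I/O described in item 1 can be illustrated with a minimal Python sketch. It assumes asyncio for non-blocking requests (with a semaphore capping in-flight requests) and a thread pool for parsing responses. `crawl`, `fake_fetch`, and the example.com URLs are hypothetical, and this is not the NetCrawler implementation, whose internals the abstract does not describe.

```python
import asyncio
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Awaitable, Callable, Dict, List

# Hypothetical signatures: a fetcher does non-blocking I/O for one URL,
# a parser turns raw response text into a record (a dict here).
Fetcher = Callable[[str], Awaitable[str]]
Parser = Callable[[str], dict]


async def crawl(urls: List[str], fetch: Fetcher, parse: Parser,
                max_requests: int = 8, parser_threads: int = 4) -> Dict[str, dict]:
    """Fetch URLs concurrently via asyncio and parse the responses in a thread pool."""
    loop = asyncio.get_running_loop()
    limit = asyncio.Semaphore(max_requests)  # caps the number of in-flight requests
    results: Dict[str, dict] = {}

    with ThreadPoolExecutor(max_workers=parser_threads) as pool:
        async def worker(url: str) -> None:
            async with limit:
                raw = await fetch(url)  # the event loop serves other requests meanwhile
            # Parsing runs on a worker thread so it does not stall the event loop.
            results[url] = await loop.run_in_executor(pool, parse, raw)

        await asyncio.gather(*(worker(u) for u in urls))
    return results


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP client call; sleeps to simulate network latency."""
    await asyncio.sleep(0.01)
    return json.dumps({"id": url.rsplit("/", 1)[-1]})


if __name__ == "__main__":
    urls = [f"https://example.com/features/{i}" for i in range(20)]
    records = asyncio.run(crawl(urls, fake_fetch, json.loads))
    print(len(records))  # 20
```

In CPython the global interpreter lock limits how much pure-Python parsing gains from threads; the thread pool mainly keeps the event loop responsive, and a `ProcessPoolExecutor` could be swapped in when parsing is CPU-heavy.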
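The template-mapping idea in item 2 (convert text into a tree, then read fields from it using a per-source template) might look like the following sketch. The dotted-path template syntax, the function names, and the sample records are assumptions for illustration only; the abstract does not specify NetCrawler's actual template format. XML attributes and namespaces are ignored for brevity.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def to_tree(text: str) -> Any:
    """Convert JSON or simple XML text into nested dicts, lists and strings."""
    text = text.strip()
    if text.startswith("<"):
        return _xml_to_tree(ET.fromstring(text))
    return json.loads(text)


def _xml_to_tree(elem: ET.Element) -> Any:
    """Leaf elements become their text; repeated child tags become lists."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    node: Dict[str, Any] = {}
    for child in children:
        value = _xml_to_tree(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            node[child.tag] = [node[child.tag], value]
    return node


def resolve(tree: Any, path: str) -> Any:
    """Follow a dotted path such as 'geometry.coordinates.0' through the tree."""
    node = tree
    for key in path.split("."):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def apply_template(tree: Any, template: Dict[str, str]) -> Dict[str, Any]:
    """Map each output field name to the value found at its template path."""
    return {field: resolve(tree, path) for field, path in template.items()}


# Two hypothetical sources with different layouts, mapped to one common schema.
JSON_TEMPLATE = {"name": "name", "lon": "geometry.coordinates.0", "lat": "geometry.coordinates.1"}
XML_TEMPLATE = {"name": "title", "lon": "loc.lng", "lat": "loc.lat"}

if __name__ == "__main__":
    a = to_tree('{"name": "Gate A", "geometry": {"type": "Point", "coordinates": [116.39, 39.91]}}')
    b = to_tree("<poi><title>Gate B</title><loc><lng>116.40</lng><lat>39.90</lat></loc></poi>")
    print(apply_template(a, JSON_TEMPLATE))  # {'name': 'Gate A', 'lon': 116.39, 'lat': 39.91}
    print(apply_template(b, XML_TEMPLATE))   # {'name': 'Gate B', 'lon': '116.40', 'lat': '39.90'}
```

One plausible reason such an approach is more stable than regular expressions is that a template names paths in the parsed tree rather than patterns in the raw text, so changes in whitespace, attribute order, or unrelated markup do not break extraction. Values read from XML remain strings here; type conversion is left to a later step.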
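The MongoDB object storage in item 3 can be sketched by storing each parsed feature as one GeoJSON-shaped document and indexing its geometry with a `2dsphere` index. The document fields (`properties`, `source`, `crawled_at`) and the helper names are assumptions for illustration; the abstract does not give NetCrawler's schema, and the REST API layer is not shown. `store` only calls `create_index` and `insert_many`, which a pymongo `Collection` provides, so the sketch itself does not import pymongo.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def to_feature_document(record: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Wrap one parsed point record as a GeoJSON-Feature-shaped MongoDB document."""
    props = {k: v for k, v in record.items() if k not in ("lon", "lat")}
    return {
        "type": "Feature",
        # GeoJSON uses [longitude, latitude] order, which 2dsphere indexes expect.
        "geometry": {"type": "Point",
                     "coordinates": [float(record["lon"]), float(record["lat"])]},
        "properties": props,  # schema-free: each source may contribute different fields
        "source": source,
        "crawled_at": datetime.now(timezone.utc),
    }


def near_filter(lon: float, lat: float, max_meters: float) -> Dict[str, Any]:
    """Query filter: features within max_meters of a point (needs a 2dsphere index)."""
    return {"geometry": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [lon, lat]},
        "$maxDistance": max_meters,
    }}}


def store(collection: Any, docs: Iterable[Dict[str, Any]]) -> int:
    """Index and insert documents into a pymongo-style Collection supplied by the caller."""
    collection.create_index([("geometry", "2dsphere")])  # no-op if it already exists
    batch = list(docs)
    if batch:  # insert_many rejects an empty list
        collection.insert_many(batch)
    return len(batch)
```

With pymongo installed and a local server running, `store(MongoClient()["netcrawler"]["features"], docs)` followed by `collection.find(near_filter(116.39, 39.91, 500))` would return stored features within 500 m of that point; the database and collection names are placeholders.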
Keywords/Search Tags: spatial vector data, web crawler, template mapping, NoSQL, parallel