
Web Spatial Data Acquisition And Management Method Research Based On Distributed Web Crawler

Posted on: 2017-04-04; Degree: Master; Type: Thesis
Country: China; Candidate: L Y Zeng; Full Text: PDF
GTID: 2180330485474124; Subject: Surveying and Mapping project
Abstract/Summary:
Geographic information systems (GIS) form a data-driven discipline. Research in spatial analysis, spatial statistics and spatial data mining depends on the support of spatial data. A large amount of spatial data exists on the internet; it is closely related to people's daily lives and, compared with spatial data acquired by traditional specialized methods, has richer attributes and is more current. If spatial data from the internet can be acquired, parsed and managed automatically and efficiently, it can not only compensate for the deficiencies of basic geographic information by supplying rich detail and quasi-real-time updates, but also provide richer and more timely data sources for GIS spatial analysis and spatial data mining.

Web spatial data acquisition first downloads the webpages that contain spatial data to a local file system, then extracts spatial information and attribute information from the downloaded webpages by fine-grained parsing, and finally has to solve the storage and management of multi-source heterogeneous web spatial data. A single web crawler is limited in coverage and efficiency, which makes it difficult to guarantee the comprehensiveness and timeliness of the data, so this paper studies a web spatial data acquisition method based on a distributed web crawler. Because web spatial data from different sources differ in structure and content, are updated periodically and are difficult to parse, the paper studies a parsing method for web spatial data based on template mapping. Because web spatial data have complicated structures and many sources, a relational database management system (RDBMS) handles this kind of data poorly, so a storage method for web spatial data based on the non-relational database MongoDB is studied.
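The abstract does not describe how the templates are defined, so the following is only a minimal sketch of one way template mapping could work, assuming each source delivers records as nested dictionaries. The source names, field names and the helpers `get_path` and `parse_with_template` are all hypothetical, not taken from the thesis.

```python
from typing import Any, Dict

# Hypothetical templates: one per source. Each maps a unified field name
# to the dotted key path where that value sits in the source's records.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "source_a": {"name": "title", "lon": "location.lng", "lat": "location.lat"},
    "source_b": {"name": "poi_name", "lon": "x", "lat": "y"},
}


def get_path(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted key path such as 'location.lng' into a nested dict."""
    value: Any = record
    for key in path.split("."):
        value = value[key]
    return value


def parse_with_template(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a source-specific record onto the unified field names."""
    template = TEMPLATES[source]
    return {field: get_path(record, path) for field, path in template.items()}


# Two differently structured records end up with the same field names.
record_a = {"title": "Library", "location": {"lng": 114.3, "lat": 30.5}}
record_b = {"poi_name": "Cafe", "x": 114.4, "y": 30.6}
parsed_a = parse_with_template("source_a", record_a)
parsed_b = parse_with_template("source_b", record_b)
```

Under this assumption, supporting a new source or a changed page layout means editing a template entry rather than the parsing code.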
Based on the above methods, a prototype system for web spatial data acquisition was built. It acquires, parses and manages web spatial data efficiently; tests confirmed the validity of the methods, and an application case of the prototype system is presented.

The main conclusions of this paper are summarized as follows:
(1) The web spatial data acquisition method based on the distributed web crawler can improve the efficiency of web spatial data acquisition. The prototype system designed and implemented in this paper runs stably, scales well and achieves load balancing among the nodes of the distributed system.
(2) The parsing method based on template mapping enables multi-source heterogeneous web spatial data to be parsed automatically and accurately. In terms of parsing accuracy, it is similar to the traditional regular-expression parsing method; in terms of recall, it is better.
(3) The MongoDB-based storage method realizes object-oriented storage of multi-source heterogeneous web spatial data, which reduces the complexity of storing and managing web spatial data and makes storage more flexible and automatic.
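To illustrate why a document store suits heterogeneous records, here is a minimal sketch that wraps parsed records as GeoJSON-style documents. The document layout, the field names `lon` and `lat`, and the helper `to_document` are assumptions for illustration; the abstract does not specify the schema used in the thesis.

```python
from typing import Any, Dict, List


def to_document(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a parsed record as a GeoJSON-style document (hypothetical layout)."""
    # Everything except the coordinates is kept as free-form attributes.
    properties = {k: v for k, v in record.items() if k not in ("lon", "lat")}
    return {
        "source": source,
        # GeoJSON Point: coordinates are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [record["lon"], record["lat"]]},
        "properties": properties,
    }


# Records from different sources carry different attribute sets, yet both
# fit the same collection because documents need not share a fixed schema.
docs: List[Dict[str, Any]] = [
    to_document("source_a", {"name": "Library", "lon": 114.3, "lat": 30.5,
                             "opening_hours": "9-17"}),
    to_document("source_b", {"name": "Cafe", "lon": 114.4, "lat": 30.6,
                             "rating": 4.5}),
]
```

With a driver such as PyMongo, documents of this shape could be written with `collection.insert_many(docs)`, and a `2dsphere` index on `geometry` would allow spatial queries; neither step is shown here because it needs a running MongoDB server.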
Keywords/Search Tags:Web spatial data, Distributed web crawler, Template mapping, MongoDB, Prototype system