Web Spatial Data Acquisition And Management Method Research Based On Distributed Web Crawler

Posted on:2017-04-04

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Zeng

Full Text:PDF

GTID:2180330485474124

Subject:Surveying and Mapping project

Abstract/Summary:


Geographic information systems (GIS) form a data-driven discipline. Research in spatial analysis, spatial statistics and spatial data mining all depends on spatial data. A large amount of spatial data exists on the internet; it is closely tied to people's daily lives and, compared with spatial data acquired by traditional specialized methods, has richer attributes and is more current. If spatial data from the internet can be acquired, parsed and managed automatically and efficiently, it can supplement the deficiencies of basic geographic information with rich detail and quasi-real-time updates, and it can also provide more abundant, more timely data sources for GIS spatial analysis and spatial data mining.

Web spatial data acquisition first requires downloading the webpages that contain spatial data to a local file system, then extracting spatial and attribute information from the downloaded pages by fine-grained parsing, and finally solving the storage and management of multi-source heterogeneous web spatial data. A single web crawler is limited in coverage and efficiency, which makes it difficult to guarantee the comprehensiveness and timeliness of the data, so this thesis studies a web spatial data acquisition method based on a distributed web crawler. Because web spatial data from different sources differ in structure and content, are updated periodically and are difficult to parse, the thesis investigates a parsing method for web spatial data based on template mapping. Because web spatial data have complicated structures and many sources, they are hard to handle with a relational database management system (RDBMS), so a storage method based on the non-relational database MongoDB is studied.
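As a rough illustration of the parsing and storage steps described above, the following Python sketch applies a per-source template to one downloaded page and produces a document shaped for a document store such as MongoDB. It is not the thesis's implementation: the source name `example-poi-site`, the field names, the element paths, the sample page and the function `parse_page` are all hypothetical, and the page is assumed to be well-formed XHTML so that the standard-library `xml.etree.ElementTree` parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical template for one source site: target field -> ElementTree path.
# Supporting a new source means adding a template, not writing new parsing code.
TEMPLATES = {
    "example-poi-site": {
        "name": ".//h1[@class='poi-name']",
        "address": ".//span[@class='addr']",
        "lon": ".//span[@class='lon']",
        "lat": ".//span[@class='lat']",
    },
}

# Hypothetical downloaded page (assumed to be well-formed XHTML).
SAMPLE_PAGE = """<html><body>
<h1 class="poi-name"> Example Park </h1>
<span class="addr">1 Example Road</span>
<span class="lon">116.39</span>
<span class="lat">39.90</span>
</body></html>"""


def parse_page(source, page):
    """Apply the template registered for `source` to one page."""
    root = ET.fromstring(page)
    fields = {}
    for field, path in TEMPLATES[source].items():
        node = root.find(path)
        text = node.text if node is not None else None
        fields[field] = text.strip() if text else None

    # Build a GeoJSON Point (longitude first, then latitude) when both
    # coordinates were found; otherwise leave the location empty.
    location = None
    if fields["lon"] and fields["lat"]:
        location = {
            "type": "Point",
            "coordinates": [float(fields["lon"]), float(fields["lat"])],
        }

    return {
        "source": source,
        "name": fields["name"],
        "address": fields["address"],
        "location": location,
    }


doc = parse_page("example-poi-site", SAMPLE_PAGE)
print(doc)
```

Because the result is a plain dictionary, documents from different sources can carry different attribute sets without a fixed schema. With PyMongo, such a dictionary could be stored through `collection.insert_one(doc)`, and a `2dsphere` index on `location` would allow spatial queries; whether the thesis stores data this way is not stated in the abstract.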
Based on these methods, a prototype system for web spatial data acquisition was built. It acquires, parses and manages web spatial data efficiently; tests confirmed the validity of the methods, and an application case of the prototype system was carried out.

The main conclusions of this research are as follows:

(1) The acquisition method based on the distributed web crawler improves the efficiency of web spatial data acquisition. The prototype system designed and implemented in this thesis runs stably, scales well, and achieves load balancing among the nodes of the distributed system.

(2) The parsing method based on template mapping enables multi-source heterogeneous web spatial data to be parsed automatically and accurately. In terms of parsing accuracy, the template-mapping method is comparable to the traditional regular-expression parsing method; in terms of recall, it is better.

(3) The storage method based on MongoDB realizes object-oriented storage of multi-source heterogeneous web spatial data, which reduces the complexity of storing and managing web spatial data and makes storage more flexible and more automated.
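Conclusion (2) compares the two parsing methods by accuracy and recall. The abstract does not say how these were measured; the sketch below assumes the usual definitions, with accuracy read as precision (correct records divided by extracted records) and recall as correct records divided by the records actually present. The function name `precision_recall` and the sample records are hypothetical.

```python
def precision_recall(extracted, expected):
    """Return (precision, recall) for a set of extracted records."""
    extracted, expected = set(extracted), set(expected)
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical ground truth: four records really present on the pages.
expected = {("Park A", 116.39, 39.90), ("Cafe B", 116.40, 39.91),
            ("Shop C", 116.41, 39.92), ("Bank D", 116.42, 39.93)}

# Hypothetical parser output: two correct records and one wrong one.
extracted = {("Park A", 116.39, 39.90), ("Cafe B", 116.40, 39.91),
             ("Shop X", 0.0, 0.0)}

precision, recall = precision_recall(extracted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these definitions, a parser can match another on precision while still beating it on recall if it finds more of the records that are really on the page, which is the pattern the conclusion reports.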

Keywords/Search Tags:

Web spatial data, Distributed web crawler, Template mapping, MongoDB, Prototype system


Related items

1	Internet Spatial Vector Data Automate Acquisition And Management Method Research
2	Research On The Key Technologies Of Guiding Thematic Mapping
3	Under The Distributed Parallel Computing Gml Spatial Data Replication Synchronization Update Mechanism Research
4	Storage And Parallel Query Technology Research In Distributed Environments Massive Spatial Data
5	Research On Fast Time Series Reconstruction Of Massive Astronomical Catalog Data Based On MongoDB
6	Research On Web-based Spatial Data Grab And Evaluation
7	Research On Key Technologies Of Spatial Vector Big Data Storage Model And High Performance Analysis In Distributed Environment
8	Organization And Management Of Airborne LiDAR Point-cloud Data Based On MongoDB
9	A Spatial Analysis Of Public Bicycle System Based On GIS And WEB Crawler Technology
10	Research On The Key Technology Of Service-oriented Spatial Data Management