Study On Fast Statistical Technology Of Forestland Border Data In Parallel And Distributed Environment

Posted on: 2016-04-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J R Yin
GTID: 1223330470461304
Subject: Forest management
Abstract/Summary:
Forestland management is the foundation of forest resource management. To improve the overall monitoring and management of forest resources, China built the national forestland "a map" under the guidance of the "National Forestland Protection and Utilization Planning Outline (2010-2020)". It integrates recent high-resolution remote sensing data, forestland border data, basic geographic data and forestry-related data into massive multi-scale, multi-service data at the county, provincial and national levels, serving management and application needs from the microscopic to the macroscopic. The forestland border data alone amount to more than 6783 million records. This forestland spatial data is large in volume, varied in type, fast-changing and low in value density, and the data models and statistical processes used in the current system are increasingly unable to support rapid multi-dimensional statistics over it.
Therefore, this dissertation studied statistical data models and rapid statistical techniques for forestland border data in a distributed parallel environment. Targeting the problem of dynamic statistics over massive data faced by the forestland "a map" system, and drawing on multi-dimensional data models, parallel computing, and data mining theory and technology, it studied four technologies: the multi-dimensional statistics model, optimized parallel data deployment, parallel statistical computing, and statistical result collection and cache management. On this basis it built an efficient and fast statistical technology system for forest resources and designed experiments to validate the related technical points. The results showed that the proposed data models and technologies could meet the need for rapid multi-dimensional statistics on forestland border data. The main work is as follows:
(1) Analyzed the characteristics of forestland border data and the requirements of statistical applications, proposed a fast statistical technology system for forestland border data based on a distributed parallel environment, and provided solutions for the four technologies the system involves: the multi-dimensional statistics model, optimized parallel data deployment, parallel statistical computing, and statistical result collection and cache management.
(2) Studied the multi-dimensional statistical model of forestland border data. The work analyzed the statistical measures and characteristics of forestland border data, built a forestland border data cube based on the star schema, and proposed a factor combination model. Combining this with the statistical measures, it established a factor combination statistical model that supports forestland border data statistics at different scales.
(3) Studied deployment optimization of forestland border data. The work examined the partitioning and distribution of statistical granularity and the index system for forestland border data, addressing the management of statistical granularity in a distributed parallel environment. After analyzing the nature of the statistical workload, it adopted a partitioning approach in which the dimension tables are copied to every node and the forestland border data fact table is divided among the nodes. Taking into account the characteristics of the distributed parallel environment and the application scenario, it provided a dynamic grid spatial data partitioning algorithm based on the Hilbert space-filling curve and determined the granularity of the forestland border data. On this basis, it proposed a spatial data deployment scheme based on task load and graph vertex coloring theory and, in view of the characteristics of the statistics, a multilayer index system based on the GTMPR-tree (Graph Coloring and found-based Multi-tiers Parallel R-tree). Tests showed that using the county as the unit of spatial data granularity best suited the demands of parallel rapid statistics. Measured by the coefficient of variation (C.V), the quota-based graph coloring deployment plan distributed data more evenly across the nodes, and the degree of balance increased more than twofold.
(4) Studied parallel statistical computing of forestland border data. By specifying the statistical task granularity and a parallel statistical computing model, and by proposing a task model based on the GTMPR-tree, the work addressed the problem of scheduling resources for statistical tasks at the chosen task granularity.
(5) Studied statistical result collection and cache management. To address the effect of result caching on statistical results and efficiency, the work proposed a two-level cache structure and a hybrid cache management strategy based on a static cache table and a dynamic semantic cache. It further proposed a cache optimization model based on correlation analysis and a statistical update model based on an evaluation mechanism to optimize the statistical cache. Tests identified potentially valuable factor combinations.
Finally, overall statistical efficiency was tested in the distributed parallel environment. The results showed that the key technologies proposed in this dissertation significantly improved statistical performance.
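The partitioning and balance measurement described in item (3) can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: it orders grid cells along a Hilbert curve, cuts that order into contiguous runs of roughly equal load (a simple stand-in for the graph coloring deployment scheme, which the abstract does not specify), and scores the balance with the coefficient of variation. The function names, the power-of-two grid, and the per-cell load values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along the Hilbert curve on an
    n x n grid, where n is a power of two (assumed)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve keeps its orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_by_hilbert(cell_loads, n, num_nodes):
    """Assign each cell to a node by cutting the Hilbert ordering into
    contiguous runs of roughly equal total load.

    cell_loads: hypothetical dict {(x, y): load} with positive loads.
    Returns a dict {(x, y): node_id}.
    """
    ordered = sorted(cell_loads, key=lambda c: hilbert_index(n, c[0], c[1]))
    target = sum(cell_loads.values()) / num_nodes
    assignment = {}
    cumulative = 0.0
    for cell in ordered:
        load = cell_loads[cell]
        # Use the midpoint of the cell's load interval to pick its node.
        node = min(num_nodes - 1, int((cumulative + load / 2) / target))
        assignment[cell] = node
        cumulative += load
    return assignment


def node_loads(cell_loads, assignment, num_nodes):
    """Total load placed on each node."""
    totals = [0.0] * num_nodes
    for cell, node in assignment.items():
        totals[node] += cell_loads[cell]
    return totals


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean; lower values
    indicate a more balanced distribution across nodes."""
    return pstdev(values) / mean(values)


if __name__ == "__main__":
    n, nodes = 4, 4
    loads = {(x, y): 1.0 + x for x in range(n) for y in range(n)}
    placement = partition_by_hilbert(loads, n, nodes)
    per_node = node_loads(loads, placement, nodes)
    print(per_node, coefficient_of_variation(per_node))
```

Because consecutive cells on the Hilbert curve are spatial neighbours, each node receives a spatially compact group of cells, which is the usual motivation for using the curve in spatial partitioning.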
Keywords/Search Tags: Multi-dimension Statistical Model, Data Optimization Deployment, Load Balancing, Graph Coloring Theory, Parallel Statistical Computing, Statistical Result Collect and Cache Management