
Investigating The Mechanisms Of VGI Data Quality Assurance Based On History Data

Posted on: 2017-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A R Yang
GTID: 1360330569498416
Subject: Information and Communication Engineering
Abstract/Summary:
The Web 2.0 era has broken down the barriers between consumers and producers, providing an opportunity to overcome the limitations of traditional data production, which is closed, expensive, and slow to update. In the era of big data, existing data production methods can no longer meet ever-changing data requirements. Crowdsourcing projects such as Wikipedia have succeeded in a number of areas, building healthy, open knowledge-producing communities. In the field of geographic information, volunteered geographical information (VGI) likewise adopts an open editing model that allows ordinary users to upload and edit data freely under certain rules; that is, the geographic information is produced and maintained by "volunteers". The principle of VGI is similar to that of Wikipedia: by drawing on the expertise that users have in their own sub-fields (and areas), together with systematic mechanisms for monitoring and collaborative error correction, VGI projects continuously expand their datasets and improve their quality. VGI focuses on using new technical means to break down existing data barriers, meet new data needs, and promote the development of large-scale geospatial data.

For VGI to replace or supplement traditional data production, its quality must be examined carefully. Disputes over the quality of VGI have never faded, owing to the lack of industry standards and of verification of participants. Although many studies have shown that VGI can be of high quality, the topic has not been properly addressed at the theoretical level, because of the spatial and temporal limitations of the reference data and the regional heterogeneity of VGI data quality. To overcome this problem, it is critical to understand and confirm how the quality of VGI can be guaranteed during data production. In this dissertation, we take the most typical VGI project, OpenStreetMap, as our object of study. Combining the two major dimensions of VGI research, namely data quality and contributor behavior, we analyze the history of VGI contributions to explain how data quality can be guaranteed as the project develops. The dissertation consists of the following parts.

(1) Establishing a spatio-temporal model of data evolution and contributing behaviors in VGI, and designing and implementing a general toolkit for processing OpenStreetMap history data that offers clear advantages over existing tools.

Traditional geographic information is static: data are generally refreshed only when real-world objects change, so updates are infrequent and the process is opaque. VGI is fundamentally different: the data change rapidly and the changes are always visible. These changes may reflect the evolution of real-world objects, or simply the continual accumulation and enrichment of data through community effort. At the same time, the community structure and contributor behaviors also change over time and space, interacting with the evolution of the data. This dynamic spatio-temporal process reflects the nature of VGI and is the key to explaining how quality is ensured in this new form of geographic data production. Accordingly, a growing body of work uses historical contribution data to analyze patterns of collaboration and data evolution. However, these efforts are limited by the large data volume, the unfriendly data format, and the inherent complexity of spatio-temporal data. This dissertation defines a spatio-temporal model for VGI based on temporal geography, which can be used to model, analyze, and explore related topics. We then implement a toolkit that lets related research efficiently model the data and produce the desired results, avoiding repetitive effort.

(2) Quantitatively analyzing the characteristics, spatio-temporal trends, and underlying mechanisms of contribution inequality, expanding and deepening research on the heterogeneity of VGI communities.

The ever-growing scale of VGI keeps challenging our understanding, and contribution inequality is one of the most important aspects. Contribution inequality means that most of the data come from a minority of contributors, while the majority of the community together accounts for only a small share of the data. This phenomenon is critical to understanding where the data come from and how the projects evolve. Previous research recognizes the phenomenon but has not examined it in relation to project development, nor has a more comprehensive statistical analysis been carried out. This dissertation answers the following questions: How does contribution inequality change over time? Which kind of contributor plays the critical role in this trend, the "silent majority" or the "vocal minority"? We use the Gini coefficient and the Lorenz curve to measure inequality, design a quantile-based classification strategy to examine community structure, and use the Mann-Whitney-Wilcoxon test to investigate changes in productivity.

(3) Investigating, based on contributing behaviors, whether VGI data come from professionals or amateurs, and fundamentally re-evaluating the basis of related research.

Most previous research regarded VGI as the creation of amateurs. Some studies have recognized the heterogeneity of contributors but do not discuss whether the data come from amateurs or professionals, possibly because of the impression that most community members are amateurs. This question, however, directly influences how we explain good data quality, infer the quality of new data, and predict the future of the projects. This dissertation grounds the discussion in the fact of contribution inequality and focuses on the major contributors. We design a logical inference framework based on Bayes' theorem and define various indicators and behaviors related to practice, skill, and motivation. The findings reveal a fact hidden in the noise of heavy-tailed distributions: the contributors who account for most VGI data are professionals.

(4) Analyzing the preferences of major contributors, their trends, and their influence, in parallel with trends in data evolution; extending the research to the temporal dimension; and examining the structures and impacts underlying the phenomenon.

The preferences of major contributors determine the direction of VGI projects. The more contributors prefer positional precision, geometric precision, detail, or attribute precision, the higher the data quality in the corresponding aspect. Trends in these preferences are therefore critical to understanding how data quality evolves. Existing research has not gone further after establishing that preferences exist, and has not uncovered their underlying details and influences. This dissertation uses entropy and a range of statistical methods to reveal the preferences of major contributors and how they change over time. Moreover, we use association analysis from the data-mining literature to determine whether shifts in preferences come mainly from long-standing contributors or from newcomers.
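Item (1) rests on treating every edit as a timestamped version of a map object, so that the state of the map at any past moment can be reconstructed. The following is a minimal Python sketch under stated assumptions, not the dissertation's toolkit: the `OSMVersion` record and the `snapshot` function are hypothetical, with fields loosely modeled on the id, version, timestamp, user, and visible attributes carried by OpenStreetMap full-history files.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class OSMVersion:
    """One version of an OSM element (hypothetical, simplified record)."""
    element_id: int
    version: int
    timestamp: datetime
    user: str
    visible: bool  # False marks a deletion in the history
    tags: Dict[str, str] = field(default_factory=dict)


def snapshot(history: Iterable[OSMVersion], at: datetime) -> Dict[int, OSMVersion]:
    """Return the latest visible version of every element as of time `at`."""
    latest: Dict[int, OSMVersion] = {}
    for v in history:
        if v.timestamp > at:
            continue  # edits made after `at` are not yet part of the map
        current = latest.get(v.element_id)
        if current is None or v.version > current.version:
            latest[v.element_id] = v
    # Elements whose latest version is a deletion do not exist at `at`.
    return {eid: v for eid, v in latest.items() if v.visible}


# Usage: two elements, one later renamed, the other later deleted.
history = [
    OSMVersion(1, 1, datetime(2010, 1, 1), "alice", True, {"highway": "residential"}),
    OSMVersion(1, 2, datetime(2012, 6, 1), "bob", True, {"highway": "residential", "name": "Main St"}),
    OSMVersion(2, 1, datetime(2011, 3, 1), "alice", True, {"amenity": "cafe"}),
    OSMVersion(2, 2, datetime(2013, 1, 1), "carol", False),
]
```

Querying the same history at different times yields different maps, which is the basic operation behind studying how the data and the community co-evolve.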
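For the inequality measures named in item (2), here is a minimal sketch assuming the input is simply a list of per-contributor edit counts (how edits are counted is an assumption). It uses the standard sorted-rank form of the Gini coefficient, G = 2·Σ i·x_i / (n·Σ x_i) - (n+1)/n, with the x_i sorted in ascending order and i = 1..n, and builds the Lorenz curve as cumulative shares of contributors against cumulative shares of edits.

```python
from typing import List, Sequence, Tuple


def lorenz_curve(contributions: Sequence[float]) -> List[Tuple[float, float]]:
    """Points (share of contributors, share of edits), least active first."""
    xs = sorted(contributions)
    total = sum(xs)
    if total == 0:
        raise ValueError("need at least one non-zero contribution")
    n = len(xs)
    points = [(0.0, 0.0)]
    cumulative = 0.0
    for i, x in enumerate(xs, start=1):
        cumulative += x
        points.append((i / n, cumulative / total))
    return points


def gini(contributions: Sequence[float]) -> float:
    """Gini coefficient: 0 means perfectly equal, (n-1)/n means one contributor made every edit."""
    xs = sorted(contributions)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero contribution")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

With made-up counts `[1, 1, 2, 5, 91]`, one contributor holds 91% of the edits and `gini` returns about 0.736; computing it per time window gives the kind of inequality trend that item (2) asks about.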
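The quantile-based classification strategy in item (2) is not specified in the abstract. One plausible reading is sketched below, purely as an assumption: contributors are ranked by edit count and split at rank quantiles, here the hypothetical cut points 1% and 10%, so that each group's share of all edits can be compared ("vocal minority" versus "silent majority").

```python
from typing import Dict, Sequence


def classify_by_rank_quantile(
    edits_per_user: Dict[str, int],
    cut_points: Sequence[float] = (0.01, 0.10),  # hypothetical thresholds
    labels: Sequence[str] = ("top 1%", "next 9%", "remaining 90%"),
) -> Dict[str, str]:
    """Assign each user a group by the fraction of users ranked above them."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    # Most active first; ties broken by user name so the result is deterministic.
    ranked = sorted(edits_per_user, key=lambda u: (-edits_per_user[u], u))
    n = len(ranked)
    groups: Dict[str, str] = {}
    for rank, user in enumerate(ranked):
        fraction_above = rank / n
        group_index = sum(1 for c in cut_points if fraction_above >= c)
        groups[user] = labels[group_index]
    return groups


def edit_share_by_group(edits_per_user: Dict[str, int], groups: Dict[str, str]) -> Dict[str, float]:
    """Fraction of all edits contributed by each group."""
    total = sum(edits_per_user.values())
    shares: Dict[str, float] = {}
    for user, label in groups.items():
        shares[label] = shares.get(label, 0.0) + edits_per_user[user] / total
    return shares
```

Repeating the classification for each period shows whether a growing or shrinking share of edits comes from the top groups.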
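The Mann-Whitney-Wilcoxon test in item (2) compares two samples without assuming normality, which suits heavy-tailed edit counts. The dependency-free sketch below assumes the samples are, say, per-contributor productivity in two periods. It assigns mid-ranks to ties but uses the plain normal approximation without tie or continuity correction, so for real analyses `scipy.stats.mannwhitneyu` is the safer choice.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for sample a, two-sided p-value via normal approximation)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, source) in zip(ranks, pooled) if source == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value
```

For the toy samples `[1, 2, 3]` and `[4, 5, 6]`, U is 0 and the approximate p-value is just under 0.05. Samples this small really call for an exact test.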
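Item (3) bases its inference on Bayes' theorem, P(H | E) = P(E | H)·P(H) / P(E), where H could be "this major contributor is a professional" and E an observed behavioral indicator. The sketch below shows only that arithmetic. The indicators, priors, and likelihoods are hypothetical placeholders, and chaining several indicators assumes they are conditionally independent given H, which the dissertation's framework may not assume.

```python
from typing import Iterable, Tuple


def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E)."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence


def update_all(prior: float, likelihoods: Iterable[Tuple[float, float]]) -> float:
    """Apply observed indicators in turn, assuming conditional independence given H."""
    p = prior
    for p_e_h, p_e_not_h in likelihoods:
        p = posterior(p, p_e_h, p_e_not_h)
    return p
```

For example, with a hypothetical prior of 0.1 and an indicator seen in 80% of professionals but 10% of amateurs, the posterior rises to 8/17 (about 0.47). A second independent indicator of the same strength raises it to 64/73 (about 0.88).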
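Item (4) uses entropy to characterize preferences. One common way to do this is to compute the Shannon entropy H = -Σ p_k·log2(p_k) of how a contributor's edits are spread across edit categories. The category labels below (geometry, attribute, and so on) are hypothetical. Low entropy indicates a concentrated preference, and normalizing by log2 of the number of categories makes values comparable across periods.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(events: Iterable[Hashable]) -> float:
    """Entropy in bits of the empirical distribution of event categories."""
    counts = Counter(events)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(events: Iterable[Hashable], n_categories: int) -> float:
    """Entropy divided by its maximum, log2(n_categories); ranges from 0 to 1."""
    if n_categories < 2:
        raise ValueError("need at least two categories")
    return shannon_entropy(events) / math.log2(n_categories)
```

A contributor whose edits are all geometry edits scores 0, while one who spreads edits evenly over four hypothetical categories scores 2 bits (normalized 1.0). Tracking the score per period shows whether preferences sharpen or diffuse.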
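The association analysis in item (4) asks whether preference shifts co-occur more with newcomers or with long-standing contributors. The sketch below computes only the basic support, confidence, and lift measures that association-rule mining (for example, Apriori) relies on. It is not a full rule-mining algorithm, and the transaction items ("newcomer", "veteran", "shift") are hypothetical labels, one set per contributor.

```python
from typing import Dict, List, Set


def rule_metrics(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    # Lift > 1: the consequent is more likely given the antecedent than in general.
    lift = confidence / (cons / n) if cons else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical data: one transaction per contributor.
transactions = [
    {"newcomer", "shift"}, {"newcomer", "shift"}, {"newcomer"},
    {"veteran", "shift"}, {"veteran"}, {"veteran"},
]
```

In this toy data, `newcomer -> shift` has lift 4/3 while `veteran -> shift` has lift 2/3, which would point to newcomers as the main source of the shift.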
Keywords/Search Tags: Volunteered Geographical Information, Neogeography, OpenStreetMap, Data Quality, Behavior, Inequality