Research on the Similarity of Categorical Variables in GIS Spatial Data Analysis

Posted on: 2018-02-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zhao
Full Text: PDF
GTID: 1360330548977735
Subject: Mine computer application and spatial information engineering
Abstract/Summary:
Spatial data acquisition and data analysis technology have advanced rapidly in recent years, and multi-source data fusion and semantic information analysis have become research hotspots in GIS. As a result, spatial data analysis is far better able to describe people and social activities, and GIS spatial data analysis has helped advance research on social space, behavioral geography, urban spatial structure, and related areas. Spatial data contains many categorical variables, such as geographical names and the descriptive attributes of spatial objects, so extracting the semantic information these variables carry is very important for spatial data analysis.

The mainstream methods for analyzing semantic relationships are frequency-based similarity measures, ontology-based similarity measures, and similarity measures based on probabilistic language models. For GIS-based spatial data analysis, frequency-based measures work well with relational databases, but their accuracy is low and their properties as a metric are not ideal. Ontology-based measures are limited by the ontologies available and are used mainly in search and matching applications, so they are hard to extend to other areas. Measures based on probabilistic language models need large amounts of training data, which makes them hard to apply to general data sets. In addition, spatial data analysis is typically interdisciplinary, spanning the geosciences, computer science, and related disciplines, so data and methods are hard to combine and similarity measures are hard to match to spatial data analysis applications.

GIS-based spatial data analysis therefore lacks methods that work with relational databases and can measure semantic similarity on multi-source data with complex data types. To fill this gap, this dissertation proposes two new similarity measures and applies them to spatial structure analysis with good results. The main contributions are as follows:

(1) To address the poor accuracy of frequency-based similarity measures, a similarity measure for categorical variables based on naive approximate entropy is proposed, drawing on the classification approach of the naive Bayes classifier. First, a subset of the data representing each categorical variable is constructed on the basis of a generative model. Next, a feature vector representing the categorical variable is built from the naive approximate entropy of the data objects in that subset. Finally, the distance between feature vectors is taken as the similarity between categorical variables. Experiments on common data sets show that the measure outperforms existing methods, both when used on its own and when substituted for the original similarity measure in the k-modes clustering algorithm.

(2) Some similarity measures that express differences accurately do not satisfy symmetry or the triangle inequality. Based on the distributional hypothesis from probabilistic language models, a similarity measure built on the Hellinger distance between distributions is proposed. The Hellinger distance represents the difference in the distribution of the same attribute across different categorical variables. Feature vectors representing the categorical variables are then constructed with these differences as elements, and the distance between the vectors is taken as the similarity. The Hellinger distance not only expresses differences accurately but also satisfies the metric requirements of non-negativity, symmetry, and the triangle inequality. Experiments show that, used with the k-modes clustering algorithm, the measure performs better than the original method and is much more applicable to unbalanced data sets.

(3) To combine semantic analysis with traditional spatial data analysis applications, the semantic similarity of categorical variables is extended to semantic similarity between data objects. A method for commercial spatial feature extraction and structural analysis is proposed on the basis of the Hellinger distance similarity, and it is used with Internet data collected from WebGIS to analyze urban commercial spatial structure. The method describes a city's commercial spatial structure by calculating and comparing economic evaluation indices between regions, and it is applied to the commercial spatial structure of Shenyang with Baidu Maps as the data source. This analysis not only validates the new similarity measures for categorical variables but also extends the use of spatial data analysis and Internet spatial data in research on urban spatial structure.
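
The Hellinger-based idea in contribution (2) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation. It assumes one simple reading of the abstract: for two values of a categorical attribute, compare the conditional distributions of every other attribute using the Hellinger distance, then average those distances into a single dissimilarity. The function names, the averaging step, and the sample records are all hypothetical.

```python
from collections import Counter
from math import sqrt


def conditional_distribution(rows, attr, value, other_attr):
    """Empirical distribution of other_attr over rows where attr == value."""
    counts = Counter(r[other_attr] for r in rows if r[attr] == value)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()} if total else {}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts.

    H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies in [0, 1].
    """
    support = set(p) | set(q)
    total = sum((sqrt(p.get(v, 0.0)) - sqrt(q.get(v, 0.0))) ** 2 for v in support)
    return sqrt(total / 2.0)


def value_dissimilarity(rows, attr, a, b):
    """Assumed aggregation: mean Hellinger distance over all other attributes."""
    others = [k for k in rows[0] if k != attr]
    dists = [
        hellinger(
            conditional_distribution(rows, attr, a, k),
            conditional_distribution(rows, attr, b, k),
        )
        for k in others
    ]
    return sum(dists) / len(dists)


# Hypothetical point-of-interest records, invented for illustration only.
rows = [
    {"category": "cafe", "district": "A", "price": "mid"},
    {"category": "cafe", "district": "A", "price": "high"},
    {"category": "teahouse", "district": "A", "price": "mid"},
    {"category": "teahouse", "district": "A", "price": "high"},
    {"category": "hardware", "district": "B", "price": "low"},
    {"category": "hardware", "district": "B", "price": "low"},
]

d_close = value_dissimilarity(rows, "category", "cafe", "teahouse")
d_far = value_dissimilarity(rows, "category", "cafe", "hardware")
```

In this toy data, "cafe" and "teahouse" co-occur with the same districts and price levels, so their dissimilarity is smaller than that between "cafe" and "hardware". Because each Hellinger term is symmetric and non-negative, the averaged value is too, which matches the metric requirements the abstract emphasizes.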
Keywords/Search Tags: Categorical Variables, Similarity, Naïve Bayesian, Hellinger Distance, Urban Spatial Structure