| Spatial data mining extracts implicit knowledge, spatial relationships and patterns from spatial data. Spatial co-location pattern mining is an important branch of spatial data mining. Its purpose is to find subsets of a set of spatial features whose instances satisfy a spatial co-location relationship and a specific constraint. Classic spatial co-location pattern mining adopts the prevalence constraint. This paper focuses on mining a class of spatial co-location patterns and their extended patterns based on influence constraints (collectively called high influence pattern mining), which aims to mine subsets of a set of spatial features whose instances satisfy specified proximity relationships and whose patterns have a significant influence measure. At present, high influence co-location pattern mining methods have been studied in the fields of public health and traffic governance. The existing methods delineate circular influence areas of instances by spatial distance and mine high influence co-location patterns according to the ratio of the overlapped influence area to the total area. These methods therefore have three defects: influence is tied to spatial distance, the influence area has a single shape, and finding overlapping influence areas has high computational complexity. To address these defects and fully exploit rich non-spatial data resources, this paper attaches attribute vectors to spatial instances and uses attribute information instead of spatial distance to calculate the bilateral influence that neighboring instances exert on each other in multiple aspects. This attribute information can represent the states of instances at specific moments; for instance, attribute data such as turnover and service rankings can be used to calculate the bilateral influence between a supermarket and its co-located chain store in terms of performance and public reputation. On this basis, this paper puts forward influence
measurement methods for spatial co-location patterns and for the extended interaction patterns and propagation patterns, and systematically explores diversified high influence pattern mining in three application scenarios. The main research contents and contributions of this paper are as follows: 1. High influence co-location pattern mining: This part targets scenarios where co-located instances perform differently in multiple designated aspects due to bilateral influence; for instance, a western restaurant and its adjacent snack bar perform differently in terms of operating indicators and service levels. To address the high computational complexity of existing methods, this paper proposes a method for calculating the influence index of a co-location pattern based on the attribute information of instances in cliques, following the intuition that the more neighbors an instance has, or the more similar it is to its neighbors, the more significant its influence will be. To reflect the discrepancies in the directions and lengths of the attribute vectors of instances, this paper builds a composite similarity from the cosine similarity and the Mahalanobis distance between the attribute vectors of neighboring instances, and uses information entropy to transform the possibilities of influence represented by the number of neighbors and the composite similarity of instances into entropies. After proving that the influence index measure satisfies the downward closure property, which can be used to prune the search space, this paper puts forward a high influence co-location pattern mining algorithm with a pruning strategy and analyzes its time complexity, finding that the method is more efficient than the join-less algorithm once the mining process reaches a specific order. Experiments are conducted to verify the effectiveness, efficiency and scalability of the algorithm. 2. High influence interaction pattern mining: This part targets scenarios where instances affected by the interaction of influential media
perform differently at different times, such as epidemics that spread in leaps as infected people travel among different types of cities; these cities influence each other differently owing to different key epidemic indicators, environmental conditions, personnel exchanges and close economic ties. Existing high influence co-location pattern mining has two problems: it is difficult to determine the influence between instances beyond the distance threshold, and the influence indexes of patterns can only be computed on spatial co-location patterns. To address them, this paper defines semantic proximity relationships between instances through influential media flows, puts forward the concept of the interaction pattern based on the star structure of proximity relationships, and introduces a method that mines interaction patterns by decomposing large candidate patterns into smaller ones via the central feature. It also develops an attribute descriptor to extract the attributes of instances and edges from time series data, calculates the attribute weights with the analytic hierarchy process, and gives ways to calculate the directional influence in instance neighbor pairs and to measure the star influence index of interaction patterns. A benchmark algorithm for mining high influence interaction patterns is presented. After showing that the star influence index of an interaction pattern does not satisfy the downward closure property and proving two properties of the measure, this paper proposes two improved algorithms with pruning strategies and analyzes the time complexity of the benchmark algorithm. Experiments are conducted to verify the effectiveness, efficiency and scalability of the algorithms. 3. High influence propagation pattern mining: This part addresses scenarios where, in a sequence of spatiotemporally neighboring events, a prior event causes specific subsequent events to appear more densely than on average, such as rainy or snowy days causing more local traffic congestion than usual. To fill the research gap of mining high
influence patterns from spatiotemporally neighboring tree instances, this paper analyzes the characteristics of the spatiotemporal dataset and quickly generates neighbor sets for spatiotemporal events by constructing a KD tree over geospatial entities, then creates influence propagation trees of events with a prefix tree method, and finally extracts propagation patterns of features from the influence propagation trees by preorder traversal. After defining a series of concepts such as the propagation sequence, event sequence and tail event set, this paper introduces a method for computing the influence indexes of propagation patterns and presents a benchmark algorithm for mining high influence propagation patterns. After showing that the influence index of a propagation pattern does not satisfy the downward closure property but does satisfy a weak downward closure property and other properties, this paper integrates a three-layer hashmap data structure with level traversal encoding to avoid creating influence propagation trees of events and thus improve efficiency, proposes an improved algorithm with this optimization strategy, and analyzes the time complexity of the benchmark algorithm. Experiments are conducted to verify the effectiveness, efficiency and scalability of the algorithms. |
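
The neighbor-set generation step for spatiotemporal events can be illustrated with a minimal sketch. It is not the thesis's implementation: the event layout `(event_id, x, y, t)`, the function names `build_kdtree`, `radius_query` and `neighbor_sets`, the Euclidean distance threshold `radius`, and the rule that a neighbor must occur strictly later within a time `window` are all assumptions made for illustration. The sketch builds a 2-D KD tree over event locations, runs a radius query per event, and then filters the spatial hits by time.

```python
from math import dist

# Hypothetical event records: (event_id, x, y, t). The layout is an assumption.
EVENTS = [
    ("rain_1", 0.0, 0.0, 0),
    ("jam_1", 0.5, 0.5, 1),
    ("jam_2", 0.2, -0.4, 2),
    ("jam_3", 5.0, 5.0, 1),
    ("jam_4", 0.3, 0.1, 9),
]


def build_kdtree(items, depth=0):
    """Build a 2-D KD tree over event locations; returns nested dicts or None."""
    if not items:
        return None
    axis = depth % 2  # alternate the splitting axis: x (0), then y (1)
    items = sorted(items, key=lambda e: e[1 + axis])
    mid = len(items) // 2
    return {
        "event": items[mid],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }


def radius_query(node, center, radius, found):
    """Append to `found` every event located within `radius` of `center`."""
    if node is None:
        return
    event = node["event"]
    if dist(event[1:3], center) <= radius:
        found.append(event)
    axis = node["axis"]
    diff = center[axis] - event[1 + axis]
    # Always search the side containing the center; search the other side
    # only if the splitting plane is no farther away than the radius.
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    radius_query(near, center, radius, found)
    if abs(diff) <= radius:
        radius_query(far, center, radius, found)


def neighbor_sets(events, radius, window):
    """Map each event id to ids of later events within `radius` and `window`."""
    tree = build_kdtree(list(events))
    result = {}
    for ev in events:
        found = []
        radius_query(tree, ev[1:3], radius, found)
        # Assumed temporal rule: a neighbor occurs strictly after `ev`
        # and no more than `window` time units later.
        result[ev[0]] = sorted(
            other[0] for other in found if 0 < other[3] - ev[3] <= window
        )
    return result
```

Calling `neighbor_sets(EVENTS, radius=1.0, window=3)` gives each event the later events that are close in both space and time; such sets would then feed the construction of influence propagation trees. For large datasets a library KD tree would normally replace this hand-written one.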