| Cluster mining is an important branch of data mining and is widely used in applications such as speech recognition, image segmentation, marketing, finance and insurance, and e-commerce. Its essence is to partition a sample set into several classes according to the samples' attributes, so that samples within a class are as similar as possible while samples in different classes are as dissimilar as possible. Multi-scale clustering is a typical interdisciplinary topic: it uses clustering techniques to analyze the structure of the research objects at multiple scales and levels, together with the scale effect caused by scale transformation and the functional relationships between scales. Considerable progress has been made in combining cluster mining with multi-scale theory, and scholars have put forward a number of multi-scale cluster mining theories and methods, but in many cases these are limited to spatial and image data. This limitation restricts the application and adoption of clustering techniques based on multi-scale science. In this paper, the features of multi-scale science and cluster mining are combined, with the aim of developing multi-scale theories and multi-scale clustering methods for general datasets. 
The paper introduces the theories and methods of multi-scale science into the cluster mining field: it proposes a generalized definition of scale based on the concept hierarchy, analyzes the substance of scale transformation and scale effect, and builds a multi-scale cluster mining system structure, which together form a multi-scale cluster mining theory system. Taking this theory as a guide and combining it with the kriging method, an unbiased optimal estimator, the paper then proposes an upscaling mining algorithm and a downscaling mining algorithm for multi-scale cluster mining, which produce clustering results across multiple data scales. Finally, it proposes an evaluation index for scale transformation results based on information entropy, which provides theoretical and methodological support for the final multi-scale cluster mining results. Starting from cluster mining and drawing on multi-scale scientific theory, the paper explores the construction of a multi-scale cluster mining theory system and studies multi-scale clustering methods for scale conversion and an evaluation index for scale transformation results. The main research contents are as follows:
1) Construction of the multi-scale cluster mining theory system
Traditional cluster mining has not studied the multi-scale character of datasets in depth, and existing multi-scale cluster mining theories and methods are in many cases limited to spatial or image data. To address these problems, the paper builds a multi-scale cluster mining theory system from four aspects: multi-scale datasets, scale transformation, scale effect, and the multi-scale cluster mining system structure. 
First, based on the concept hierarchy, the study defines data scale, scale partition, and multi-scale datasets, together with the grandparent-grandchild, parent-child, sibling, and upper-lower relationships between multi-scale datasets, which establishes the theoretical foundation. Second, it analyzes the core of multi-scale cluster mining, namely the definition, causes, classification, and approaches of scale transformation. Third, it summarizes the definition and impact of the scale effect in multi-scale clustering. Finally, on the basis of the traditional data mining process, it proposes a multi-stage multi-scale cluster mining system structure, providing theoretical support and implementation ideas for further study of multi-scale clustering.
2) Multi-scale cluster mining algorithms
The multi-scale cluster mining theory system provides the theoretical basis for scale transformation. Following the scale transformation process, the paper constructs a framework for multi-scale cluster mining algorithms and analyzes the essence of applying the kriging method to general datasets. Drawing on mature scaling methods from geoscience, image science, biology, and other disciplines, it proposes MSCSUA (Multi-Scale Clustering Scaling Up Algorithm), based on BK (Block Kriging), and MSCSDA (Multi-Scale Clustering Scaling Down Algorithm), based on ATPRK (Area To Point Regression Kriging). These algorithms derive cluster mining knowledge at multiple scales, and their results are compared with those of traditional clustering algorithms that cluster directly at the target scale. Experiments examine the correctness and feasibility of the algorithms.
3) Multi-scale clustering validity index
The multi-scale clustering validity index is a quantitative assessment of the upscaling and downscaling results in multi-scale clustering, and it is the most intuitive way to analyze and evaluate scaling algorithms. 
This paper combines the scale transformation precision indexes used in the multi-scale field with clustering validity indexes. It introduces information entropy to measure the uncertainty of the scale effect on clustering results under each clustering validity index, and normalizes the entropy results to serve as the weight of each index, yielding MSCVI (Multi-Scale Clustering Validity Index), which can be applied more effectively across different practical applications.
4) Verification experiments on the multi-scale cluster mining algorithms and the multi-scale clustering validity index
The proposed multi-scale cluster mining algorithms and multi-scale clustering validity index are applied to several common UCI datasets and to real total-population data of H province for analysis and testing. The experimental results show that the proposed algorithms achieve higher accuracy and shorter running time than traditional clustering algorithms and are feasible clustering algorithms. Compared with traditional clustering validity indexes, the multi-scale clustering validity index is also considerably more accurate and evaluates high-dimensional datasets well. |
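
The entropy-based weighting behind MSCVI is described above only in words, so the following is a minimal sketch of how such a weighting could look, not the paper's actual formula. It assumes the classic entropy weight method, in which an index whose scores vary more across the compared scale-transformation results (lower normalized entropy) receives a larger weight. The function names `entropy_weights` and `combined_index`, the index names, and the toy scores are all hypothetical and chosen for illustration only.

```python
import math


def entropy_weights(index_scores):
    """Derive one weight per validity index from the entropy of its scores.

    index_scores maps a (hypothetical) index name to a list of non-negative
    scores, one per scale-transformation result being compared.
    Assumption: classic entropy weight method, weight ~ (1 - normalized entropy).
    """
    divergence = {}
    for name, values in index_scores.items():
        total = sum(values)
        n = len(values)
        if n < 2 or total <= 0:
            # Degenerate column: it carries no usable information.
            divergence[name] = 0.0
            continue
        probs = [v / total for v in values]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        # Normalized entropy lies in [0, 1]; a value of 1 means the index
        # does not distinguish between the results at all.
        divergence[name] = max(0.0, 1.0 - entropy / math.log(n))

    spread = sum(divergence.values())
    if spread <= 1e-12:
        # No index discriminates: fall back to equal weights.
        return {name: 1.0 / len(index_scores) for name in index_scores}
    return {name: d / spread for name, d in divergence.items()}


def combined_index(weights, values):
    """Weighted sum of the individual index values for one clustering result."""
    return sum(weights[name] * values[name] for name in weights)


# Toy scores (made up for illustration, not taken from the paper).
toy_scores = {
    "index_a": [0.62, 0.60, 0.61],
    "index_b": [0.90, 0.40, 0.10],
}
toy_weights = entropy_weights(toy_scores)
toy_value = combined_index(toy_weights, {"index_a": 0.62, "index_b": 0.90})
```

In this sketch the weights always sum to 1, so the combined value stays on the same scale as the individual indexes, provided those have been normalized to a common range beforehand. Whether the paper weights by entropy directly or by its complement is not stated in the abstract, so that choice is an assumption here.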