| As an important carrier of family information, genealogy records family lineages, origins, migration, marriage, culture, rules, and other content, and has value for academic research, the economy, and education. Deeply mining and analyzing the varied information in genealogies helps realize that value and contributes to socialist cultural construction, economic development, and social progress. However, huge data volumes, heterogeneous lineages, and multivariate attributes make genealogy research difficult, especially multivariate feature mining and association analysis. It is therefore urgent to improve the ability to understand and analyze large-scale genealogy data through the joint representation of multivariate attributes. Graph clustering and graph matching are common methods for simplifying the cognition and expression of data and are widely used to analyze various kinds of graph data. A family tree is a special kind of graph with a hierarchical structure. However, existing graph clustering strategies ignore multivariate attributes, which makes clustering results highly uncertain. Traditional graph query languages are costly to learn and cannot comprehensively take multivariate attributes into account. Traditional methods therefore struggle to meet the requirements of exploratory analysis of family trees. In contrast, correlation-based feature fusion can effectively extract the correlations among multiple features, so that the features can be jointly represented in a vector space, which benefits graph analysis tasks such as clustering and matching. Visual analysis can integrate data mining, model analysis, and other methods with exploratory interfaces and interaction techniques to help users explore and analyze large-scale genealogical data according to
their prior knowledge or specific requirements. This paper studies the clustering and matching of genealogies through attribute-structure synchronization, family clustering, family matching, and visual analysis. The main contributions of this paper are as follows. First, to address the joint representation of the multivariate attributes of genealogy, we carry out a visual analysis of family clustering driven by attribute-structure synchronization and propose a structure-attribute fusion model for genealogy clustering. We first use a graphlet kernel method to measure the structural differences between family trees. Then, Partial Least Squares (PLS) is used to combine the learned vectors with the multiple attributes, and a joint dimensionality reduction projects the family trees into a low-dimensional feature space, where family trees that share similar structures and attributes lie close to each other and are further aggregated by a distance-based clustering method. Furthermore, we provide several quality metrics to evaluate the clustering features and design a set of multi-scale glyphs to visually present the structural and attribute features of the clusters. In addition, multiple coordinated views and a series of visual cues and interactions are designed to enable users to select clusters of interest and gain deeper insights. We further demonstrate the effectiveness and usefulness of our system through case studies and expert interviews based on a real-world dataset. Second, we propose an attribute-structure synchronization method for credible graph matching based on Canonical Correlation Analysis (CCA). First, the graph representation learning method graph2vec is used to transform graphs into high-dimensional vectors that represent their structures efficiently. Then, we use CCA to combine the learned structures and the multiple attributes into a joint embedding space, in which the features of the structures and attributes are
well retained by maximizing the mutual information between structures and attributes. Further, we define a distance-based graph matching scheme to quickly retrieve graphs that share similar structures and attributes in the embedding space. A rich set of visual interfaces and user-friendly interactions is provided, enabling users to evaluate and compare graph matching results in an exploratory way. Case studies and quantitative comparisons on real-world datasets demonstrate the effectiveness of our method in graph matching and large-graph exploration. In summary, to address the heterogeneous lineages and multivariate attributes of large-scale genealogy, this study uses feature fusion models to achieve visual analysis of genealogical clustering and matching. It not only displays the multiple features of genealogy but also provides a set of interactions that enable users to explore and gain insight into intrinsic correlations and hidden information, which is of great significance to genealogy mining and application. |
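
To make the CCA-based fusion and distance-based matching described above more concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that structure vectors (for example, graph2vec output) and attribute vectors are already available as NumPy arrays; here they are replaced by synthetic data. The function names `cca_fuse` and `match`, the concatenation of the two sets of canonical variates, the Euclidean distance, and the regularization constant are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def _inv_sqrt(cov):
    """Inverse square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T


def cca_fuse(structure_vecs, attribute_vecs, n_components=2, reg=1e-6):
    """Fuse two views with CCA (hypothetical helper).

    Returns the joint embedding (canonical variates of both views,
    concatenated) and the leading canonical correlations.
    """
    X = structure_vecs - structure_vecs.mean(axis=0)
    Y = attribute_vecs - attribute_vecs.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance matrices.
    cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    cxy = X.T @ Y / (n - 1)
    cxx_is, cyy_is = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # The SVD of the whitened cross-covariance gives the canonical directions;
    # its singular values are the canonical correlations.
    U, s, Vt = np.linalg.svd(cxx_is @ cxy @ cyy_is, full_matrices=False)
    wx = cxx_is @ U[:, :n_components]
    wy = cyy_is @ Vt.T[:, :n_components]
    embedding = np.hstack([X @ wx, Y @ wy])
    return embedding, s[:n_components]


def match(embedding, query_index, top_k=3):
    """Indices of the top_k graphs closest to the query (Euclidean distance)."""
    dists = np.linalg.norm(embedding - embedding[query_index], axis=1)
    order = np.argsort(dists)
    return [int(i) for i in order if i != query_index][:top_k]


# Synthetic stand-in data: both views are noisy linear images of one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
structure = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(200, 8))
attributes = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

embedding, corrs = cca_fuse(structure, attributes, n_components=2)
neighbours = match(embedding, query_index=0, top_k=3)
print("canonical correlations:", corrs)
print("graphs most similar to graph 0:", neighbours)
```

In this sketch, a graph is retrieved as a match only if it is close to the query in both the structure-derived and the attribute-derived canonical variates, which mirrors the idea of matching on structures and attributes jointly. A PLS-based projection followed by a distance-based clustering step could be sketched along the same lines for the clustering task.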