| With the rapid development of information technology, large volumes of data containing massive numbers of entities and relationships are being generated. Information networks are widely used to model data with associated semantics, where nodes represent entities and edges represent relationships between entities. Entities and relationships can be homogeneous or heterogeneous, and heterogeneous information networks can carry richer semantic information. Multi-dimensional analysis is an effective way to discover potentially useful information. This thesis focuses on the development of a multi-dimensional analysis framework for large-scale heterogeneous information networks. The main contributions are as follows: 1. A two-layer graph cube model, named HiCube, is proposed. First, the structure dimension and the attribute dimension are taken as the dimensions of multi-dimensional analysis on heterogeneous information networks. Second, structure-dimension and attribute-dimension aggregate graphs are proposed as the analysis metrics. Third, a two-layer graph cube model is proposed to manage the aggregate graphs: the first layer is the structure dimension cube, the second layer is the attribute dimension cube, and the hierarchical structure of the model is defined as well. Based on the proposed cube model, roll-up and drill-down operations are proposed to support users’ multi-dimensional and multi-level analysis requirements. 2. Materialization strategies and query algorithms are proposed based on HiCube. Different materialization strategies are proposed for the structure dimension cube and the attribute dimension cube to achieve a better trade-off between query time and storage space. A query algorithm is then designed and implemented to search the materialized cubes and compute query results from them. The effectiveness and efficiency of the algorithms are verified by experiments. 3. To support multi-dimensional query and analysis on large-scale heterogeneous information networks in real applications, a prototype system is designed and implemented based on HDFS and Spark. The system also provides a user-friendly aggregate graph visualization tool. The effectiveness of the system is verified by multi-dimensional analysis tests on real information networks. |
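
The abstract does not give HiCube's formal definitions, so the following is only a minimal sketch of the general idea of an aggregate graph and a roll-up between two levels of detail. It assumes a toy author-paper network and takes simple node and edge counts as the aggregate measure. The data and the helper names (`aggregate`, `roll_up`, `by_type`, `by_type_and_country`) are all hypothetical and are not taken from the thesis.

```python
from collections import Counter

# Toy heterogeneous network (hypothetical data, for illustration only).
# "type" plays the role of the structure dimension; the other keys are attributes.
nodes = {
    "a1": {"type": "author", "country": "CN"},
    "a2": {"type": "author", "country": "CN"},
    "a3": {"type": "author", "country": "US"},
    "p1": {"type": "paper"},
    "p2": {"type": "paper"},
}
edges = [("a1", "p1"), ("a2", "p1"), ("a3", "p2"), ("a1", "p2")]


def aggregate(nodes, edges, key):
    """Build an aggregate graph: group nodes by key(attrs), then count
    nodes per group and edges per (source group, target group) pair."""
    group_of = {n: key(attrs) for n, attrs in nodes.items()}
    node_counts = Counter(group_of.values())
    edge_counts = Counter((group_of[u], group_of[v]) for u, v in edges)
    return node_counts, edge_counts


def roll_up(node_counts, edge_counts, project):
    """Derive a coarser aggregate graph from a finer one by mapping each
    fine group to a coarse group and summing the counts."""
    coarse_nodes = Counter()
    for group, count in node_counts.items():
        coarse_nodes[project(group)] += count
    coarse_edges = Counter()
    for (src, dst), count in edge_counts.items():
        coarse_edges[(project(src), project(dst))] += count
    return coarse_nodes, coarse_edges


def by_type(attrs):
    # Coarse level: node type only.
    return (attrs["type"],)


def by_type_and_country(attrs):
    # Finer level: node type plus country ("*" when the attribute is absent).
    return (attrs["type"], attrs.get("country", "*"))


fine = aggregate(nodes, edges, by_type_and_country)
coarse = aggregate(nodes, edges, by_type)

# Rolling up the finer aggregate graph reproduces the coarser one
# without touching the original network again.
print(roll_up(*fine, lambda group: group[:1]) == coarse)  # True
```

In this sketch the coarse aggregate graph can be derived from the stored fine one, which illustrates the kind of query-time versus storage trade-off that a materialization strategy has to weigh: storing more aggregate graphs costs space, while storing fewer means more queries are answered by recomputation.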