Font Size: a A A

Frequent Subgraph Mining On Single Large Uncertain Graphs

Posted on:2016-07-28Degree:MasterType:Thesis
Country:ChinaCandidate:Y F ChenFull Text:PDF
GTID:2346330536967723Subject:Management Science and Engineering
Abstract/Summary:PDF Full Text Request
Military intelligence analysis and processing sub-system plays an important role in a military information system.With the evolvment of various intelligence acquisition technologies,military intelligence data takes on characteristics like heterogeneity and unstructureness,and hence,brings great technological challeges to its analysis and processing.Massive text analytics is the most important and rudimentary part of military intelligence analysis and processing.Currently,an effective and holistic approach for processing textual intelligence is to construct a document network,that is,a single large graph data,by taking documents as nodes,mutual relationships as edges,and then,a series of analytics and judgements are conducted.Taking textual intelligence analytics as the background,this thesis focuses on mining single large graphs(networks).Owing to noise,measurement error,privacy and imperfection,etc.,uncertainty widely exists in the real life.As a data model with wide spectrum of modeling ability,graph data also occupies with uncertainty.Besides the aforementioned document network,ucertain graphs are frequently seen in domains including bio-informatics and social network.Thus,research of uncertain graphs becomes a focused theme.In this thesis,we model texutual military intelligence by uncertain graphs,and consider mining frequent patterns on the established graphs.To the best of our knowledge,mining frequent subgraph on single large uncertain graphs has not been previously investigated.Firstly,we give the definition of support on single uncertain graphs under expected semantic,and propose an enumeration-evaluation framework.The enumeration strategy is similar to that on the deterministic graph,and we focus on the evaluation procedure.Secondly,by proving the #P-hardness of computing expected support on single uncertain graphs,we thus develop an approximate algorithm with precision guarantee,to meet the needs of practical application.Thridly,to further improve the efficiency,two optimization strategies are proposed.One considers the computation sharing among sample graphs,while the other incorporates checkpoint mechanism and a structure-based upper bound to prune infrequent patterns as early as possbile.Better performance is achieved by the two optimization strategies.Lastly,experiments on real datasets reveal the availability and effectiveness of the mining technology.To further verify the pragmaticity of the proposed technology,we apply it to reallife textual intelligence data.Considering documents as nodes,the similarity between documents as edges,we construct the association among documents.We label the documents through LDA model,and gauge the similarity between documents via a knowledge base based algorithm.Two nodes are connected by an edge when their similarity is no less than a given threshold,and the similarity is normalized as the edge probability.On the document network,the proposed uncertain graph mining technology is applied.Empirical results reveals that the discovered patterns are explanable,and the proposed technology promises good practical applications.
Keywords/Search Tags:Single, Large, Uncertain graph, Frequent Pattern Mining, Document Analysis
PDF Full Text Request
Related items