| The widespread adoption of XML has made it a standard for data representation and exchange over the Internet, and the number of XML documents is increasing exponentially. Keyword search is an effective paradigm for information discovery and has been introduced to query XML documents. Recent studies focus on proximate keyword search; however, because keywords are ambiguous and imprecise, it is difficult to capture the user's query intention, and the results obtained are far from satisfactory. Keyword search over XML documents can return a massive number of results, so result clustering is an important way to return high-quality results. Because each cluster may contain many candidate results matching the query, a ranking mechanism within clusters is another important means of achieving highly effective keyword search. In this paper, motivated by the goal of "returning meaningful clustered results", we investigate the problem of XML keyword search from the perspectives of result clustering and ranking within clusters, and present a mathematical model and algorithmic analysis. To this end, we present a multi-granularity feature computing method. In this method, we first propose the similarity measure of Cluster Compactness Granularity (CCG) to partition search results into different clusters, forming cluster collections related to the search intention. Furthermore, we propose the similarity measure of Subtree Compactness Granularity (SCG) to rank candidate matching subtrees within clusters, which differs from the approach of traditional search engines. 
Moreover, we define a novel semantics, Compact LCA (CLCA), which resolves the problem of identifying relevant matches by eliminating redundant LCA nodes, addresses the problem of identifying relevant non-matches by deducing nodes that were wrongly filtered out, and overcomes the shielding and isolation effects of SLCA-based approaches. We implement an efficient graph-based algorithm called XEdge, which integrates the CCG and SCG features with the CLCA semantics. Compared with existing methods such as XSeek and XKLUSTER in terms of number of clusters, precision, and recall, the experimental results demonstrate that XEdge produces higher-quality clustered XML results and achieves better retrieval performance. |