Tree-based Semantic Similarity Of XML Documents

Posted on:2010-05-14

Degree:Master

Type:Thesis

Country:China

Candidate:B T Yang

Full Text:PDF

GTID:2178360302966036

Subject:Software engineering

Abstract/Summary:

As XML (eXtensible Markup Language) has come into wide use for representing many kinds of data, its versatility and flexibility have made it a focus of enterprise attention and brought new hope for web-based information exchange. XML can mark up not only unstructured data but also highly structured data, such as the contents of a database. XML data itself, however, is semi-structured, and the query languages designed for more structured XML processing (XQuery Language, XQL) are no longer adequate for searching it, especially when a user needs data that is relevant to a query without matching it exactly. Searching heterogeneous XML data therefore calls for research on approximate search techniques for XML documents.

XML documents usually contain many information entities, and these entities belong to only a limited number of structural categories. In general, therefore, an XML document may contain a large amount of structurally similar, redundant information. To measure the structural similarity between XML documents effectively, we consider only the information closely tied to document structure, not the specific data that the document carries. Measuring structural similarity requires screening out and removing redundant information unrelated to structure, extracting the effective structure of each XML document, and modeling it. Given that model and a defined standard for judging similarity, a structural similarity value for two XML documents can be obtained.
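The extraction step described above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual procedure: the function name `skeleton` is hypothetical, attributes and text content are simply ignored, and "redundant" is taken to mean a sibling subtree whose structure duplicates one already seen.

```python
import xml.etree.ElementTree as ET


def skeleton(elem):
    """Return (tag, children) for elem with all text dropped.

    Assumption: sibling subtrees with identical structure count as
    redundant, so only the first occurrence of each is kept.
    """
    kept = []
    for child in elem:                # iterating an Element yields its children
        sub = skeleton(child)
        if sub not in kept:           # collapse structurally identical siblings
            kept.append(sub)
    return (elem.tag, tuple(kept))


doc = (
    "<lib>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title><author>Y</author></book>"
    "</lib>"
)
print(skeleton(ET.fromstring(doc)))
```

Both `book` entries carry different data but the same structure, so the sketch keeps a single `book` subtree as the document's structural model.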
The measurement process as a whole centers on three tasks: extracting the valid structure of an XML document, modeling that structure, and defining the similarity measure.

Structural similarity comparison plays a key role in many areas. As an organizing principle, similarity can be used to distinguish things, form concepts, and generalize. Similarity can be compared at different levels of abstraction: at the data level, at the type level, or between the two levels. Evaluating similarity between data is related to clustering items on the same subject; in the image domain, for example, a similarity measure can be used to group pictures with the same subject. Evaluating similarity between types is related to schema integration, where the same information is described through different schemas, and also to schema clustering. Evaluating similarity between data and types is related to identifying the generator of the data. Clustering, moreover, may be concerned with similarity of content or with similarity of the structure that contains it. In the XML field, similarity evaluation is receiving growing attention: as more and more web-based information exchange relies on this format, many applications need to retrieve, access, and process XML documents and to return those that approximately satisfy given conditions. Structural similarity calculation involves two aspects: first, determining whether the data source contains information similar to the pattern tree, using a preset similarity threshold; second, calculating the similarity between the data tree and the pattern tree.
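The two aspects just named, a similarity score between a data tree and a pattern tree and a decision against a preset threshold, might look like the sketch below. The abstract does not define the measure, so Jaccard overlap of root-to-node tag paths is substituted here purely for illustration; the names `tag_paths`, `path_similarity`, and `is_similar`, and the default threshold of 0.5, are all assumptions.

```python
import xml.etree.ElementTree as ET


def tag_paths(elem, prefix=()):
    """Collect every root-to-node tag path, visiting nodes depth-first."""
    path = prefix + (elem.tag,)
    paths = {path}
    for child in elem:
        paths |= tag_paths(child, path)
    return paths


def path_similarity(data_xml, pattern_xml):
    """Jaccard overlap of tag paths (an illustrative stand-in measure)."""
    a = tag_paths(ET.fromstring(data_xml))
    b = tag_paths(ET.fromstring(pattern_xml))
    return len(a & b) / len(a | b)


def is_similar(data_xml, pattern_xml, threshold=0.5):
    """True when the data tree reaches the preset similarity threshold."""
    return path_similarity(data_xml, pattern_xml) >= threshold


pattern = "<book><title/><author/></book>"
data = "<book><title/><price/></book>"
print(path_similarity(data, pattern))       # 2 shared paths out of 4 -> 0.5
print(is_similar(data, pattern, 0.5))
```

Raising the threshold makes the filter stricter: with these two documents, a threshold of 0.6 rejects the data tree while 0.5 accepts it.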
An XML document is not itself a simple tree. As the descriptive capacity of XML keeps expanding, a document can no longer be described as a simple tree but rather as a graph, so graph traversal with depth-first and breadth-first search is used when determining structural similarity.

Querying XML documents has become an important issue in data processing. Web-based approximate search over XML data, information classification, and data exchange all depend on it, and the proliferation of XML data brings both opportunities and challenges for data mining, information retrieval, and intelligent information processing. Similarity calculation for XML document retrieval underlies such mining and deeper intelligent processing, so its study is of considerable significance. To determine structural similarity between XML documents effectively, this thesis combines three aspects, node similarity, distance similarity, and structural similarity, to analyze XML documents. It draws on data-structure techniques such as node traversal and depth-first and breadth-first search, so that the structural similarity between XML documents can be measured more accurately.

The thesis also surveys recent research abroad on structural similarity measures for XML documents, introducing standards for evaluating such measures effectively, the extraction and modeling of XML document structure information, and the definition of structural similarity measures.
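One way to picture the combination of node, distance, and structural similarity is the sketch below. The abstract does not give the thesis's definitions, so each component is an assumption made for illustration: node similarity is Jaccard overlap of tag names, distance similarity compares the shallowest depth of each shared tag (found breadth-first), structural similarity is Jaccard overlap of parent-child tag pairs (found depth-first), and the equal weights are arbitrary. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def edges_dfs(root):
    """Collect (parent_tag, child_tag) pairs with a stack-based depth-first walk."""
    edges, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in node:
            edges.add((node.tag, child.tag))
            stack.append(child)
    return edges


def depths_bfs(root):
    """Map each tag to the shallowest depth at which it occurs (breadth-first)."""
    depths, queue = {}, deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        depths.setdefault(node.tag, depth)   # first visit is the shallowest
        queue.extend((child, depth + 1) for child in node)
    return depths


def combined_similarity(xml_a, xml_b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three assumed components; the weights are illustrative."""
    root_a, root_b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    depth_a, depth_b = depths_bfs(root_a), depths_bfs(root_b)
    node_sim = jaccard(set(depth_a), set(depth_b))
    shared = set(depth_a) & set(depth_b)
    dist_sim = (
        sum(1 / (1 + abs(depth_a[t] - depth_b[t])) for t in shared) / len(shared)
        if shared else 0.0
    )
    struct_sim = jaccard(edges_dfs(root_a), edges_dfs(root_b))
    w_node, w_dist, w_struct = weights
    return w_node * node_sim + w_dist * dist_sim + w_struct * struct_sim


print(combined_similarity("<book><title/><author/></book>",
                          "<book><title/><price/></book>"))
```

Identical documents score 1 and documents with no tags in common score 0 under these definitions; adjusting `weights` shifts the emphasis among the three components.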

Keywords/Search Tags:

XML Similarity, Depth-First Search, MST

Related items

1	Research On String Similarity Search Algorithms
2	Similarity Graph-based Scientific Literature Search Key Technology Research
3	Research On Key Technologies Of High Performance Similarity Search Algorithms And Optimization
4	Study On Similarity Search For Textual And Spatial Data
5	Study On Match Similarity Search
6	Research On Similarity Search Based On Hash Function
7	Research On Locality Sensitive Hashing-Based Similarity Search
8	Dynamic Similarity Search Over Encrypted Data
9	Research On Similarity Search Of High Dimensional Data Based On Hash Technique
10	On Similarity Search Among Web Pages