Research On The Synthetical Information Integrate And Query Optimization

Posted on: 2007-09-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Yu
GTID: 1118360212457655
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, online information sources of all kinds are increasing rapidly. These sources differ in type and structure, and their structure and content keep changing. Retrieving information from such a dynamic, heterogeneous, open environment is therefore very difficult, and it is significant to study information integration systems that give users a uniform interface. Against this background, the main techniques of information integration are discussed. The key contributions are as follows.

The architecture shows the modules of an information integration system and the relations among them. Well-known architectures of information integration systems are analyzed, and their advantages and disadvantages are discussed. Based on an analysis of the requirements of actual information integration systems and of other related systems, the architecture of a synthetical information integration system is proposed. In the new architecture, both existing information and information being produced are handled, data from traditional databases and XML Schema are managed, and both information retrieval and decision support are considered. It can meet the requirements of most enterprises.

The creation of the schema mapping is one of the important steps in an information integration system. The PBMSDF (Partition Based Mapping Schema Discovery Framework) is presented to discover the mapping schema efficiently. Dhamankar proposed the iMAP framework, which applies a set of searchers and beam search and can discover 1:1, 1:n and n:1 mappings. However, the iMAP framework has some disadvantages. Firstly, m:n mappings cannot be discovered. Secondly, both the attributes and their instances must be analyzed to discover the mapping schema, so the cost to the system is high. Finally, getting attribute instances is impossible in Web information integration. Therefore, the iMAP framework cannot be used for Web information integration. He proposed the DCM (Dual Correlation Mining) framework, which resolves these problems of the iMAP framework. However, DCM has the following shortcomings. On the one hand, its correlation measure is sometimes inaccurate, so the result is occasionally unreliable. On the other hand, the AprioriCorrmining and DualCorrelationmining algorithms perform a lot of unnecessary searching, which adds to their time cost. In this paper, the C-Measure is proposed, together with a partition and stack based algorithm for discovering attribute groups and the mapping schema. The former calculates the correlation among attributes, and the latter reduces the search time. The...
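To make the idea of correlation among attributes concrete, the following is a minimal sketch of how co-occurrence across a set of schemas can be counted when no attribute instances are available. It is only an illustration: the abstract does not define the C-Measure, so the Jaccard coefficient is used here as a stand-in, and the function names and the sample schemas are hypothetical.

```python
def cooccurrence_counts(schemas, a, b):
    """Count schemas containing both a and b, only a, only b, or neither.

    Returns the tuple (f11, f10, f01, f00). `schemas` is a list of sets
    of attribute names (a hypothetical representation, not from the source).
    """
    f11 = f10 = f01 = f00 = 0
    for attrs in schemas:
        has_a, has_b = a in attrs, b in attrs
        if has_a and has_b:
            f11 += 1
        elif has_a:
            f10 += 1
        elif has_b:
            f01 += 1
        else:
            f00 += 1
    return f11, f10, f01, f00


def jaccard(schemas, a, b):
    """Stand-in correlation score (NOT the C-Measure): f11 / (f11 + f10 + f01)."""
    f11, f10, f01, _ = cooccurrence_counts(schemas, a, b)
    denom = f11 + f10 + f01
    return f11 / denom if denom else 0.0


# Hypothetical attribute sets, one per source schema.
schemas = [
    {"author", "title", "isbn"},
    {"writer", "title", "isbn"},
    {"author", "title"},
    {"first name", "last name", "title"},
    {"first name", "last name", "isbn"},
]

for pair in [("author", "writer"), ("first name", "last name"), ("title", "isbn")]:
    print(pair, jaccard(schemas, *pair))
```

In this toy data, "author" and "writer" never appear in the same schema, which is the kind of signal that suggests two attributes may be alternatives for one concept, while "first name" and "last name" always appear together, which suggests they belong to one attribute group. A real measure would need to handle rare attributes and sparse data more carefully than this stand-in does.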
Keywords/Search Tags: Information Integration, Schema Mapping, Multi-Join Query Optimization, XML Query Optimization, PBMSDF