Research And Implementations Of A Search Model Based On The Fusion Of DB And IR

Posted on: 2011-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Song
GTID: 2218330368986316
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the internet and information systems, more and more databases are used to store data, and most of them need to offer a search interface. Databases, however, were originally designed for programmers and advanced users: a user must be familiar with a query language such as SQL and understand the database schema, which is very difficult for ordinary users. In addition, because databases lack ranking ability, users often cannot retrieve the information they need. Information retrieval (IR) technology is well suited to these problems, so integrating DB and IR technologies to give ordinary users an easy and efficient query system has become a common requirement. Keyword search over databases is the product of this situation.

There are two main kinds of database retrieval algorithms: data graph algorithms and schema graph algorithms, and they differ greatly. Because a schema graph algorithm does not need to store the relationships between the tuples held in the database, it has lower system overhead. When the content of the database changes, a schema graph algorithm adapts to the change more quickly and reflects it in the results. For these reasons, the schema graph approach is one of the popular approaches to database retrieval. However, current schema graph algorithms still suffer from low retrieval effectiveness and efficiency.

To solve these problems, we propose an improved algorithm in this paper. It consists of four parts: pre-execution on the database, construction of limited query patterns, generation of candidate networks, and execution of candidate networks. Pre-execution generates query pre-patterns and meta-candidate networks based on the keywords that exist in the database, and updates the database according to different conditions.
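The pre-execution idea described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy schema graph, the keyword index, the table names, and the functions `join_path`, `precompute_networks`, and `candidate_network` are hypothetical and are not taken from the thesis. The sketch treats a query pattern as the set of tables that contain the query keywords, caches a join path for each pair of such tables ahead of time, and falls back to a breadth-first search over the schema graph when no cached entry exists.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy schema graph: tables are nodes, foreign-key links are edges.
SCHEMA_GRAPH = {
    "author": ["writes"],
    "writes": ["author", "paper"],
    "paper": ["writes", "venue"],
    "venue": ["paper"],
}

# Hypothetical keyword index: keyword -> tables whose tuples contain it.
KEYWORD_INDEX = {
    "song": {"author"},
    "retrieval": {"paper"},
    "sigmod": {"venue"},
}


def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of tables from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two tables are not connected


def precompute_networks(graph, keyword_index):
    """Cache a join path for every pair of tables that hold indexed keywords."""
    tables = sorted(set().union(*keyword_index.values()))
    cache = {}
    for a, b in combinations(tables, 2):
        path = join_path(graph, a, b)
        if path is not None:
            cache[frozenset((a, b))] = path
    return cache


def candidate_network(pattern, graph, cache):
    """Return (network, from_cache) for a query pattern of one or two tables."""
    key = frozenset(pattern)
    if key in cache:
        return cache[key], True
    tables = sorted(key)
    if len(tables) == 1:
        return tables, False
    if len(tables) != 2:
        raise NotImplementedError("this sketch only handles one or two tables")
    return join_path(graph, tables[0], tables[1]), False
```

With the cache built once by `precompute_networks(SCHEMA_GRAPH, KEYWORD_INDEX)`, a query whose keywords fall in `author` and `venue` is answered from the cache; with an empty cache the same path is found by searching the schema graph at query time.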
Construction of limited query patterns combines the user's keywords, removes duplicate tuples, and then constructs the limited query patterns; this reduces the size of the tuple sets. Generation of candidate networks compares the user's query pattern with the query pre-patterns: if a matching pre-pattern exists, the corresponding candidate network is retrieved; otherwise, the candidate network is generated from the query pattern and the schema graph of the database. Execution of candidate networks generates the joining trees of tuples, scores each with a cosine formula, and then translates it into SQL and executes it.

Finally, we design and implement a system based on the improved algorithm and compare it with a system based on the original algorithm. The experiments use datasets of two sizes and judge the effectiveness of the system by average precision and recall; efficiency is compared under different search conditions. The results show that the system based on the improved algorithm achieves better retrieval effectiveness and efficiency.
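The abstract states that joining trees are scored with a cosine formula but does not give the formula. As a rough illustration only, the Python sketch below assumes plain term-frequency vectors and whitespace tokenization; the thesis may weight terms differently. The function names `cosine_score` and `rank_top_k` and the representation of a joining tree as a list of tuple texts are hypothetical.

```python
import math
from collections import Counter


def cosine_score(query_keywords, tuple_texts):
    """Cosine similarity between query term counts and a joining tree's term counts."""
    query_vec = Counter(k.lower() for k in query_keywords)
    tree_vec = Counter(
        word for text in tuple_texts for word in text.lower().split()
    )
    # Counter returns 0 for terms missing from the tree.
    dot = sum(query_vec[t] * tree_vec[t] for t in query_vec)
    norm_q = math.sqrt(sum(v * v for v in query_vec.values()))
    norm_t = math.sqrt(sum(v * v for v in tree_vec.values()))
    if norm_q == 0 or norm_t == 0:
        return 0.0
    return dot / (norm_q * norm_t)


def rank_top_k(query_keywords, joining_trees, k):
    """Score every joining tree and keep the k highest-scoring ones."""
    scored = [(cosine_score(query_keywords, tree), tree) for tree in joining_trees]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Here each joining tree is passed as a list of strings, one per tuple in the tree, so `rank_top_k(["keyword", "database"], trees, 1)` returns the single tree whose combined text is closest to the query under this assumed weighting.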
Keywords/Search Tags: Schema Graph, Candidate Network, Top-k, Database Retrieval, Limited Tuple Sets, Query Optimize, Keywords, Ranking