Principle Based On Compressed Full-text Search Method
Posted on: 2005-01-21 | Degree: Master | Type: Thesis | Country: China | Candidate: X J Lian | GTID: 2208360125467130 | Subject: Computer application technology

Abstract/Summary:
The recent rapid growth of text information poses new challenges to traditional information retrieval (IR). Most of the information available to us is stored in documents of many kinds. In the IR process, how to measure the similarity between documents is one of the most crucial factors. The traditional way to calculate the similarity between texts is to use the cosine coefficient in the vector space. Building on previous research, we summarize another method that applies the theory of data compression, using the compression ratio to express the similarity between texts. It has some advantages over the statistics-based method: it can capture the latent statistical characteristics of the text, and it is independent of keywords. In addition, we cluster associated documents. Cluster-based retrieval rests on the cluster hypothesis, which states that closely associated documents tend to be relevant to the same requests. Clustering picks out closely associated documents and groups them together into one cluster. We use a genetic algorithm to search for associated documents. The results demonstrate the method's rationality and validity.

Keywords/Search Tags: Text Information Retrieval, Data Compression, Similarity, Cluster-based Retrieval, Genetic Algorithm
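
To make the compression-based similarity idea in the abstract concrete, here is a minimal sketch in Python. The abstract does not give the thesis's exact formula, so this uses the normalized compression distance (NCD), a well-known measure of the same family, as a stand-in. The choice of `zlib` as the compressor and the helper names `compressed_size` and `ncd` are assumptions made for illustration, not details taken from the thesis.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(text_a: str, text_b: str) -> float:
    """Normalized compression distance between two texts.

    Lower values mean the texts share more structure. This is an
    illustrative stand-in, not necessarily the measure used in the thesis.
    """
    x = text_a.encode("utf-8")
    y = text_b.encode("utf-8")
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # shared content compresses well when concatenated
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample documents, invented for this example.
doc_ir = "information retrieval compares documents by their similarity. " * 20
doc_ir_variant = "information retrieval ranks documents by their similarity. " * 20
doc_other = "the orchard produced apples, pears and plums late in autumn. " * 20

print("related pair:  ", ncd(doc_ir, doc_ir_variant))
print("unrelated pair:", ncd(doc_ir, doc_other))
```

No keyword list or term vector is built here: the compressor itself picks up the repeated patterns, which matches the abstract's point that the method is independent of keywords. A pairwise distance of this kind could then feed a clustering step such as the one the abstract mentions.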