
Research On Information Retrieval Based On Language Model And Reranking For Retrieval Results

Posted on: 2007-11-12  Degree: Master  Type: Thesis
Country: China  Candidate: X G Hu  Full Text: PDF
GTID: 2178360185985855  Subject: Computer Science and Technology
Abstract/Summary:
Information Retrieval (IR) deals with the representation, storage, and organization of, and access to, information items. Sentence retrieval plays an important role in question answering, machine translation, summarization, and other research fields. Retrieval models are the kernel of information retrieval. Different retrieval models lead to different similarity calculations and therefore to different retrieval results, so research on and improvement of retrieval models is significant for information retrieval. Our improvements and experiments suggest that various factors, especially the structural and semantic information of natural language, are integral parts of retrieval models based on statistical language models (SLM), rather than being used heuristically as in traditional retrieval models.

In this paper, the 863 testing for Information Retrieval 2005 is an important background, so the technology used in the information retrieval system built for the 863 testing is introduced first: hypertext content extraction, word segmentation, full-text indexing, and query generation, which form the basis of the research on combining multiple models.

The vector space model (VSM) is a classic model that is still used in many retrieval applications. However, VSM is known to be mainly empirical and to work only at the surface level of the text. To retrieve information with more knowledge of the language itself, the statistical language model for information retrieval was proposed a few years ago and has developed quickly. One research topic of this thesis is the language model for document retrieval and its comparison with the classic VSM on a Chinese test set.

Experiments on the TREC corpus comparing the Word Sense based Language Model (WSLM) with other language models show that the WSLM method performs better than the traditional tf-idf method.
If a more powerful Word Sense Disambiguation tool were used, the result could be improved further.

In the part on re-ranking retrieval results, we use a combination method that merges the retrieval results of different systems by linear interpolation. Experiments suggest that various factors are integral parts of the combination of multiple models. Such a combination can be a more powerful tool for Chinese information retrieval and can provide people with more and better information services.
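The combination of retrieval results by linear interpolation can be illustrated with a minimal sketch. The abstract does not give the formula, the interpolation weight, or any score normalization, so everything below is an assumption made for illustration: the function names `min_max_normalize` and `interpolate`, the weight `lam`, and the min-max rescaling of each system's scores to [0, 1] before mixing are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one system's scores to [0, 1] (an assumed preprocessing step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every document gets the same normalized score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def interpolate(
    run_a: Dict[str, float],
    run_b: Dict[str, float],
    lam: float = 0.5,
) -> List[Tuple[str, float]]:
    """Combine two result lists as lam * score_a + (1 - lam) * score_b.

    A document missing from one system contributes 0.0 for that system
    (another assumption of this sketch).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    a = min_max_normalize(run_a)
    b = min_max_normalize(run_b)
    combined = {
        doc: lam * a.get(doc, 0.0) + (1.0 - lam) * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    # Highest combined score first; ties are broken by document id.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical scores from two systems, e.g. a VSM run and a language-model run.
    vsm_run = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
    lm_run = {"d2": 10.0, "d3": 5.0, "d4": 0.0}
    for doc, score in interpolate(vsm_run, lm_run, lam=0.5):
        print(doc, round(score, 3))
```

In practice the weight would be tuned on held-out queries; with `lam=1.0` the ranking reduces to that of the first system alone.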
Keywords/Search Tags:Information Retrieval, Retrieval Model, Statistical Language Model, Combination of Multiple Models