Query Analysis For Information Retrieval

Posted on: 2007-04-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W X Xiong
Full Text: PDF
GTID: 1115360185968407
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
The analysis of natural language query expressions is often neglected in the information retrieval community. A common practice is to use every word segment in a user's request sentence directly as a search term. Without thorough query analysis, the system gains only a poor grasp of the user's particular information need. The back-end system is then fed poor inputs, and even sophisticated algorithms for indexing, matching, sorting and ranking cannot work well. This dissertation focuses on applying NLP to query analysis in order to understand users' information needs better and to improve both the precision and the recall of an information retrieval system.

The main topics described in the dissertation are as follows:

(1) Information content words and stop words are discriminated among query words. Removing trivial non-content-bearing words refines the search terms passed to the retrieval system. A query is in essence a controlled language: it expresses a request for information and follows some typical restricted patterns. With this observation in mind, we draw a distinction between general stop words and query-specific stop words, illustrating their different distribution features in various corpora and their functions in information expressions. We also give an approach for building a stoplist based on entropy and Kullback-Leibler divergence, and a dynamic probabilistic recognition method based on N-grams and position information. Experiments show that our proposal is superior to the baseline, which only constructs a static stoplist. The research is based on corpus analysis of 200,000 query expressions.

(2) Concept salience is put forward for handling a specific retrieval request in query pre-processing. It is demonstrated that users tend to use various expressions when they want to satisfy a particular information need. Our work is then to distinguish the dominant concepts from the excluded ones in the user's query. According to whether they will occur in target texts, the concepts in a query are classified into...
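The abstract names entropy and Kullback-Leibler divergence as the basis of the stoplist in topic (1) but gives no formulas. The sketch below is therefore only one plausible reading, not the dissertation's method. It assumes that query-specific stop-word candidates are the terms contributing most to the KL divergence between a query corpus and a general corpus, and that a term's entropy is measured over its spread across queries. The function names, the add-one smoothing and the toy corpora are all hypothetical.

```python
import math
from collections import Counter


def term_probs(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_contributions(query_docs, general_docs):
    """Per-term contribution p(w) * ln(p(w) / q(w)) to KL(query || general).

    Assumption: a large positive value marks a term that is far more
    typical of queries than of general text.
    """
    vocab = {tok for doc in query_docs + general_docs for tok in doc}
    p = term_probs(query_docs, vocab)
    q = term_probs(general_docs, vocab)
    return {w: p[w] * math.log(p[w] / q[w]) for w in vocab}


def spread_entropy(term, docs):
    """Shannon entropy (in bits) of a term's occurrences across docs.

    Assumption: a term spread evenly over many queries (high entropy)
    carries little topical information.
    """
    counts = [doc.count(term) for doc in docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


# Toy, hand-made corpora (hypothetical data, already tokenized).
queries = [
    ["please", "find", "papers", "about", "stemming"],
    ["please", "find", "articles", "about", "indexing"],
    ["find", "information", "about", "ranking"],
]
general = [
    ["stemming", "reduces", "words", "to", "roots"],
    ["indexing", "and", "ranking", "are", "core", "steps"],
]

scores = kl_contributions(queries, general)
# The two terms most characteristic of the query corpus.
candidates = sorted(scores, key=scores.get, reverse=True)[:2]
```

On this toy data the request-pattern words rank above the topical words such as "stemming". A real stoplist would need a far larger corpus and a tuned threshold.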
Keywords/Search Tags: Query Analysis, Information Retrieval, Information Need, Information Content Word, Stoplist, Generic Word, Detailed Information, Concept Salience