
Question Retrieval System Based On Question And Answer Pairs

Posted on: 2011-10-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Z Wang
GTID: 1118360305992037
Subject: Communication and Information System
Abstract/Summary:
Because web search engines have clear shortcomings, community question answering (Q&A) services have grown rapidly as a useful complement to them. In a Q&A community, a user describes a question in natural language and receives answers from other users, without having to sift through the mass of results returned by a web search engine. Q&A communities such as Yahoo! Answers, Sina iAsk, and Baidu Zhidao have accumulated large Q&A archives; Baidu Zhidao alone holds more than 70 million Chinese Q&A pairs.

This dissertation focuses on building a Question Retrieval System on top of the large-scale Q&A pairs accumulated in these communities. The system's main purpose is to search the existing Q&A corpus for questions synonymous with the user's question and return their answers to meet the user's information need. This avoids both the inconvenience of searching through many web pages and the wait for other users to answer.

The dissertation also presents a series of studies on a Question Retrieval System built on a large-scale Q&A corpus. First, it addresses question classification, which helps identify the user's information need and improves the user experience. Second, it analyzes the topics of users' questions to pin down that need and automatically provides identical or closely related Q&A pairs. Third, because a large amount of Q&A material is not contained in Q&A communities, and some never appears on any web page, it studies how to analyze the text chat data produced in group discussions and extract Q&A data to expand the system's Q&A corpus.

1. In a Question Retrieval System, question classification is a crucial task and is essential for organizing questions.
Building on the Kullback-Leibler distance (KLD) classification algorithm, this dissertation introduces a new question classification approach based on language modeling, named n-gram KLD. Experiments on a large corpus of more than 1 million question-answer pairs show that n-gram KLD outperforms traditional algorithms.

2. Question retrieval is the main task of the Question Retrieval System: finding questions in the archive that are semantically similar to a user's question. This allows high-quality answers to be retrieved from the archive and removes the time lag inherent in a community-based system. This dissertation discusses how to find similar questions based on their topics. Experimental results show that the approach finds topic-relevant questions and outperforms traditional approaches.

3. According to Baidu's "dark net" plan, less than 0.2% of information is available on the Internet, and a large amount of the data produced by human society cannot be retrieved by search engines. This dissertation studies the text chat data generated by discussion groups. Such data contains a great deal of useful information and often forms distinct threads, each centered on one topic and constituting a useful Q&A resource, yet it has not been well organized or mined. By considering both the content of chat messages and their context, and by applying the idea of a statistical translation model, this work mines the implicit semantic association between messages and the leading message of each topic thread, and thereby separates the chat into topic threads that serve as Q&A resources. Experiments on a real data set show that the proposed method is effective. The approach helps extract Q&A resources from discussion-group chat data and further expands the corpus of the Question Retrieval System.
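Item 1 names the n-gram KLD approach without giving its details. A minimal sketch of one plausible reading follows, assuming that "n-gram KLD" means: build a smoothed character n-gram language model P_c for each question class, compute the question's own n-gram distribution P_q, and assign the class with the smallest divergence D(P_q || P_c) = sum over g of P_q(g) * log(P_q(g) / P_c(g)). The character-bigram features, additive (Laplace) smoothing, and the names `char_ngrams` and `NGramKLDClassifier` are assumptions for illustration, not the dissertation's implementation.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams (an assumed feature choice; avoids word segmentation for Chinese)."""
    text = text.strip()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramKLDClassifier:
    """Hypothetical classifier: pick the class whose smoothed n-gram model is closest in KL divergence."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha          # additive-smoothing constant (assumed)
        self.counts = {}            # label -> Counter of n-gram counts
        self.totals = {}            # label -> total n-gram count
        self.vocab = set()          # every n-gram seen in training

    def fit(self, questions, labels):
        for question, label in zip(questions, labels):
            grams = char_ngrams(question, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        return self

    def _class_prob(self, label, gram, vocab_size):
        # Additive smoothing gives unseen n-grams non-zero probability, so the log stays finite.
        return (self.counts[label][gram] + self.alpha) / (
            self.totals[label] + self.alpha * vocab_size)

    def kld(self, question, label):
        """D(P_q || P_c), summed over the n-grams that occur in the question."""
        q_counts = Counter(char_ngrams(question, self.n))
        q_total = sum(q_counts.values())
        vocab_size = len(self.vocab | set(q_counts))
        divergence = 0.0
        for gram, count in q_counts.items():
            p_q = count / q_total
            divergence += p_q * math.log(p_q / self._class_prob(label, gram, vocab_size))
        return divergence

    def predict(self, question):
        return min(self.counts, key=lambda label: self.kld(question, label))


if __name__ == "__main__":
    clf = NGramKLDClassifier(n=2).fit(
        ["how to cook rice", "best pasta recipe",
         "install python on windows", "windows update error"],
        ["food", "food", "computer", "computer"],
    )
    print(clf.predict("easy rice recipe"))  # food
```

Smoothing matters here because questions are short texts: most n-grams of a class model never occur in a given question, and many question n-grams are unseen in a class, so an unsmoothed P_c would make the divergence infinite.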
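Item 3 says only that thread detection combines message content, context, and the idea of a statistical translation model. One way to picture this, offered as a swapped-in assumption rather than the dissertation's formulation, is an IBM Model 1-style translation language model: a new message m is scored against each thread's leading message l by log P(m | l) = sum over w in m of log[(1 - lam) * (1/|l|) * sum over v in l of t(w | v) + lam * P_bg(w)], minus a penalty for time elapsed since the thread was last active. The translation table `t_table`, background model `bg_prob`, mixing weight `lam`, linear `decay` penalty, and both function names are hypothetical; in practice t(w | v) would have to be learned from data.

```python
import math


def translation_loglik(message, lead, t_table, bg_prob, lam=0.2, floor=1e-6):
    """IBM Model 1-style log P(message | lead), mixed with a background unigram model.

    t_table[(w, v)] is a hypothetical probability that lead word v 'translates'
    into message word w; bg_prob is a hypothetical background distribution.
    """
    score = 0.0
    for w in message:
        # Uniform alignment: average the translation probability over the lead's tokens.
        trans = sum(t_table.get((w, v), 0.0) for v in lead) / len(lead)
        # The background mixture keeps words the lead cannot explain from zeroing the score.
        score += math.log((1 - lam) * trans + lam * bg_prob.get(w, floor))
    return score


def assign_thread(message, msg_time, threads, t_table, bg_prob, decay=0.01):
    """Return the id of the thread whose lead message best explains `message`.

    threads maps thread id -> (lead tokens, time of last message in that thread).
    A linear penalty on elapsed time stands in for the 'context information' feature.
    """
    best_id, best_score = None, -math.inf
    for tid, (lead, last_time) in threads.items():
        if not lead:
            continue  # an empty lead cannot generate any words
        score = translation_loglik(message, lead, t_table, bg_prob)
        score -= decay * (msg_time - last_time)
        if score > best_score:
            best_id, best_score = tid, score
    return best_id


if __name__ == "__main__":
    t_table = {("pip", "python"): 0.3, ("install", "python"): 0.2,
               ("rice", "cook"): 0.4, ("cook", "cook"): 0.6}
    bg_prob = {"pip": 0.01, "install": 0.01, "rice": 0.01, "cook": 0.01}
    threads = {"t1": (["python"], 0), "t2": (["cook"], 0)}
    print(assign_thread(["pip", "install"], 10, threads, t_table, bg_prob))  # t1
```

The translation step is what lets a reply such as "pip install" attach to a thread whose lead mentions "python" even though the two messages share no words; plain word overlap would miss that association.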
Keywords/Search Tags: Question Retrieval System, Language Modeling, Short Text Classification, Question Subject Analysis, Thread Detection