Answer Selection For Non-factoid Question

Posted on: 2014-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Tian
Full Text: PDF
GTID: 2268330422950614
Subject: Computer Science and Technology
Abstract/Summary:
With the development of question answering communities, more and more user-generated content has accumulated. This content is not only vast in quantity and variety but also high in quality and well worth reusing. To manage and exploit these resources, researchers have carried out many studies in recent years, and community question answering is one of the areas that has attracted the most attention.

Community question answering (cQA) is built on data from question answering sites and differs considerably from traditional question answering. Traditional question answering focuses on question understanding and answer extraction in order to solve factoid questions, whose answers are mainly phrases and named entities. cQA, by contrast, places no such constraints on question type and has a great advantage in handling questions that ask for advice and opinions. Research on cQA covers many areas, including question search and routing, question interestingness, question and answer quality, answer ranking, and user expertise. Among these, question search and answer selection, as the key components of cQA, have drawn much attention from both academia and industry.

The main work of this thesis is to build a cQA system on a large body of question answering data and to devise methods for question understanding, question search, and answer selection.

To build the cQA system, this thesis collected over 130 million questions and 1 billion answers from Yahoo! Answers and other question answering sites. This data set is much larger than that of any previous study on cQA, which demonstrates the efficiency and practicality of the method. Based on these data, this thesis applied an automatic method for classifying question queries into categories, improving both efficiency and effectiveness. For question search, this thesis proposed using a learning-to-rank algorithm to combine structural and semantic features at different levels, extracted from the question query and the candidate questions, with the aim of solving the term mismatch problem between query and question. Experiments show that the ranking model trained with Ranking SVM outperforms the baseline methods on different data sets in precision and other evaluation metrics.

After question search returns the questions relevant to a question query, this thesis devised a new unsupervised method for detecting low-quality answers using content-based features. The method rests on three assumptions: (1) most answers to a question are normal, and only a few are of low quality; (2) low-quality answers can be detected by comparing them with their peer answers under the same question; (3) different questions should have different criteria for answer quality. Based on these assumptions, the method minimizes the variance of the answers' feature vectors while retaining as many answers as possible. Experiments show that the method improves on the ROC results of the baseline methods.

After filtering out low-quality answers, this thesis applied a Ranking SVM algorithm to rank the remaining answers using content and user expertise features. In an evaluation on 300 high-frequency question queries drawn from the query logs of a commercial search engine, the system answered the question query with 78% accuracy. Together, these procedures yield an efficient and effective cQA system that can give an answer within 2 seconds for any query.
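
The low-quality answer filter described in the abstract (reduce the variance of the answers' feature vectors while keeping as many answers as possible, separately for each question) can be illustrated with a small sketch. The abstract does not give the exact objective or the optimization procedure, so everything below is an assumption made for illustration: the greedy removal strategy, the names `total_variance` and `flag_low_quality`, the `penalty` and `min_keep` parameters, and the two-dimensional feature values. It is not the thesis's implementation.

```python
from statistics import fmean


def total_variance(vectors):
    """Sum of per-dimension population variances of equal-length vectors."""
    if len(vectors) < 2:
        return 0.0
    total = 0.0
    for d in range(len(vectors[0])):
        column = [v[d] for v in vectors]
        mean = fmean(column)
        total += fmean([(x - mean) ** 2 for x in column])
    return total


def flag_low_quality(vectors, penalty=0.5, min_keep=2):
    """Return sorted indices of answers flagged as low quality for ONE question.

    Hypothetical greedy scheme: at each step, find the answer whose removal
    leaves the smallest variance among the kept answers. Remove it only if
    the variance drops by more than `penalty`, the assumed cost of discarding
    one answer. Because the variance is computed per question, the cut-off
    adapts to each question's own answers.
    """
    kept = list(range(len(vectors)))
    removed = []
    current = total_variance([vectors[i] for i in kept])
    while len(kept) > min_keep:
        best_idx, best_var = None, None
        for i in kept:
            var = total_variance([vectors[j] for j in kept if j != i])
            if best_var is None or var < best_var:
                best_idx, best_var = i, var
        if current - best_var <= penalty:
            break  # no single removal is worth losing another answer
        kept.remove(best_idx)
        removed.append(best_idx)
        current = best_var
    return sorted(removed)


# Made-up content features for four answers to one question.
answers = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 4.0]]
print(flag_low_quality(answers))  # -> [3]
```

The `penalty` term plays the role of "keep the most answers": a larger value makes the filter more conservative, and a set of mutually similar answers is left untouched.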
Keywords/Search Tags: community question answering, question search, answer quality, Ranking SVM