Research on Techniques for Open-Domain Question Answering Systems Using Unstructured Documents

Posted on: 2018-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Xu
Full Text: PDF
GTID: 2348330515959745
Subject: Computer Science and Technology
Abstract/Summary:
Question answering (QA) systems provide relevant answers to questions posed in natural language. This thesis studies open-domain QA systems built on unstructured documents: the data source is a collection of unstructured documents, and the questions may concern nearly any topic. A classic system of this kind is usually composed of three distinct modules: a question processing module, a document processing module, and an answer processing module. This design has two problems. First, the set of candidate paragraphs returned by the document processing module is too large. Second, the heuristic measures used to extract answer candidates are too complex.

To address the first problem, we use a sentence filtering module and a sentence ranking module to reduce the candidate paragraphs to a single sentence. To address the second problem, we replace the heuristic answer-extraction measures with an end-to-end neural network.

We improve Word Mover’s Distance (WMD), a distance function between text documents, and present a novel model that combines BM25 and WMD in the sentence processing module. Experiments on document classification and document ranking tasks demonstrate the superiority of the new model. For sentence ranking, we design the Multiple Level Feature Rank (MLFR) model, which consists of five features at different levels of granularity and measures the relevance between questions and candidate sentences. Experiments on the sentence ranking task demonstrate the superiority of the MLFR model. Finally, we combine the sentence filtering module, the sentence ranking module, and the neural answer extraction module into a complete system and design experiments to test its performance.

In summary, this thesis presents two solutions to the problems of classic open-domain QA systems over unstructured documents, resting on three contributions: an improved distance function between text documents, the MLFR model, and an end-to-end neural network for answer extraction. Experiments demonstrate the superiority of the resulting system.
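The abstract does not say how BM25 and WMD are combined, so the following is only a minimal sketch of one plausible scheme, not the thesis's actual model. It assumes a linear interpolation (the weight `alpha` is hypothetical), uses the relaxed lower bound of WMD in place of the exact optimal-transport distance, and relies on a toy two-dimensional embedding table invented for illustration. The function names are likewise illustrative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def relaxed_wmd(a, b, emb):
    """Relaxed WMD: every word moves all its mass to the nearest word of the
    other text; the larger of the two directions is returned. This is a lower
    bound on exact WMD, used here to avoid an optimal-transport solver."""
    def one_way(src, dst):
        src = [w for w in src if w in emb]
        dst = [w for w in dst if w in emb]
        if not src or not dst:
            return float("inf")
        return sum(min(math.dist(emb[s], emb[t]) for t in dst) for s in src) / len(src)
    return max(one_way(a, b), one_way(b, a))


def rank_sentences(question, sentences, emb, alpha=0.5):
    """Return sentence indices, best first. ASSUMPTION: the score is
    alpha * normalized BM25 + (1 - alpha) * 1 / (1 + relaxed WMD)."""
    bm25 = bm25_scores(question, sentences)
    top = max(bm25) or 1.0  # avoid dividing by zero when nothing matches
    combined = [
        alpha * (s / top) + (1 - alpha) / (1 + relaxed_wmd(question, sent, emb))
        for s, sent in zip(bm25, sentences)
    ]
    return sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)


# Toy embeddings, invented for illustration only.
emb = {
    "capital": (1.0, 0.0), "city": (0.9, 0.1),
    "france": (0.0, 1.0), "paris": (0.1, 0.9),
    "banana": (-1.0, -1.0), "fruit": (-0.9, -1.0),
}
question = ["capital", "france"]
sentences = [["paris", "city", "france"], ["banana", "fruit"]]
ranking = rank_sentences(question, sentences, emb)
```

With this toy data the first sentence ranks ahead of the second: it shares the term "france" with the question (BM25) and its remaining words lie close to the question's words in the embedding space (WMD). The lexical term rewards exact matches, while the embedding term can still credit a sentence that paraphrases the question.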
Keywords/Search Tags: Unstructured documents, Question answering, Sentence filtering, Sentence ranking, Neural network