An Open-Domain QA System Based On Heterogeneous Dense Representations

Posted on:2023-01-11

Degree:Master

Type:Thesis

Country:China

Candidate:T Y Zhou

Full Text:PDF

GTID:2558306914477114

Subject:Information and Communication Engineering

Abstract/Summary:

This thesis studies the design of an open-domain question answering system based on heterogeneous dense vector representations. An open-domain question answering system usually consists of three basic modules: retrieval, reranking, and reading comprehension. This thesis covers the design and training of the retrieval module, and the training and lightweighting of the reranking module. The role of the retrieval module is to select, from a large-scale document collection, the documents most likely to help answer the user's question. Rule-based retrieval methods focus only on the overlap between texts, while neural retrieval methods usually consider only semantic matching or contextual relevance. To achieve more accurate text retrieval, this thesis proposes integrating three text relevance features, namely text overlap, semantic matching, and contextual relevance (or topic consistency), to quantify the relevance between user questions and documents. On this basis, a dedicated extraction method is designed for each of the three relevance features without changing the encoder architecture, so that dense vector representations of these three heterogeneous features can be produced by the neural network model and then combined into a fused representation. The role of the reranking module is to further identify, among the candidate documents returned by the retrieval module, the supporting documents needed to answer the question. Although both the reranking and retrieval modules essentially score documents through feature extraction, they address two different aspects at the linguistic level. The retrieval module aims to retrieve as many documents relevant to the question as possible, without considering whether these documents are sufficient to support an answer; the reranking module must judge exactly that. To keep the training process consistent with the inference process, this thesis proposes constructing additional negative samples for the existing supervised dataset from the candidate documents output by the retrieval module, so that the training data of the reranking model is sufficient to teach the model to judge whether a document can answer the question. Finally, this thesis distills the trained reranking model so that it can run on consumer-grade devices. Distilling the fine-tuned BERT into a TextCNN increases the number of candidate documents the reranking model can process, which significantly improves inference speed and reduces memory usage without losing too much performance. The proposed document retrieval and document reranking methods achieve significant overall performance improvements on multiple mainstream question answering datasets, and the lightweight reranking method significantly reduces the computational resource consumption of the model.
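The negative-sample construction described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `build_reranker_samples`, the dictionary-based data layout, and the `negatives_per_question` parameter are all assumptions made for illustration. The idea shown is only that retrieved candidates which are not labeled as supporting documents are reused as negatives, so the reranker is trained on the same kind of documents it will see at inference time.

```python
from typing import Dict, List, Set, Tuple


def build_reranker_samples(
    retrieved: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    negatives_per_question: int = 4,
) -> List[Tuple[str, str, int]]:
    """Build (question_id, doc_id, label) training triples for a reranker.

    Hypothetical layout (an assumption, not from the thesis):
      retrieved: question id -> candidate doc ids, best-ranked first
      gold:      question id -> ids of labeled supporting documents
    """
    samples: List[Tuple[str, str, int]] = []
    for question_id, candidates in retrieved.items():
        positives = gold.get(question_id, set())

        # Labeled supporting documents are positives (label 1).
        for doc_id in sorted(positives):
            samples.append((question_id, doc_id, 1))

        # Retrieved candidates that are not supporting documents become
        # negatives (label 0); the top-ranked ones are kept first because
        # they are the hardest for the reranker to tell apart.
        negatives = [d for d in candidates if d not in positives]
        for doc_id in negatives[:negatives_per_question]:
            samples.append((question_id, doc_id, 0))
    return samples


# Toy usage with made-up ids.
toy_retrieved = {"q1": ["d3", "d1", "d7", "d9"]}
toy_gold = {"q1": {"d1"}}
toy_samples = build_reranker_samples(toy_retrieved, toy_gold, negatives_per_question=2)
print(toy_samples)
```

Taking negatives from the retriever's own output, rather than sampling random documents, is what keeps training consistent with inference in this sketch: the reranker only ever scores documents the retriever has already judged relevant.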

Keywords/Search Tags:

open-domain question answering, document retrieval, document reranking, negative sampling

Related items

1	Design And Implementation Of Retrieval Open Domain Question Answering System Based On Tarles
2	Research On Key Technologies Of Open-Domain Question Answering Based On Textual Knowledge
3	Research On Key Techniques Of Question Understanding For Open-domain Question Answering System
4	Research Of Chinese Information Retrieval System And Document Reranking
5	Research On Open-domain Question Answering Method Based On Text
6	Research On Deep Learning-based Multi-document Passage Ranking Methods For Question Answering System
7	Research On Key Technologies Of Single Data Source Open Domain Question Answering System
8	Research On Open Domain Question Answering Technology Based On Deep Neural Network And Weakly Supervised Learning
9	Research On Open-Domain Question Answering System
10	Research Of Specific Domain Question Answering System Based On Internet Information