
Research On Question Answering Based On Open-domain Tabular Data

Posted on: 2024-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: K L Peng
Full Text: PDF
GTID: 2568307052995699
Subject: Electronic information
Abstract/Summary:
Current research in Natural Language Processing focuses mainly on natural text. However, with the exponential growth of Internet data, tabular data, especially tables with linked text, has become a source of knowledge that cannot be ignored. How to retrieve such tables given a question, and how to perform multi-hop question answering over both tabular and text data, are problems worth studying.

First, to address open-domain table retrieval, this thesis proposes Fusion Searcher, a table retrieval method that combines conventional information retrieval with neural network methods. It balances computing speed against retrieval performance and achieves fast retrieval over large-scale table datasets. It also provides an approach for fusing tabular and text data, which addresses heterogeneous data alignment and strengthens the association between table and text content. Experiments demonstrate the computational efficiency and semantic matching performance of the method.

Second, to address the encoding of heterogeneous table-text data and multi-hop reasoning, this thesis proposes a method for constructing a multi-hop question answering model over mixed tabular-text data. A text filtering module and a table row filtering module, both obtained by weakly supervised learning, filter out content irrelevant to the question. An extractive question answering module over the fused data then extracts the answer to the multi-hop question. Experiments show the effectiveness of the method and of each sub-module.

Finally, considering the differences between the features of tabular data and natural text, this thesis proposes two pretraining methods for table encoding based on Curriculum Learning. Starting from a model pretrained on natural text, they learn a general comprehension ability for tabular-text data by accounting for these differences. Experiments show that both methods improve performance on downstream tabular data processing tasks.
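
The abstract does not describe the internals of Fusion Searcher, so the following is only a generic illustration of the idea of pairing a fast conventional retrieval stage with a slower semantic re-ranking stage. Everything in it is an assumption made for illustration: the table dictionary layout, the function names (`linearize`, `lexical_scores`, `retrieve`), the TF-IDF-style first stage, and in particular `placeholder_semantic_score`, which stands in for a neural scorer and is not one.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would use a proper tokenizer.
    return text.lower().split()


def linearize(table):
    # Flatten title, header, rows, and linked passages into one string,
    # so that table and text content can be scored together (assumed layout).
    parts = [table["title"], " ".join(table["header"])]
    parts += [" ".join(row) for row in table["rows"]]
    parts += table.get("linked_text", [])
    return " ".join(parts)


def lexical_scores(query, docs):
    # Cheap first stage: TF-IDF-style term overlap between query and document.
    n = len(docs)
    doc_tokens = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for toks in doc_tokens:
        df.update(toks.keys())
    query_terms = set(tokenize(query))
    return [
        sum(toks[t] * math.log(1 + n / df[t]) for t in query_terms if t in toks)
        for toks in doc_tokens
    ]


def placeholder_semantic_score(query, doc):
    # Stand-in for a neural relevance model: bag-of-words cosine similarity.
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, tables, first_stage_k=2, final_k=1,
             scorer=placeholder_semantic_score):
    # Stage 1: keep the top first_stage_k tables by lexical score.
    docs = [linearize(t) for t in tables]
    lex = lexical_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: lex[i], reverse=True)
    candidates = candidates[:first_stage_k]
    # Stage 2: re-rank only those candidates with the more expensive scorer.
    reranked = sorted(candidates, key=lambda i: scorer(query, docs[i]),
                      reverse=True)
    return [tables[i]["title"] for i in reranked[:final_k]]


tables = [
    {"title": "Olympic medal table", "header": ["Country", "Gold"],
     "rows": [["China", "38"], ["Japan", "27"]],
     "linked_text": ["China is a country in East Asia."]},
    {"title": "Programming languages", "header": ["Language", "Year"],
     "rows": [["Python", "1991"], ["Rust", "2010"]]},
    {"title": "Rivers", "header": ["River", "Length"],
     "rows": [["Nile", "6650"]]},
]

print(retrieve("gold medals china", tables))
```

The point of the two stages is that the expensive scorer only sees `first_stage_k` candidates instead of the whole collection, which is one common way to trade off speed against retrieval quality.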
Keywords/Search Tags: Table Encoding, Table Retrieval, Pre-trained Language Model, Multi-hop Reasoning, Heterogeneous Data