Research On Question Answering Based On Open-domain Tabular Data

Posted on:2024-05-27

Degree:Master

Type:Thesis

Country:China

Candidate:K L Peng

Full Text:PDF

GTID:2568307052995699

Subject:Electronic information

Abstract/Summary:

Current research in natural language processing focuses mainly on natural text. However, with the exponential growth of Internet data, tabular data, especially tables with linked text, has become a source of knowledge that cannot be ignored. How to retrieve such tables for a given question, and how to perform multi-hop question answering over both tabular and text data, are problems worth studying. First, to address open-domain table retrieval, this thesis proposes Fusion Searcher, a table retrieval method that combines conventional information retrieval with neural network methods. It balances high computing speed with high retrieval performance and achieves fast retrieval over large-scale table datasets. It also provides an approach for fusing tabular and text data, which effectively addresses heterogeneous data alignment and strengthens the association between table and text content. Experiments demonstrate the computational efficiency and semantic matching performance of this method. Second, to address the encoding of heterogeneous table-text data and multi-hop reasoning, this thesis proposes a method for constructing a multi-hop question answering model over mixed table-text data. A text filtering module and a table row content filtering module, both obtained by weakly supervised learning, filter out content irrelevant to the question, and a fusion-data extractive question answering module then extracts the answer to the multi-hop question. Experiments demonstrate the effectiveness of the method and of each submodule. Finally, considering the differences between the features of tabular data and natural text data, this thesis proposes two pretraining methods for table encoding based on curriculum learning. They acquire a general comprehension ability for table-text data by accounting for the differences from models pretrained on natural text. Experiments show that both methods improve performance on downstream tabular data processing tasks.

Keywords/Search Tags:

Table Encoding, Table Retrieval, Pre-trained Language Model, Multi-hop Reasoning, Heterogeneous Data

Related items

1	A Technology Of Generating SQL Through Chinese Natural Language Queries Based On Deep Learning
2	Technical Research On Chinese NL2SQL Task Based On BERT
3	Research And Implementation Of Fact Verification For Tabular Data
4	Research On Few-shot Text Generation With Pre-trained Language Model
5	Study On Multi-Tenant Data Storage And Data Migration On Basic-Table Combined With Extension-Table Schema
6	Heterogeneous Data Integration Based On Web Services
7	Study Of Case Based Reasoning And Application In Runout Table Cooling Process
8	Table Recognition Based On Digital Image Processing
9	A Study On Key Technologies For Generating SQL Statements Based On Deep Learning In Natural Language Processing
10	Research On Data Migration Algorithm Based On Heterogeneous Table Structures