
Research On Deep Learning-based Methods For Question Answering Over Structured Data

Posted on: 2021-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S G Zhu
GTID: 1368330605981216
Subject: Computer Science and Technology
Abstract:
With the development of communication technology and the improving performance of computer hardware and software, more and more people publish information on the Internet, leaving behind a large amount of data. Part of this data is structured, such as knowledge graphs and tables. Structured data is constructed according to a data model through both automatic and manual procedures. Because it is abundant and of high quality, it is well suited to serve as the knowledge source of question answering systems. A main way to answer questions over structured data is semantic parsing, which maps a natural language question to a structured query with the same meaning, such as a SPARQL or SQL query. Traditional semantic parsing methods are template-based and can cover only some query structures. Grammar-based or syntax-based methods require manually labeled combination or mapping rules, and they often run into the structural mismatch problem when facing nonstandard questions. Machine learning-based methods can learn mapping rules automatically from labeled question-query or question-answer pairs, but their hand-crafted feature engineering misses some key information. In recent years, deep learning-based methods have achieved good performance on many natural language processing tasks. Deep learning can automatically extract the features best suited to matching questions with queries, and it generalizes better to new question patterns and query structures, which is very helpful for question answering over structured data. Although researchers have made progress in developing deep learning-based methods for question answering over structured data, problems remain in query representation and question-query matching. This thesis therefore makes an in-depth, deep learning-based study of question answering over knowledge graphs and tables, and achieves the following innovative results:

(1) A method of jointly generating, copying and paraphrasing is proposed for simple question answering over knowledge graphs. Existing deep learning-based methods neither incorporate the surface-level, semantic-level and out-of-training-data correlations between questions and queries at the same time, nor combine them reasonably. As a result, some matching information is ignored and answering accuracy drops. To solve these problems, three decoding modes are combined to jointly capture the three levels of correlation between questions and queries. Overall, a sequence-to-sequence matching architecture from query to question is adopted. During question decoding, we observe that words at specific positions in a question often appear in the names of entities or relations in the query. We therefore argue that a question is formed by mixing the literal content and the semantic information of a query, and we propose two decoding modes, generating and copying, which select words from a fixed vocabulary and from the source query, respectively. In addition, because the training data cannot cover many correlations between questions and queries, we propose a paraphrasing mode, which introduces entity aliases and relation expressions mined automatically from external data. To combine the next-word probabilities produced by the three decoding modes, a gating function decides the contribution of each mode based on the previously observed question words. To improve the model's discriminative ability during training, negative samples whose answers overlap with the correct answer are sampled from the candidate queries, and a margin-based objective function incorporates these samples into optimization. Experiments show that jointly generating, copying and paraphrasing improves the accuracy of simple question answering over knowledge graphs.

(2) A tree-to-sequence learning method is proposed for complex question answering over knowledge graphs. Existing deep learning-based methods do not model the structure of complex queries, so their representations lose part of the semantics, which reduces answering accuracy. To solve this problem, we encode the entities, the relations and their joining order in a query together. Overall, a matching architecture from query to question is adopted. During candidate query generation, to expand the search scope, we first link the question to possible entities, types and number operations, and then build connected graphs from them as candidate queries. During query encoding, to jointly encode entity, relation and structure information, the query is treated as a tree and a tree-based long short-term memory (LSTM) encoder is proposed. During question decoding, the decoder proposed in contribution (1) is improved by merging the copying and paraphrasing modes into a single reference mode, which uses a language model to estimate the probability that a span of consecutive words expresses an entity or relation. Experiments show that the proposed tree-to-sequence learning method improves the accuracy of complex question answering over knowledge graphs.

(3) A method of decoupling and grouping actions is proposed for sequential question answering over tables. Existing deep learning-based methods often represent SQL statements as action sequences, which guarantees that the generated content is syntactically correct and that its parameters come from the table. However, these methods merge actions that need different types of information, yet do not share a neural network among actions that need the same type of information; this hampers feature extraction and reduces answering accuracy. To solve these problems, we redesign the action space according to the characteristics of sequential question answering. Overall, an action generation model is adopted. During question and table encoding, a pre-trained language model is introduced to help detect ellipsis and coreference, so that the model can understand how information carries over from earlier questions to later ones. During action decoding, the redesigned action space decouples the unrelated actions that existing methods merge, and a single shared neural network extracts features for related actions. To enlarge the scope of answerable questions, the "equal" action is extended to "like", which allows multiple cells in the same column to be selected by string matching. Experiments show that decoupling and grouping actions improves the accuracy of sequential question answering over tables.
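The gating step of contribution (1), which merges the next-word distributions of the generating, copying and paraphrasing modes, can be sketched as a weighted mixture. This is a minimal sketch, not the thesis's implementation: it assumes a softmax gate over three mode scores, and the gate scores, vocabularies and probabilities below are all hypothetical. In the described model the gate scores would be computed from the previously observed question words; here they are simply passed in.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_next_word(gate_logits, p_generate, p_copy, p_paraphrase):
    """Mix the generate, copy and paraphrase next-word distributions.

    gate_logits: three scores, one per mode (assumed softmax gate; in the
        described model they would depend on previously observed question words).
    Each p_* argument is a dict mapping a candidate word to its probability
    under that mode.
    """
    weights = softmax(gate_logits)
    mixed = {}
    for weight, dist in zip(weights, (p_generate, p_copy, p_paraphrase)):
        for word, prob in dist.items():
            # A word reachable through several modes accumulates probability from each.
            mixed[word] = mixed.get(word, 0.0) + weight * prob
    return mixed

probs = mix_next_word(
    gate_logits=[0.2, 1.5, -0.3],                      # hypothetical gate scores
    p_generate={"who": 0.6, "was": 0.4},               # fixed vocabulary
    p_copy={"obama": 0.7, "born": 0.3},                # words from the source query
    p_paraphrase={"barack": 0.5, "birthplace": 0.5},   # mined aliases/expressions
)
best_word = max(probs, key=probs.get)
```

Because the gate weights sum to one and each mode's distribution sums to one, the mixture is itself a valid distribution; with these made-up numbers the copy mode dominates and `best_word` is a word taken from the query.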
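The training objective of contribution (1) pairs the correct query with negative candidates whose answers overlap with the correct answer. A minimal sketch follows, assuming that "overlap" means a non-empty intersection of answer sets, that the hinge loss is summed over negatives, and that each query already has a matching score (for example, the log-probability the decoder assigns to the question given that query). The query names, answers, scores and the margin of 0.5 are hypothetical.

```python
def overlapping_negatives(candidates, gold_query, gold_answers):
    """Return candidate queries, other than the gold one, whose answer sets
    share at least one answer with the gold answers (assumed criterion)."""
    gold = set(gold_answers)
    return [q for q, answers in candidates
            if q != gold_query and gold & set(answers)]

def margin_loss(pos_score, neg_scores, margin=0.5):
    """Hinge loss summed over negatives: zero once every negative scores
    at least `margin` below the correct query."""
    return sum(max(0.0, margin - pos_score + s) for s in neg_scores)

# Hypothetical candidate queries paired with their answer sets.
candidates = [
    ("q_gold", ["a", "b"]),
    ("q1", ["b", "c"]),   # overlaps with the gold answers -> hard negative
    ("q2", ["d"]),        # no overlap -> not sampled
    ("q3", ["a"]),        # overlaps -> hard negative
]
negatives = overlapping_negatives(candidates, "q_gold", ["a", "b"])
scores = {"q_gold": 2.0, "q1": 1.8, "q3": 0.5}  # hypothetical matching scores
loss = margin_loss(scores["q_gold"], [scores[q] for q in negatives])
```

Here only `q1` scores within the margin of the gold query, so it alone contributes to the loss (about 0.3). Negatives that share answers with the gold query are harder to tell apart than random ones, which is the discriminative pressure the abstract describes.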
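For contribution (2), the abstract says the query is treated as a tree and encoded with a tree-based LSTM, but it does not name the variant. The sketch below assumes the child-sum Tree-LSTM of Tai et al. (2015), written in plain Python with random weights. The toy embedding function, the dimensions and the example query tree are hypothetical stand-ins for learned components.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(*vecs):
    """Element-wise sum of one or more equal-length vectors."""
    return [sum(vals) for vals in zip(*vecs)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

class ChildSumTreeLSTM:
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.hid_dim = hid_dim
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), update (u).
        self.params = {g: (mat(hid_dim, in_dim), mat(hid_dim, hid_dim), [0.0] * hid_dim)
                       for g in "ifou"}

    def _pre(self, gate, x, h):
        W, U, b = self.params[gate]
        return add(matvec(W, x), matvec(U, h), b)

    def encode(self, node, embed):
        """node = (label, [children]); returns (h, c) for the subtree bottom-up."""
        label, children = node
        x = embed(label)
        states = [self.encode(child, embed) for child in children]
        h_sum = add([0.0] * self.hid_dim, *(h for h, _ in states))
        i = sigmoid(self._pre("i", x, h_sum))
        o = sigmoid(self._pre("o", x, h_sum))
        u = tanh(self._pre("u", x, h_sum))
        c = mul(i, u)
        for h_k, c_k in states:
            # A separate forget gate per child decides how much of its memory to keep.
            f_k = sigmoid(self._pre("f", x, h_k))
            c = add(c, mul(f_k, c_k))
        h = mul(o, tanh(c))
        return h, c

def toy_embed(label, dim=4):
    """Hypothetical stand-in for learned embeddings: a fixed pseudo-random vector per label."""
    rng = random.Random(label)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

encoder = ChildSumTreeLSTM(in_dim=4, hid_dim=3)
# Hypothetical query tree: answer node ?x joined to an entity through a
# relation and constrained by a type.
query_tree = ("?x", [
    ("directed_by", [("Christopher_Nolan", [])]),
    ("type", [("film", [])]),
])
h, c = encoder.encode(query_tree, toy_embed)
```

The root state `h` summarizes entities, relations and how they are joined, since each node's state is built only from its own label and its children's states. A decoder could then generate the question from `h`, as in a tree-to-sequence setup.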
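The difference between the "equal" and "like" actions in contribution (3) can be illustrated on a toy table. This sketch assumes that "like" means case-insensitive substring matching, similar to SQL `LIKE '%pattern%'`; the abstract says only "string matching", so the exact rule is an assumption, and the table contents are made up.

```python
def select_equal(rows, column, value):
    """'equal' action: keep rows whose cell in `column` exactly matches `value`."""
    return [r for r in rows if r[column] == value]

def select_like(rows, column, pattern):
    """'like' action (assumed semantics): keep rows whose cell contains `pattern`
    as a case-insensitive substring, so several cells can match at once."""
    p = pattern.lower()
    return [r for r in rows if p in str(r[column]).lower()]

# Hypothetical table, one dict per row.
table = [
    {"Name": "Oak Street Library", "Type": "library"},
    {"Name": "Oak Street Park", "Type": "park"},
    {"Name": "Elm Street Park", "Type": "park"},
]

exact = select_equal(table, "Name", "Oak Street Park")
fuzzy = select_like(table, "Name", "oak street")
```

`select_equal` returns a single row, while `select_like` returns both "Oak Street" rows, which is how a question such as "which places are on Oak Street?" becomes answerable without enumerating each cell value.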
Keywords: question answering method, structured data, knowledge graph, table, deep learning