
Research On Tibetan Question Answering System Based On Deep Learning

Posted on: 2020-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: T C Xia
Full Text: PDF
GTID: 2435330575496416
Subject: Computer Science and Technology
Abstract/Summary:
The goal of a question answering system is to enable machines to understand human questions posed in natural language and to return accurate, concise answers. In recent years, the rapid development of knowledge bases has provided abundant and convenient resources for question answering research, so question answering over knowledge bases has attracted growing attention from industry and academia. It has achieved remarkable results for Chinese and English, and more and more researchers are working on it.

However, research on Tibetan question answering systems, and especially on Tibetan question answering over knowledge bases, is still at an exploratory stage and faces many challenges. First, Tibetan is a low-resource language: compared with Chinese and English, Tibetan question answering corpora are scarce. Second, the data in Tibetan knowledge bases is sparse, so traditional representation methods cannot express and learn the knowledge base well. Finally, given the small scale of the Tibetan question answering corpus, how to use deep learning to extract features from Tibetan questions and return answers is an important research topic for Tibetan question answering. To address these problems, this thesis studies Tibetan question answering over knowledge bases. The main work is as follows:

(1) To expand the Tibetan question answering corpus automatically, two corpus construction methods are proposed: a graph-based semi-supervised model and QuGAN, a model based on deep learning. The graph-based semi-supervised model builds on the Tibetan knowledge base and combines the knowledge base model with entity (relationship) types to automatically construct Tibetan question templates and factual questions. The QuGAN model initializes random questions by maximum likelihood estimation and feeds them into a quasi-recurrent neural network (QRNN) to generate virtual questions. At the same time, the Monte Carlo search strategy used in reinforcement learning is optimized and the question structure is adjusted automatically. Finally, the BERT language model is used to fine-tune the questions output by the generator, which makes the virtual questions more natural and accurate. Compared with the SeqGAN model, this model improves BLEU-2 by 6.7%.

(2) To address the sparse data in the Tibetan knowledge base, this thesis proposes EJKB, a model based on deep learning. The model extracts keyword information that is highly correlated with entities from Tibetan encyclopedia text and uses a convolutional neural network to learn from the entity co-occurrence matrix, producing a descriptive representation of each entity. At the same time, it uses the TransE algorithm to obtain a structured representation of entities. Finally, the descriptive and structured representations are concatenated to obtain the final representation of the triples in the knowledge base. Compared with the TransE algorithm, the model improves by 2.08% on Mainland, and the F1 value in knowledge base question answering reaches 0.82.

(3) Based on the grammatical characteristics of Tibetan, a hybrid neural network model is proposed to extract question features when the Tibetan question answering corpus is small. First, a convolutional neural network extracts local features of the question. Second, a long short-term memory (LSTM) network extracts deeper features, and an attention mechanism is integrated to capture deep semantic features of the question. In the answer-return stage, the BERT language model is used to decode the feature vector of the question, the similarity between the question representation and the knowledge base representation is calculated, and the answer with the highest score is returned. This method achieves an ACC@1 of 0.652.
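The answer-return step and the ACC@1 metric described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say which similarity function is used, so cosine similarity is an assumption here, and the function names (`cosine`, `rank_candidates`, `acc_at_1`) and the two-dimensional toy vectors are hypothetical stand-ins for the learned question and knowledge base representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(question_vec, candidates):
    """Score every candidate answer and sort by descending similarity.

    `candidates` maps a candidate name to its (hypothetical) knowledge
    base representation vector.
    """
    scored = [(name, cosine(question_vec, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def acc_at_1(examples):
    """Fraction of questions whose top-ranked candidate is the gold answer.

    Each example is (question_vec, candidates, gold_name); every example
    is assumed to have at least one candidate.
    """
    if not examples:
        return 0.0
    hits = sum(
        1
        for question_vec, candidates, gold in examples
        if rank_candidates(question_vec, candidates)[0][0] == gold
    )
    return hits / len(examples)


# Toy data: made-up vectors, not results from the thesis.
examples = [
    ([1.0, 0.0], {"A": [0.9, 0.1], "B": [0.0, 1.0]}, "A"),  # top answer is A: hit
    ([0.0, 1.0], {"A": [1.0, 0.0], "B": [0.2, 0.8]}, "A"),  # top answer is B: miss
]
toy_accuracy = acc_at_1(examples)
```

On this toy data one of the two questions is answered correctly at rank 1, so `toy_accuracy` is 0.5. The reported ACC@1 of 0.652 is the same ratio computed over the thesis's own test questions.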
Keywords/Search Tags:Tibetan, Knowledge Base, Question Answering, Deep Learning, Language Model