With the development of information technology, artificial intelligence, and Internet technology, the techniques for human interaction with intelligent systems have been improving continuously. The Internet contains a vast amount of knowledge and information, and the related intelligent applications have developed rapidly. Natural language processing in intelligent systems usually has to process the knowledge contained in large volumes of corpus text, so finding a better way to build an intelligent human-computer interaction system on top of large-scale knowledge is of great research and application value. The purpose of this thesis is to research and construct an intelligent question answering system that uses a knowledge graph as its knowledge base, following the idea of information extraction and exploiting the entity and relation information contained in the question sentence. For the intelligent processing modules of the question answering system, this thesis applies the BERT (Bidirectional Encoder Representations from Transformers) model to the named entity recognition and entity relation extraction subtasks using the transfer learning method, and obtains good results. The main work of this thesis is as follows:

1. We implement a corpus preprocessing module based on a text-enhanced filtering method, a user question entry module based on WebSocket, a named entity recognition module based on sequence labeling with a BERT transfer model, an entity linking module based on a synonym corpus, an entity relation retrieval module based on the HBase thrift2 service, an entity relation extraction module based on sentence-pair classification with a BERT transfer model, and a Top-k answer generation module combining answer ranking and pattern matching. These seven modules together form an open-domain knowledge graph question answering system based on BERT transfer learning.

2. Based on the NLPCC2018-KBQA
corpus, this thesis proposes and applies a remote-supervision Q&A entity labeling algorithm and a knowledge-graph-based negative sampling algorithm for relation extraction. The two algorithms automatically construct the supervised samples required by the named entity recognition module and the entity relation extraction module of the question answering system.

3. Traditional named entity recognition is usually implemented with machine learning components such as recurrent neural networks, convolutional neural networks, attention mechanisms, and conditional random fields. To reduce the information loss of traditional named entity recognition in the feature engineering phase, as well as the loss caused by learning language embeddings too independently of the downstream natural language processing task, this thesis analyzes and constructs three transfer models: BERT-BiLSTM-CRF, BERT-BiGRU-CRF, and BERT-CRF. The BERT-BiLSTM-CRF model achieves an F1 score of 94.94% on the People's Daily named entity recognition dataset and 94.92% on the named entity recognition dataset constructed from NLPCC2018-KBQA. The BERT-CRF transfer model also handles named entity recognition well, achieving an F1 score of 96.71% on the People's Daily dataset.

4. Traditional entity relation extraction methods usually divide the task into two stages, named entity recognition followed by relation extraction, which causes information loss and error propagation between the two stages. Based on the BERT transfer model, this thesis instead combines the natural language question and each candidate relation name directly into a sentence pair, trains the entity relation extraction model on the constructed sentence pairs, and then uses an answer generation algorithm based on the
combination of answer ranking and pattern matching to generate the Top-k answers, obtaining a Top-5 accuracy of 97.68% on the dataset constructed from NLPCC2018-KBQA.
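The two supervision-sample construction algorithms in item 2 can be sketched as follows. This is a minimal illustration rather than the thesis implementation: it assumes remote supervision reduces to matching the gold triple's subject string inside the question, and it represents the knowledge graph as a plain list of (subject, relation, object) triples; all function and variable names are hypothetical.

```python
import random


def remote_supervision_labels(question, subject_mention):
    """Remote-supervision entity labeling: if the gold triple's subject
    string appears verbatim in the question, tag its characters B/I and
    everything else O, yielding a NER training sample with no manual labels."""
    start = question.find(subject_mention)
    if start < 0:
        return None  # mention not recoverable from the question; skip sample
    tags = ["O"] * len(question)
    tags[start] = "B"
    for i in range(start + 1, start + len(subject_mention)):
        tags[i] = "I"
    return list(zip(question, tags))


def negative_sample_relations(kg_triples, subject, gold_relation, k=3, seed=0):
    """Knowledge-graph-based negative sampling: for a question whose gold
    relation is known, draw up to k other relations attached to the same
    subject in the graph as hard negatives for sentence-pair classification."""
    candidates = {r for s, r, o in kg_triples if s == subject and r != gold_relation}
    rng = random.Random(seed)
    pool = sorted(candidates)
    return rng.sample(pool, min(k, len(pool)))
```

Sampling negatives from the same subject node, rather than uniformly over all relations, keeps the negative sentence pairs topically close to the positive one, which is what makes them useful hard negatives.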
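The Top-k answer generation step in item 4 combines the classifier's ranking scores with pattern matching; one plausible reading of that combination is sketched below. The candidate tuple format, the literal-match rule, and the bonus weight of 0.5 are illustrative assumptions, not the thesis's actual scoring scheme.

```python
def top_k_answers(question, candidates, k=5):
    """Combine model ranking with pattern matching: candidates are
    (relation, answer, model_score) triples produced by the sentence-pair
    classifier; a fixed bonus is added when the relation name literally
    appears in the question, then the k best answers are returned."""
    def score(candidate):
        relation, _, model_score = candidate
        bonus = 0.5 if relation in question else 0.0  # pattern-match bonus (assumed weight)
        return model_score + bonus

    ranked = sorted(candidates, key=score, reverse=True)
    return [answer for _, answer, _ in ranked[:k]]
```

For example, for a question that literally contains the relation name, the pattern-match bonus can promote a candidate over one with a slightly higher raw classifier score.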