Research on FAQ Question Matching Based on Domain Knowledge Graph

Posted on: 2023-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: H M Zhao
GTID: 2568307022457384
Subject: Software engineering
Abstract:
FAQ systems answer a user's question by matching it against the questions in a question database and returning the answer attached to the most semantically similar one. Existing mainstream FAQ systems do not incorporate domain knowledge and are not optimized for matching domain-specific questions. Taking the operating system domain as an example, this thesis studies FAQ question matching based on a domain knowledge graph. In an operating system FAQ, user questions often contain English abbreviations, many proper nouns, and irregular expressions, all of which degrade question matching. In addition, user questions usually relate to the operating system's interface, so screenshot information could effectively improve matching performance, but existing FAQ systems do not consider this.

To address the lack of domain knowledge, this thesis constructs an operating system domain knowledge graph to improve question matching performance. To improve the accuracy of the incorporated knowledge, it proposes a knowledge filtering method: substring matching and entity multi-labeling are used to select as many candidate entities as possible from the knowledge graph, and a deep learning model then analyzes the semantic association between each candidate entity and the question to determine which knowledge entities are relevant. To make fuller use of this knowledge, the thesis improves K-BERT, a pre-trained model that incorporates knowledge, and proposes the FK-BERT model. Unlike K-BERT, FK-BERT considers both the relationships between entities within a single question and the entity associations between the two questions being compared.

To address the underutilization of image information, this thesis extracts information from domain images. Because operating system screenshots contain a large amount of text, OCR is used to recognize the text in an image. The importance of each text block is computed as a weighted average of three indicators: whether the text is highlighted, its relative position in the image, and its degree of association with the question text. Only the important text blocks are kept. To feed the image text into FK-BERT together with the original question text, two changes are made to the model. First, because the text blocks in an image have no clear syntactic order, all important text blocks are assigned the same relative position. Second, because the image text and the question text are relatively independent semantically, the image text is given no attentional influence on the original question text.

The experimental results show that the knowledge filtering step effectively improves the accuracy of the incorporated knowledge and improves the performance of both K-BERT and FK-BERT. Compared with K-BERT, FK-BERT considers more relationships between entities, incorporates more complete knowledge, and achieves better question matching performance. The results also show that enhancing question pair matching with domain image information is effective: the FK-BERT model that combines knowledge filtering and image information achieves improvements in accuracy, recall, and F1 score. Finally, this thesis implements a prototype FAQ system for the operating system domain that applies the question matching model described above. In testing and use, the system meets practical requirements for both speed and accuracy.
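The text-block importance score described above (a weighted average of highlighting, relative position, and association with the question) can be illustrated with a minimal sketch. The abstract gives neither the weights, the scoring scale, nor the filtering threshold, so the `TextBlock` fields, the `[0, 1]` scale, the weights `(0.25, 0.25, 0.5)`, and the threshold `0.5` below are all hypothetical choices made for illustration, not the thesis's actual values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBlock:
    """One OCR text block from a screenshot (all fields are illustrative assumptions)."""
    text: str
    highlighted: bool      # whether the block is visually highlighted
    position_score: float  # assumed score in [0, 1] derived from relative position
    relevance: float       # assumed score in [0, 1] for association with the question


def importance(block: TextBlock,
               weights: Tuple[float, float, float] = (0.25, 0.25, 0.5)) -> float:
    """Weighted average of the three indicators; the weights are hypothetical."""
    w_high, w_pos, w_rel = weights
    total = w_high + w_pos + w_rel
    weighted_sum = (w_high * float(block.highlighted)
                    + w_pos * block.position_score
                    + w_rel * block.relevance)
    return weighted_sum / total


def filter_important(blocks: List[TextBlock],
                     threshold: float = 0.5) -> List[TextBlock]:
    """Keep blocks whose importance reaches the (hypothetical) threshold."""
    return [b for b in blocks if importance(b) >= threshold]


if __name__ == "__main__":
    blocks = [
        TextBlock("Network settings", True, 1.0, 1.0),
        TextBlock("Cancel", False, 0.0, 0.5),
    ]
    for block in filter_important(blocks):
        print(block.text, importance(block))
```

Dividing by the sum of the weights keeps the score in `[0, 1]` even when the weights do not sum to one, which makes a fixed threshold easier to reason about.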
Keywords: Pre-training model, Knowledge graph, Semantic similarity, QA system