As the "Double First-Class" initiative advances, discipline construction increasingly emphasizes connotative development. This trend requires participating universities to have a firm grasp of their basic discipline data and a clear view of their own conditions, so that they can test, diagnose, and monitor the effectiveness of discipline construction and promote the connotative development of disciplines in a planned and focused manner. Traditional methods of obtaining discipline data, such as entering keywords into a search engine and receiving links to web pages that contain them, demand considerable time and effort to locate useful information among the returned links. This prevents discipline builders from obtaining discipline data accurately, clearly, and efficiently, and it has become an urgent problem for the informatization of university discipline construction. In response, this paper takes the discipline data knowledge graph as its research object and studies knowledge extraction and intelligent question answering for discipline data. It targets two problems: inaccurate entity recognition and relation extraction when a knowledge graph is built from the large volume of unstructured discipline data collected from the web, and the weak generalization ability of traditional query and retrieval methods for discipline data. The main research content covers the following three points:

(1) To address the difficulty of extracting and using the vast amount of unstructured academic data on the internet, this paper uses Named Entity Recognition (NER) to identify specific categories of entities in text and a Piecewise Convolutional Neural Network (PCNN) to extract the relations between entities. Specifically, the NER model uses the BERT pre-trained model to learn language knowledge, BiLSTM to capture contextual dependencies, and a CRF layer to model tag sequences, achieving efficient entity recognition. PCNN-based relation extraction, in turn, reduces interference and extracts features through sentence segmentation and position embedding encoding. Experiments show that these algorithms achieve a relatively high level of accuracy.

(2) To address the weak generalization ability of traditional query and retrieval methods for discipline data, this paper uses a short text classification model based on RoBERTa-wwm-ext to obtain high-quality semantic vector representations of short texts and to perform multi-class classification. Semantic analysis is then carried out according to the classification result, and the identified entities and analysis results are combined with the Cypher query language of the knowledge graph to implement an intelligent question answering module, improving the effectiveness and quality of discipline data retrieval. Experiments verify that the proposed question answering method performs well.

(3) Based on the algorithms proposed in Chapters 3 and 4, combined with the knowledge graph construction technology introduced in Chapter 2, this paper designs and implements a web-based prototype system for a university discipline data knowledge graph. The system displays the attributes and relationships of discipline data and also receives and answers short text questions from users, thereby visualizing the discipline data knowledge graph. Its functions include information query and relationship display, providing strong data support for university discipline construction.
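
The question answering step in point (2) combines a classification result with a recognized entity to form a Cypher query. The following is a minimal sketch of that idea, not the paper's implementation. The intent labels, node labels (`Discipline`, `Person`, `University`), relationship types (`LED_BY`, `OFFERS`), and the `build_query` helper are all hypothetical names chosen for illustration. The entity is passed as a Cypher parameter (`$name`) rather than spliced into the query string.

```python
from dataclasses import dataclass

# Hypothetical intent labels mapped to Cypher templates. The node labels,
# relationship types, and property names are illustrative assumptions and
# are not taken from the paper's schema.
CYPHER_TEMPLATES = {
    "discipline_leader": (
        "MATCH (d:Discipline {name: $name})-[:LED_BY]->(p:Person) "
        "RETURN p.name AS answer"
    ),
    "discipline_university": (
        "MATCH (u:University)-[:OFFERS]->(d:Discipline {name: $name}) "
        "RETURN u.name AS answer"
    ),
}


@dataclass
class CypherQuery:
    """A query string plus the parameters to send alongside it."""
    text: str
    params: dict


def build_query(intent: str, entity: str) -> CypherQuery:
    """Pick the template for a classified intent and bind the entity.

    `intent` stands in for the output of the short text classifier and
    `entity` for the output of the NER model.
    """
    template = CYPHER_TEMPLATES.get(intent)
    if template is None:
        raise ValueError(f"no Cypher template for intent {intent!r}")
    # The entity travels as a parameter, so user text never alters the query.
    return CypherQuery(text=template, params={"name": entity})


if __name__ == "__main__":
    query = build_query("discipline_leader", "Computer Science")
    print(query.text)
    print(query.params)
```

Keeping one template per intent means a new question type needs only a new classifier label and a new template, and the returned text and parameters can be handed to whichever graph database driver the system uses.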