
Research On Entity Recognition And Relation Extraction For Data Science Discipline Knowledge Graph Construction

Posted on: 2024-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wu
GTID: 2568307112976819
Subject: Electronic information
Abstract/Summary:
A subject knowledge graph combines knowledge graph technology with subject education; it clearly shows the knowledge structure of a subject and the correlations between items of subject knowledge. Subject term extraction, named entity recognition, and relation extraction are all important steps in constructing subject knowledge graphs, as well as fundamental work for other natural language processing tasks. Data science is an emerging interdisciplinary field that combines mathematics, statistics, and computer science. Constructing a data science discipline knowledge graph helps promote the development of the discipline, enriches its research materials, and lets learners quickly grasp the discipline's knowledge and its lineage. Against this background, this dissertation first studies the entity recognition and relation extraction tasks, and then applies the results to the construction of a data science discipline knowledge graph as an example. The specific research work is as follows:

(1) Entity recognition. Most existing entity recognition models rely on a large annotated corpus, but the knowledge corpus of an emerging discipline is relatively small and annotation is costly. To address this problem, BiLSTM-CRF is chosen as the base model, and a low-resource entity recognition model is constructed by incorporating multi-layer character information and combining it with a self-attention mechanism. The experimental results show that the model with a 20% training set achieves results comparable to those of the base model with a 70% training set, and that it adapts better to the text entity recognition task.

(2) Relation extraction. The dissertation selects BERT as the base model and combines it with a convolutional neural network to form an enhanced text feature extractor. The extracted text features are fed into the relation computation layer to obtain hidden state representations of words and relations; a mask matrix then masks all non-entities in the text to obtain the probability values of the relations between entities. Finally, the model is compared with other models on two public datasets and is found to perform better on the relation extraction task.

(3) Knowledge graph construction for the data science discipline. The dissertation first studies the process of constructing disciplinary knowledge graphs and analyzes their advantages and disadvantages; second, it uses the entity recognition and relation extraction techniques to extract entity-relation triples from a manually collected data science knowledge corpus; finally, it uses the graph query language Cypher to store the collated triples in the Neo4j graph database, completing the storage and visualization of the discipline's knowledge graph.
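The final storage step described above (writing extracted triples to Neo4j with Cypher) might look roughly like the following. This is a minimal sketch, not the dissertation's actual code: the `Entity` label, the `name` property, the helper `triple_to_cypher`, and the sample triples are all hypothetical choices made for illustration. Because Cypher does not allow a relationship type to be passed as a query parameter, the sketch sanitizes the relation string and embeds it in the query text, while the entity names are passed as parameters.

```python
import re


def triple_to_cypher(head, relation, tail):
    """Build a parameterized Cypher MERGE query for one (head, relation, tail) triple.

    Assumptions (not from the source): nodes carry the label `Entity` and a
    `name` property; the relation string becomes an upper-case relationship type.
    """
    # Relationship types cannot be query parameters in Cypher, so reduce the
    # relation to ASCII letters, digits, and underscores before embedding it.
    rel_type = re.sub(r"[^0-9A-Za-z]+", "_", relation).strip("_").upper()
    if not rel_type or rel_type[0].isdigit():
        raise ValueError(f"cannot derive a relationship type from {relation!r}")

    # MERGE creates each node or relationship only if it does not already exist,
    # so loading the same triple twice does not produce duplicates.
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )
    return query, {"head": head, "tail": tail}


if __name__ == "__main__":
    # Hypothetical triples of the kind the extraction models might produce.
    triples = [
        ("random forest", "is a", "ensemble method"),
        ("k-means", "belongs to", "clustering"),
    ]
    for head, relation, tail in triples:
        query, params = triple_to_cypher(head, relation, tail)
        print(query, params)
```

With the official Neo4j Python driver, each `(query, params)` pair could then be executed inside a session, for example with `session.run(query, params)`; the connection details are left out here because the source does not describe them.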
Keywords/Search Tags:discipline knowledge graph, data science, entity recognition, relation extraction