The Construction Of Knowledge Graph In Education Field Based On Natural Language Processing

Posted on: 2021-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: M X Song
Full Text: PDF
GTID: 2518306524469574
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet, efficiently obtaining key knowledge from massive amounts of text has become an urgent problem. A knowledge graph is a semantic network that can cover many fields and present large amounts of information to users as entity-relation triples, so that users can find the information they need quickly and accurately. However, the construction of domain-specific knowledge graphs is still in its infancy, and relevant research is relatively scarce. This thesis therefore focuses on the education domain: it studies entity relation extraction algorithms on that basis and finally constructs a knowledge graph. The specific work is as follows:
(1) Domain dictionary expansion: Education-related text is selected from the Sogou corpus and the THUCNews dataset and preprocessed, and the dictionary is then expanded in two ways. First, a combination of TextRank and word2vec is used to expand the educational dictionary. Second, a set of part-of-speech tagging rules is defined according to the characteristics of the domain vocabulary, and a compound-word-based method is used for expansion. The results of the two methods are combined into a custom dictionary.
(2) Named entity recognition: A BiLSTM-CRF model is used for named entity recognition, because a bidirectional LSTM learns context information more effectively than a unidirectional LSTM. Before the experiments, to adapt the model to the domain, the custom dictionary was added to the dataset annotated with the BIO scheme.
(3) Relation extraction: Because no mature labeled dataset exists for the education domain, a domain annotation dataset is first constructed manually. Part of the educational corpus is labeled with five types of relation labels: "study at", "employment in", "co-worker", "teacher and student", and "others". Then, combining deep learning with an entity co-occurrence network, an improved relation extraction model (Cooc-Att-BiLSTM) is proposed. The model has three parts. First, a relation extraction method based on a BiLSTM with an attention mechanism uses part-of-speech tags, dependency parses, semantic features, and relative position features to extract sentence-level semantic features. Second, to take the full text into account, the entity co-occurrence network is used to extract corpus-level global features, with the adjacent nodes of each entity in the network representing its context. Finally, the entity relation is classified.
(4) Knowledge graph construction: The entity pairs and relations are organized into entity-relation triples, and the Neo4j graph database is used to provide visualization and simple query functions.
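Two of the steps above, dictionary-driven BIO annotation in (2) and the adjacent-node context of the entity co-occurrence network in (3), can be illustrated with a small Python sketch. This is not the thesis's implementation. The function names `bio_tag` and `cooccurrence_neighbors`, the untyped B/I/O labels, the longest-match strategy, the sentence-level co-occurrence window, and the English toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def bio_tag(tokens, dictionary):
    """Label tokens with B/I/O by longest match against a dictionary.

    `dictionary` is a set of token tuples (hypothetical format). Tags are
    untyped here; a real annotation would carry entity types as well.
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(entry) for entry in dictionary), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in dictionary:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def cooccurrence_neighbors(entities_per_sentence):
    """Build an undirected co-occurrence network as an adjacency map.

    Two entities are linked when they appear in the same sentence
    (assumed window); the neighbor set of an entity stands in for the
    adjacent-node context described in the abstract.
    """
    neighbors = defaultdict(set)
    for entities in entities_per_sentence:
        for a, b in combinations(sorted(set(entities)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


# Toy data, invented for illustration only.
toy_dictionary = {("Li", "Ming"), ("Tsinghua", "University")}
toy_tokens = ["Li", "Ming", "studied", "at", "Tsinghua", "University"]
toy_tags = bio_tag(toy_tokens, toy_dictionary)

toy_network = cooccurrence_neighbors([
    ["Li Ming", "Tsinghua University"],
    ["Li Ming", "Wang Fang"],
    ["Wang Fang", "Tsinghua University"],
])
```

In a pipeline like the one described, the neighbor sets would be turned into feature vectors and combined with the sentence-level features before classification; how the thesis encodes them is not stated in the abstract.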
Keywords/Search Tags: Entity Relation Extraction, Knowledge Graph, Bidirectional Long Short-Term Memory, Entity Co-occurrence Network, Neo4j