
Knowledge Graph Construction In Energy Battery Field Based On Natural Language Processing

Posted on: 2019-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: N Xiao
GTID: 2382330572469196
Subject: Information and Communication Engineering
Abstract/Summary:
With the advent of the big data era, the volume of information has soared, and rapidly capturing valuable knowledge from massive amounts of text has become an urgent problem. The knowledge graph has emerged in response. Although knowledge graphs have been built successfully in many other fields, their construction for specialized domains is still in its infancy, especially in the field of energy batteries, which seriously hinders the application and sharing of knowledge in this field. This thesis therefore constructs a knowledge graph using 50 scientific papers in this field as the research object. The research focuses on four aspects: part-of-speech tagging, entity extraction, relation extraction and knowledge graph construction.

Part-of-speech tagging. First, based on the structural characteristics of the papers and of Chinese and English entities, entities with distinctive features are extracted from the text, deduplicated, sorted and added to a custom dictionary. Then, all possible phrases are extracted with HanLP's phrase extraction function and matched against the source text; phrases with a matching degree of 2 or more are retained, screened by entity rules, and the remaining entities are added to the custom dictionary. Finally, HanLP's output with the updated custom dictionary is compared against manually labeled results. General rules for the field are derived from this comparison, and the entities extracted by these rules are manually filtered and added to the custom dictionary, completing part-of-speech tagging based on the custom dictionary.

Entity extraction. This thesis introduces a graph model into entity extraction and proposes an extraction method based on an improved TextRank algorithm. Joint node feature values, based on node length and node information content, and edge weights, based on a sliding window and mutual information, are incorporated as weights into the traditional TextRank algorithm to create a new scoring function. The final entity extraction results are determined by iterating this function and applying a threshold.

Entity relation extraction. This thesis divides relations into classification relations and non-classification relations. First, building on the "is a" pattern, an "is one" pattern is proposed, and the hyponymy relations in the text, that is, the classification relations, are extracted directly. Second, all entity pairs are extracted based on co-occurrence, and the word2vec model is used to calculate the semantic relatedness of the entities in each co-occurring pair. Third, dependency parsing and rules are used to obtain the parallel, subject-object, synonym and aggregation relations between entities. Finally, all relations are quantified, sorted in descending order and filtered by a threshold to obtain the final entity relations.

Knowledge graph construction. Based on the extracted entities and the relations between them, this thesis uses the Pajek software to visualize the knowledge graph. Because scientific research keeps developing, the output of scientific papers grows day by day, so constructing a knowledge graph is never finished. As professional information continues to accumulate, the structure of the knowledge graph is continually updated and a more complete knowledge system is established.

The core tasks include corpus construction, part-of-speech tagging, information content calculation, word length statistics, TextRank, entity extraction, word2vec, syntactic analysis, extraction of relation patterns, entity relation extraction and knowledge graph construction.
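
The abstract does not give the formula of the improved TextRank scoring function, so the following is only a minimal Python sketch of the general idea under stated assumptions. The node prior (term length times information content), the edge weight (co-occurrence count times positive pointwise mutual information within a sliding window), the names `improved_textrank` and `extract_entities`, and all parameter defaults are illustrative choices, not the thesis's actual definitions.

```python
import math
from collections import Counter, defaultdict


def improved_textrank(tokens, window=3, damping=0.85, iterations=50, tol=1e-6):
    """Rank candidate terms with a node-biased, edge-weighted TextRank variant.

    Assumptions (not taken from the thesis):
      node prior  = len(term) * -log2 p(term), normalised to sum to 1
      edge weight = co-occurrence count * positive PMI inside the window
    """
    if not tokens:
        return []
    n = len(tokens)
    freq = Counter(tokens)

    # Count unordered co-occurrences of distinct terms inside the sliding window.
    pair_count = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pair_count[frozenset((left, right))] += 1
    total_pairs = sum(pair_count.values())

    # Build a symmetric weighted graph; pairs with non-positive PMI get no edge.
    graph = defaultdict(dict)
    for pair, count in pair_count.items():
        a, b = tuple(pair)
        p_ab = count / total_pairs
        pmi = math.log2(p_ab / ((freq[a] / n) * (freq[b] / n)))
        if pmi > 0:
            graph[a][b] = graph[b][a] = count * pmi
    out_weight = {t: sum(graph[t].values()) for t in freq}

    # Node prior: longer and rarer terms receive a larger share of the base score.
    raw = {t: len(t) * -math.log2(freq[t] / n) for t in freq}
    raw_total = sum(raw.values())
    if raw_total > 0:
        prior = {t: value / raw_total for t, value in raw.items()}
    else:
        prior = {t: 1.0 / len(freq) for t in freq}

    # Iterate the scoring function until the largest change falls below tol.
    scores = {t: 1.0 / len(freq) for t in freq}
    for _ in range(iterations):
        updated = {}
        for v in freq:
            rank = sum(w / out_weight[u] * scores[u] for u, w in graph[v].items())
            updated[v] = (1 - damping) * prior[v] + damping * rank
        delta = max(abs(updated[t] - scores[t]) for t in freq)
        scores = updated
        if delta < tol:
            break
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def extract_entities(ranked, threshold):
    """Keep the terms whose score reaches the (hypothetical) threshold."""
    return [term for term, score in ranked if score >= threshold]


# Toy token sequence, invented for illustration only.
sample = ["lithium", "battery", "cathode", "material", "lithium", "battery",
          "anode", "material", "electrolyte", "lithium", "battery"]
ranked = improved_textrank(sample)
print(extract_entities(ranked, threshold=ranked[len(ranked) // 2][1]))
```

Folding the node prior into the teleport term keeps the update a standard weighted PageRank iteration; other ways of combining the node and edge features are equally compatible with the abstract's description.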
Keywords/Search Tags:Knowledge Graph, TextRank, Word2vec, Syntactic Analysis, Rule